@pateash
Contributor

closes: #17778


Description

Adding ArangoDB provider based on Python SDK https://github.com/ArangoDB-Community/python-arango

Users can create their own custom operators by using ArangoDBHook directly,
or by building on AQLOperator and providing a result_processor callable:

operator = AQLOperator(
    task_id='aql_operator',
    sql="FOR doc IN students "
        "RETURN doc",
    dag=dag,
    result_processor=lambda cursor: print([document["name"] for document in cursor]),
)
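For anything longer than a one-liner, a named function may be easier to test than a lambda. The following is a minimal sketch, assuming the cursor handed to result_processor iterates over dict-like documents (as the lambda above implies). The names `collect_names` and `fake_cursor` are hypothetical and not part of the provider.

```python
from typing import Any, Dict, Iterable, List


def collect_names(cursor: Iterable[Dict[str, Any]]) -> List[str]:
    """Return the "name" field of every document yielded by the cursor."""
    # Skip documents without a "name" key instead of raising KeyError.
    return [document["name"] for document in cursor if "name" in document]


# Stand-in for a query cursor: any iterable of dict-like documents works here.
fake_cursor = iter([{"name": "judy"}, {"name": "bob"}, {"age": 30}])
names = collect_names(fake_cursor)
print(names)
```

Such a function could then be passed as result_processor=collect_names in place of the lambda.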

A sensor can be implemented with a query passed as sql:

sensor = AQLSensor(
    task_id="aql_sensor",
    sql="FOR doc IN students "
        "FILTER doc.name == 'judy' "
        "RETURN doc",
    timeout=60,
    poke_interval=10,
    dag=dag,
)
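To illustrate how timeout and poke_interval interact, here is a rough sketch of a polling loop. It assumes the sensor succeeds as soon as the query returns at least one document. `poke_until_match` and `run_query` are hypothetical names, and the sketch returns False on timeout where an Airflow sensor would fail the task instead.

```python
import time
from typing import Callable, Sequence


def poke_until_match(
    run_query: Callable[[], Sequence[dict]],
    timeout: float,
    poke_interval: float,
) -> bool:
    """Poll run_query until it returns a document or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if len(run_query()) > 0:
            return True
        # Give up if the next poke would start after the deadline.
        if time.monotonic() + poke_interval > deadline:
            return False
        time.sleep(poke_interval)


# Stand-in query that finds a matching document immediately.
found = poke_until_match(lambda: [{"name": "judy"}], timeout=1.0, poke_interval=0.1)
print(found)
```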

@pateash changed the title from "Add Arango hook" to "WIP: Add Arango hook" on Mar 27, 2022
Contributor

@eladkal left a comment

Overall LGTM.

Nice job, @pateash.

:param arangodb_conn_id: Reference to :ref:`ArangoDB connection id <howto/connection:arangodb>`.
"""

template_fields: Sequence[str] = ('sql',)
Contributor

Should we also add template_ext?

ContributorAuthor

Yeah, makes sense.
Added.
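For context on this suggestion: in Airflow, template_ext lists file extensions, and when a templated field's value ends with one of them, the value is treated as a path to a template file whose contents are loaded. The sketch below mimics only that file-lookup step, not Jinja rendering. The function `resolve_templated_value` is hypothetical, and the `.sql` extension is an assumption, not taken from this PR.

```python
import tempfile
from pathlib import Path
from typing import Sequence


def resolve_templated_value(value: str, template_ext: Sequence[str], search_dir: Path) -> str:
    """Return the file contents if value ends with a listed extension, else value itself."""
    if value.endswith(tuple(template_ext)):
        return (search_dir / value).read_text()
    return value


with tempfile.TemporaryDirectory() as tmp:
    # Write a query file, then resolve both a file reference and an inline query.
    (Path(tmp) / "students.sql").write_text("FOR doc IN students RETURN doc")
    from_file = resolve_templated_value("students.sql", (".sql",), Path(tmp))
    inline = resolve_templated_value("FOR doc IN students RETURN doc", (".sql",), Path(tmp))
    print(from_file == inline)
```

With such an extension registered, a DAG author could keep long queries in separate files instead of inline strings.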

:param arangodb_db: Target ArangoDB name.
"""

template_fields: Sequence[str] = ('sql',)
Contributor

Should we also add template_ext?

ContributorAuthor

@pateash, Mar 29, 2022

added

@potiuk
Member

Can you fix the static checks, please?

@pateash
ContributorAuthor

image

@pateash
ContributorAuthor

image

@pateash changed the title from "WIP: Add Arango hook" to "Adding ArangoDB Provider" on Mar 29, 2022
@pateash requested a review from eladkal on March 29, 2022 17:30
@pateash closed this on Mar 29, 2022
@pateash reopened this on Mar 29, 2022
@eladkal
Contributor

@potiuk can you take a look at the test failure?
AssertionError: List of expected installed packages and image content mismatch. Check /home/runner/work/airflow/airflow/scripts/ci/installed_providers.txt file.

I don't recall that we need to edit the CI script when adding a new provider.
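The kind of mismatch this assertion reports can be sketched as a set difference between the expected and the installed provider names. This is only an illustration: the function `diff_providers` is hypothetical, and the assumption that installed_providers.txt holds one provider name per line is not confirmed by this thread.

```python
from typing import Iterable, Set, Tuple


def diff_providers(expected: Iterable[str], installed: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (missing, unexpected) provider names, ignoring blank entries."""
    expected_set = {name.strip() for name in expected if name.strip()}
    installed_set = {name.strip() for name in installed if name.strip()}
    return expected_set - installed_set, installed_set - expected_set


# Illustrative data only: a new provider shows up in the image but not in the list.
expected = ["amazon", "celery", "ftp", ""]
installed = ["amazon", "arangodb", "celery", "ftp"]
missing, unexpected = diff_providers(expected, installed)
print(sorted(missing), sorted(unexpected))
```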

@potiuk
Member

potiuk commented Mar 30, 2022

> @potiuk can you take a look at the test failure? AssertionError: List of expected installed packages and image content mismatch. Check /home/runner/work/airflow/airflow/scripts/ci/installed_providers.txt file.
>
> I don't recall that when adding a new provider we need to edit the CI script

Not everything in providers has to be me :) - this test was added by @mik-laj actually: 621d17b

It looks like, for some reason, the production image produced in this build contains many more providers than it should.

@potiuk
Member

Yeah: seems that for some reason it contains all providers:

docker run -it ghcr.io/apache/airflow/main/prod/python3.7:23b7d64b40261dcdcf73187464c6f09b67afcc57 bash
Unable to find image 'ghcr.io/apache/airflow/main/prod/python3.7:23b7d64b40261dcdcf73187464c6f09b67afcc57' locally
23b7d64b40261dcdcf73187464c6f09b67afcc57: Pulling from apache/airflow/main/prod/python3.7
c229119241af: Pull complete
5a3ae98ea812: Pull complete
d6bab1fc351b: Pull complete
f9cea33fb9b5: Pull complete
23c22d6e5b5d: Pull complete
b21b38d9bc75: Pull complete
e52ad88eda59: Pull complete
5938673019d8: Pull complete
10aec20ab867: Pull complete
bfa0b2f2703d: Pull complete
abea59e2f689: Pull complete
ffd9264d5a4a: Pull complete
ea7c97498e3e: Pull complete
4aed0971f3f7: Pull complete
8f85ceb1d546: Pull complete
b6132f0f6227: Pull complete
83d18601cc4f: Pull complete
88748a7a2d95: Pull complete
4f4fb700ef54: Pull complete
Digest: sha256:bf5da3a686feab47684c036de99a492c3d024fabd4e7a3b69ea9d63ce941b8c8
Status: Downloaded newer image for ghcr.io/apache/airflow/main/prod/python3.7:23b7d64b40261dcdcf73187464c6f09b67afcc57
airflow@54c94bf4e3b9:/opt/airflow$ airflow providers list
package_name                              | description | version
==========================================+=================================================================================================+========
apache-airflow-providers-airbyte          | Airbyte https://airbyte.io/ | 2.1.4
apache-airflow-providers-alibaba          | Alibaba Cloud integration (including Alibaba Cloud https://www.alibabacloud.com//) | 1.1.1
apache-airflow-providers-amazon           | Amazon integration (including Amazon Web Services (AWS) https://aws.amazon.com/) | 3.2.0
apache-airflow-providers-apache-beam      | Apache Beam https://beam.apache.org/ | 3.3.0
apache-airflow-providers-apache-cassandra | Apache Cassandra http://cassandra.apache.org/ | 2.1.3
apache-airflow-providers-apache-drill     | Apache Drill https://drill.apache.org/ | 1.0.4
apache-airflow-providers-apache-druid     | Apache Druid https://druid.apache.org/ | 2.3.3
apache-airflow-providers-apache-hdfs      | Hadoop Distributed File System (HDFS) https://hadoop.apache.org/docs/r1.2.1/hdfsdesign.html | 2.2.3
                                          | and WebHDFS https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/WebHDFS.html |
apache-airflow-providers-apache-hive      | Apache Hive https://hive.apache.org/ | 2.3.2
apache-airflow-providers-apache-kylin     | Apache Kylin https://kylin.apache.org/ | 2.0.4
apache-airflow-providers-apache-livy      | Apache Livy https://livy.apache.org/ | 2.2.2
apache-airflow-providers-apache-pig       | Apache Pig https://pig.apache.org/ | 2.0.4
apache-airflow-providers-apache-pinot     | Apache Pinot https://pinot.apache.org/ | 2.0.4
apache-airflow-providers-apache-spark     | Apache Spark https://spark.apache.org/ | 2.1.3
apache-airflow-providers-apache-sqoop     | Apache Sqoop https://sqoop.apache.org/ | 2.1.3
apache-airflow-providers-arangodb         | ArangoDB https://www.arangodb.com/ | 1.0.0
apache-airflow-providers-asana            | Asana https://app.asana.com/ | 1.1.3
apache-airflow-providers-celery           | Celery http://www.celeryproject.org/ | 2.1.3
apache-airflow-providers-cloudant         | IBM Cloudant https://www.ibm.com/cloud/cloudant | 2.0.4
apache-airflow-providers-cncf-kubernetes  | Kubernetes https://kubernetes.io/ | 3.1.2
apache-airflow-providers-databricks       | Databricks https://databricks.com/ | 2.5.0
apache-airflow-providers-datadog          | Datadog https://www.datadoghq.com/ | 2.0.4
apache-airflow-providers-dbt-cloud        | dbt Cloud https://www.getdbt.com/product/what-is-dbt/) | 1.0.2
apache-airflow-providers-dingding         | Dingding https://oapi.dingtalk.com/ | 2.0.4
apache-airflow-providers-discord          | Discord https://discordapp.com/ | 2.0.4
apache-airflow-providers-docker           | Docker https://docs.docker.com/install/ | 2.5.2
apache-airflow-providers-elasticsearch    | Elasticsearch https://www.elastic.co/elasticsearch | 3.0.2
apache-airflow-providers-exasol           | Exasol https://docs.exasol.com/home.htm | 2.1.3
apache-airflow-providers-facebook         | Facebook Ads http://business.facebook.com/ | 2.2.3
apache-airflow-providers-ftp              | File Transfer Protocol (FTP) https://tools.ietf.org/html/rfc114 | 2.1.2
apache-airflow-providers-github           | Github https://www.github.com/ | 1.0.3
apache-airflow-providers-google           | Google services including: | 6.7.0
                                          | |
                                          | - Google Ads https://ads.google.com/ |
                                          | - Google Cloud (GCP) https://cloud.google.com/ |
                                          | - Google Firebase https://firebase.google.com/ |
                                          | - Google LevelDB https://github.com/google/leveldb/ |
                                          | - Google Marketing Platform https://marketingplatform.google.com/ |
                                          | - Google Workspace https://workspace.google.pl/ (formerly Google Suite) |
apache-airflow-providers-grpc             | gRPC https://grpc.io/ | 2.0.4
apache-airflow-providers-hashicorp        | Hashicorp including Hashicorp Vault https://www.vaultproject.io/ | 2.1.4
apache-airflow-providers-http             | Hypertext Transfer Protocol (HTTP) https://www.w3.org/Protocols/ | 2.1.2
apache-airflow-providers-imap             | Internet Message Access Protocol (IMAP) https://tools.ietf.org/html/rfc3501 | 2.2.3
apache-airflow-providers-influxdb         | InfluxDB https://www.influxdata.com/ | 1.1.3
apache-airflow-providers-jdbc             | Java Database Connectivity (JDBC) https://docs.oracle.com/javase/8/docs/technotes/guides/jdbc/ | 2.1.3
apache-airflow-providers-jenkins          | Jenkins https://jenkins.io/ | 2.0.7
apache-airflow-providers-jira             | Atlassian Jira https://www.atlassian.com/ | 2.0.4
apache-airflow-providers-microsoft-azure  | Microsoft Azure https://azure.microsoft.com/ | 3.7.2
apache-airflow-providers-microsoft-mssql  | Microsoft SQL Server (MSSQL) https://www.microsoft.com/en-us/sql-server/sql-server-downloads | 2.1.3
apache-airflow-providers-microsoft-psrp   | This package provides remote execution capabilities via the | 1.1.3
                                          | PowerShell Remoting Protocol (PSRP) |
                                          | https://docs.microsoft.com/en-us/openspecs/windowsprotocols/ms-psrp/ |
apache-airflow-providers-microsoft-winrm  | Windows Remote Management (WinRM) https://docs.microsoft.com/en-us/windows/win32/winrm/portal | 2.0.5
apache-airflow-providers-mongo            | MongoDB https://www.mongodb.com/what-is-mongodb | 2.3.3
apache-airflow-providers-mysql            | MySQL https://www.mysql.com/products/ | 2.2.3
apache-airflow-providers-neo4j            | Neo4j https://neo4j.com/ | 2.1.3
apache-airflow-providers-odbc             | ODBC https://github.com/mkleehammer/pyodbc/wiki | 2.0.4
apache-airflow-providers-openfaas         | OpenFaaS https://www.openfaas.com/ | 2.0.3
apache-airflow-providers-opsgenie         | Opsgenie https://www.opsgenie.com/ | 3.0.3
apache-airflow-providers-oracle           | Oracle https://www.oracle.com/en/database/ | 2.2.3
apache-airflow-providers-pagerduty        | Pagerduty https://www.pagerduty.com/ | 2.1.3
apache-airflow-providers-papermill        | Papermill https://github.com/nteract/papermill | 2.2.3
apache-airflow-providers-plexus           | Plexus https://plexus.corescientific.com/ | 2.0.4
apache-airflow-providers-postgres         | PostgreSQL https://www.postgresql.org/ | 4.1.0
apache-airflow-providers-presto           | Presto https://prestodb.github.io/ | 2.1.2
apache-airflow-providers-qubole           | Qubole https://www.qubole.com/ | 2.1.3
apache-airflow-providers-redis            | Redis https://redis.io/ | 2.0.4
apache-airflow-providers-salesforce       | Salesforce https://www.salesforce.com/ | 3.4.3
apache-airflow-providers-samba            | Samba https://www.samba.org/ | 3.0.4
apache-airflow-providers-segment          | Segment https://segment.com/ | 2.0.4
apache-airflow-providers-sendgrid         | Sendgrid https://sendgrid.com/ | 2.0.4
apache-airflow-providers-sftp             | SSH File Transfer Protocol (SFTP) https://tools.ietf.org/wg/secsh/draft-ietf-secsh-filexfer/ | 2.5.2
apache-airflow-providers-singularity      | Singularity https://sylabs.io/guides/latest/user-guide/ | 2.0.4
apache-airflow-providers-slack            | Slack https://slack.com/ | 4.2.3
apache-airflow-providers-snowflake        | Snowflake https://www.snowflake.com/ | 2.6.0
apache-airflow-providers-sqlite           | SQLite https://www.sqlite.org/ | 2.1.3
apache-airflow-providers-ssh              | Secure Shell (SSH) https://tools.ietf.org/html/rfc4251 | 2.4.3
apache-airflow-providers-tableau          | Tableau https://www.tableau.com/ | 2.1.7
apache-airflow-providers-telegram         | Telegram https://telegram.org/ | 2.0.4
apache-airflow-providers-trino            | Trino https://trino.io/ | 2.1.2
apache-airflow-providers-vertica          | Vertica https://www.vertica.com/ | 2.1.3
apache-airflow-providers-yandex           | Yandex including Yandex.Cloud https://cloud.yandex.com/ | 2.2.3
apache-airflow-providers-zendesk          | Zendesk https://www.zendesk.com/ | 3.0.3

@potiuk
Member

This is VERY strange as it seems that when the image was built, it actually used only a small subset (as expected):

#64 1.486 Force re-installing airflow and providers from local files with eager upgrade
#64 1.486
#64 2.925 Looking in links: file:///docker-context-files
#64 2.937 Processing /docker-context-files/apache_airflow_providers_amazon-3.2.0.dev0-py3-none-any.whl
#64 2.952 Processing /docker-context-files/apache_airflow_providers_celery-2.1.3.dev0-py3-none-any.whl
#64 2.959 Processing /docker-context-files/apache_airflow_providers_cncf_kubernetes-3.1.2.dev0-py3-none-any.whl
#64 2.966 Processing /docker-context-files/apache_airflow_providers_docker-2.5.2.dev0-py3-none-any.whl
#64 2.973 Processing /docker-context-files/apache_airflow_providers_elasticsearch-3.0.2.dev0-py3-none-any.whl
#64 2.980 Processing /docker-context-files/apache_airflow_providers_ftp-2.1.2.dev0-py3-none-any.whl
#64 2.988 Processing /docker-context-files/apache_airflow_providers_google-6.7.0.dev0-py3-none-any.whl
#64 2.997 Processing /docker-context-files/apache_airflow_providers_grpc-2.0.4.dev0-py3-none-any.whl
#64 3.004 Processing /docker-context-files/apache_airflow_providers_hashicorp-2.1.4.dev0-py3-none-any.whl
#64 3.011 Processing /docker-context-files/apache_airflow_providers_http-2.1.2.dev0-py3-none-any.whl
#64 3.018 Processing /docker-context-files/apache_airflow_providers_imap-2.2.3.dev0-py3-none-any.whl
#64 3.026 Processing /docker-context-files/apache_airflow_providers_microsoft_azure-3.7.2.dev0-py3-none-any.whl
#64 3.033 Processing /docker-context-files/apache_airflow_providers_mysql-2.2.3.dev0-py3-none-any.whl
#64 3.040 Processing /docker-context-files/apache_airflow_providers_odbc-2.0.4.dev0-py3-none-any.whl
#64 3.047 Processing /docker-context-files/apache_airflow_providers_postgres-4.1.0.dev0-py3-none-any.whl
#64 3.054 Processing /docker-context-files/apache_airflow_providers_redis-2.0.4.dev0-py3-none-any.whl
#64 3.062 Processing /docker-context-files/apache_airflow_providers_sendgrid-2.0.4.dev0-py3-none-any.whl
#64 3.069 Processing /docker-context-files/apache_airflow_providers_sftp-2.5.2.dev0-py3-none-any.whl
#64 3.076 Processing /docker-context-files/apache_airflow_providers_slack-4.2.3.dev0-py3-none-any.whl
#64 3.083 Processing /docker-context-files/apache_airflow_providers_sqlite-2.1.3.dev0-py3-none-any.whl
#64 3.090 Processing /docker-context-files/apache_airflow_providers_ssh-2.4.3.dev0-py3-none-any.whl
#64 3.223 Processing /docker-context-files/apache_airflow-2.3.0.dev0-py3-none-any.whl

@potiuk
Member

Let me rebase and see whether it happens again :)

@eladkal
Contributor

I don't recall us having such an issue when the GitHub provider was added (and that was after 621d17b).

@potiuk
Member

> I don't recall we had such issue when GitHub provider was added (and it was after 621d17b )

Me neither. It basically SHOULD NOT happen :D. Yet it seems it did again.

@potiuk
Member

OK. I know what causes it, but I do not yet know why it happens. When the PROD image is built, we prepare the "airflow" package so that it can be installed there from the latest sources. But for SOME reason that package contains "all" providers as well, not only airflow. I do not know where that comes from yet. But it proves the tests from @mik-laj are useful for catching this.
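One way to confirm this kind of diagnosis is to list what a built wheel actually contains. The sketch below uses only the standard library and assumes provider code sits under airflow/providers/ inside the wheel. The function name `bundled_providers` is hypothetical, and the names it returns are top-level directories, so namespaces such as "apache" or "microsoft" would appear as single entries.

```python
import zipfile
from typing import Set


def bundled_providers(wheel_path: str) -> Set[str]:
    """Return the top-level package names found under airflow/providers/ in a wheel."""
    prefix = "airflow/providers/"
    found: Set[str] = set()
    with zipfile.ZipFile(wheel_path) as wheel:
        for name in wheel.namelist():
            if name.startswith(prefix):
                parts = name[len(prefix):].split("/")
                # A provider is a directory, so require at least "<provider>/<file>".
                if len(parts) > 1 and parts[0]:
                    found.add(parts[0])
    return found
```

An empty result for the core wheel would match the expectation stated above, while a non-empty one would point at the packaging step.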

@potiuk
Member

I actually think it could come from the new setuptools release https://pypi.org/project/setuptools/61.2.0/

@potiuk
Member

Still puzzled :) but I am getting closer to solving it.

@pateash
ContributorAuthor

thanks @potiuk.

@potiuk
Member

Rebased it @pateash -> I have high hopes for #22649 to either fix it or make it easier to understand where it came from

@potiuk
Member

> Hi maintainer of python-arango here. I've removed the dependency. Please try again with release version 7.3.2. Thanks.

Cool. Thanks! @pateash -> can you add >=7.3.2 to our requirements, please?

@pateash
ContributorAuthor

pateash commented Apr 1, 2022

voila 🥳,
It worked.
Thanks @joowani

Member

@potiuk left a comment

LGTM :) - @eladkal ?

@github-actions bot added the "full tests needed" label (We need to run full set of tests for this PR to merge) on Apr 1, 2022
@github-actions

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

@eladkal
Contributor

I'll take a look later today

Contributor

@eladkal left a comment

LGTM

+---------------------+-----------------------------------------------------+-------------------------------------------+
| trino | ``pip install 'apache-airflow[trino]'`` | All Trino related operators & hooks |
+---------------------+-----------------------------------------------------+-------------------------------------------+
| arangodb | ``pip install 'apache-airflow[arangodb]'`` | ArangoDB operators, sensors and hook |
Contributor

I think this list is sorted alphabetically?

@eladkal merged commit c758c76 into apache:main on Apr 3, 2022
@potiuk
Member

🎉 🎉 🎉 🎉 🎉 🎉 🎉

@ephraimbuddy added the "changelog:skip" label (Changes that should be skipped from the changelog (CI, tests, etc..)) on Apr 11, 2022
@pateash deleted the airflow-17778 branch on May 19, 2022 14:30
Labels

area:dev-tools
area:providers
changelog:skip: Changes that should be skipped from the changelog (CI, tests, etc..)
full tests needed: We need to run full set of tests for this PR to merge
kind:documentation

Development

Successfully merging this pull request may close these issues.

Add Arango hook

5 participants

@pateash, @potiuk, @eladkal, @joowani, @ephraimbuddy