


Apache Airflow and Apache Hop on Kubernetes with KubernetesExecutor

Many Data Engineers recognize Apache Airflow as an ELT or data-integration tool, but its own website describes it as "a platform created by the community to programmatically author, schedule and monitor workflows." Apache Hop can be a good extension to Airflow if you prefer to implement your ELT/ETL pipelines not in a scripting language but in a graphical rapid-development environment.

This repo contains a sample configuration for using Apache Airflow as the scheduler of Apache Hop in a Kubernetes cluster with the KubernetesExecutor. That means a separate POD is started in Kubernetes for each execution of a DAG. The PODs are shut down when the process finishes, so the solution does not tie up many resources on the cluster.

In the DAGs, a PythonOperator in combination with Python's "subprocess" module is used to start Apache Hop processes. With the subprocess approach, the Apache Hop logs are visible immediately and in full length.

Both the Airflow DAG scripts and the Hop objects are fetched from git by the Airflow worker POD via git-sync when the POD is started. There are no Hop/DAG sources included in the container, which makes the solution reusable for different projects.

The official Airflow helm chart is used as the basis for this setup. For the addition to the Airflow worker container, a Hop package is downloaded from the Hop website when the image is built.

This setup was created on an Ubuntu 20.04 system. It is assumed that a Kubernetes cluster is configured and operable from the command line via kubectl; if one is not available, it can be set up quickly with minikube, for example. The docker CLI needs to be present, as well as the Kubernetes package manager helm. To change DAGs and Hop objects, git and python are required, as well as a local Apache Hop installation; install git and python with the OS package manager. The Linux user used for this setup should be a member of the docker group to ease the further steps.

`kubectl port-forward svc/airflow-webserver 8080:8080 --namespace airflow`

If the port forward is successful, you can access the Airflow installation in the Kubernetes cluster with a web browser. If everything worked, two DAGs should be visible in Airflow: hello_world and hop. hello_world runs a simple Python operator that prints a string to the logfile. hop starts the Hop pipeline from `sources/hop/generated_rows.hpl`. During the execution of the DAGs you can use `kubectl get po -n airflow` to watch how additional PODs appear and disappear again.

Apache Hop in Apache Airflow

Apache Hop is now integrated into the Apache Airflow container and can be used to implement DAGs that execute Hop objects. DAG scripts and Hop sources are fetched into the container at POD runtime via git-sync. In the repository described here, DAG and Hop sources can be found in the sources/dag and sources/hop directories, respectively. In airflow/values.yaml, git-sync is configured to sync the sources directory from the repo directly into the container. It should be noted that the synced files are located in a read-only directory. The Hop configuration within the container also ensures that the required Hop "default" project points to the synced sources/hop, which prevents metadata objects from being written to the default locations by Hop.
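For reference, git-sync in the official Airflow helm chart is driven by the `dags.gitSync` values. A minimal sketch of the relevant section might look like the following; the repository URL and branch are placeholders, and the exact keys can vary between chart versions:

```yaml
dags:
  gitSync:
    enabled: true
    repo: https://github.com/<your-org>/<your-repo>.git
    branch: main
    # Sync only the sources directory (containing dag/ and hop/)
    # into the container; git-sync mounts it read-only.
    subPath: sources
```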
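The PythonOperator/subprocess combination described above can be sketched roughly as follows. This is only an illustration: the Hop install path, the git-sync target directory, and the `hop-run.sh` arguments are assumptions, not taken from the repo.

```python
import subprocess

# Assumed locations -- the actual paths inside the worker image may differ.
HOP_HOME = "/opt/hop"                    # where the Hop package is unpacked
HOP_SOURCES = "/dags/repo/sources/hop"   # assumed git-sync target of sources/hop

def build_hop_command(pipeline_file: str) -> list[str]:
    """Assemble a hop-run call for a pipeline in the synced "default" project."""
    return [
        f"{HOP_HOME}/hop-run.sh",
        "--project", "default",
        "--runconfig", "local",
        "--file", f"{HOP_SOURCES}/{pipeline_file}",
    ]

def run_hop_pipeline(pipeline_file: str) -> None:
    # Deliberately no output capture: Hop's stdout/stderr go straight to the
    # task log, so the full Hop log is visible immediately in the Airflow UI.
    subprocess.run(build_hop_command(pipeline_file), check=True)

# Inside a DAG this callable would be wired up roughly as:
#   PythonOperator(task_id="hop",
#                  python_callable=run_hop_pipeline,
#                  op_args=["generated_rows.hpl"])
```

Because the subprocess inherits the task's stdout/stderr, no extra log plumbing is needed; a non-zero Hop exit code raises via `check=True` and fails the task.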
