
Azure Data Factory

Access Data Factory

Dashboard

See the Dashboard section of this documentation for more information.

  1. Click on the Dashboard menu in the Azure Portal.

    Dashboard

ADF URL

  1. Navigate to https://adf.azure.com, and select the Data Factory instance that was created for you.

    Data Factory URL

Azure Portal

  1. In the Azure Portal Search box, search for Data factories.

    Azure Portal Search

  2. You should then see a list of the Data Factories you were given permission to access.

    DataFactory List
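
If you prefer to script this step, the factories visible to you can also be listed with the azure-mgmt-datafactory Python SDK. This is a minimal sketch, assuming the azure-identity and azure-mgmt-datafactory packages are installed; the subscription ID is a placeholder you must supply.

    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient

    # Placeholder: substitute your own subscription ID
    client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

    # List the Data Factory instances the signed-in identity can access
    for factory in client.factories.list():
        print(factory.name, factory.location)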

Authoring

Click on Author and Monitor.

Author Monitor

In Data Factory, you can author and deploy resources.

Author

See Visual authoring in Azure Data Factory for more information.

You can also use the wizards provided on the Data Factory Overview page.

NOTE: Configuring SSIS Integration is NOT recommended. Contact the support team through the Slack channel if you have questions.

Data Factory Wizards

See Azure Documentation Tutorials for more details.
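
Resources can also be authored and deployed programmatically. Below is a minimal sketch using the azure-mgmt-datafactory Python SDK; the subscription, resource group, factory, and pipeline names are placeholders, and the pipeline contains only a trivial Wait activity for illustration.

    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient
    from azure.mgmt.datafactory.models import PipelineResource, WaitActivity

    client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

    # Deploy a one-activity pipeline that simply waits for 60 seconds
    pipeline = PipelineResource(
        activities=[WaitActivity(name="WaitOneMinute", wait_time_in_seconds=60)]
    )
    client.pipelines.create_or_update(
        "<resource-group>", "<factory-name>", "demo-pipeline", pipeline
    )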

Access the Data Lake from ADF

A Data Lake connection has been pre-configured for your environment.

  1. Click on Manage.

  2. Click on Linked Services.

  3. The linked service with the Azure Data Lake Storage Gen2 type is your Data Lake.

    Data Lake

Note: You have been granted access to specific containers created in the Data Lake for your environment.
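
The same information can be read programmatically. Below is a minimal sketch using the azure-mgmt-datafactory Python SDK with placeholder names; the Data Lake appears with the type AzureBlobFS, the SDK's identifier for Azure Data Lake Storage Gen2.

    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient

    client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

    # Print each linked service and its type; AzureBlobFS = ADLS Gen2
    for ls in client.linked_services.list_by_factory("<resource-group>", "<factory-name>"):
        print(ls.name, ls.properties.type)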

Access Azure SQL Database

Some projects have an Azure SQL Database instance.

  1. Click on Manage.

  2. Click on Linked Services.

  3. The linked services with the Azure SQL Database type are your database(s).

    SQL Database
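
If you need to verify connectivity to the database outside of Data Factory, a minimal pyodbc sketch is shown below. The server, database, and credential values are placeholders, and it assumes the ODBC Driver 18 for SQL Server is installed locally.

    import pyodbc

    # Placeholders: substitute your server, database, and credentials
    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 18 for SQL Server};"
        "SERVER=<your-server>.database.windows.net,1433;"
        "DATABASE=<your-database>;"
        "UID=<username>;PWD=<password>;"
        "Encrypt=yes;"
    )
    cursor = conn.cursor()
    cursor.execute("SELECT @@VERSION")
    print(cursor.fetchone()[0])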

Save / Publish Your Data Factory Resources

Azure Data Factory can be configured to save your work in one of two ways:

  • To a Git repository
  • By publishing directly to Data Factory

Git (when supported)

When Git is enabled, you can view your configuration and save your work to a specific branch.

  1. Click on Manage.

  2. Click on Git Configuration.

  3. See the Git configuration that was set up for you:

    Git Config

  4. When authoring a workflow, you can save it to your branch. Click on + New branch in the branch dropdown to create a new feature branch.

    Saving to your branch

  5. When you are ready to merge the changes from your feature branch into your collaboration branch (master), click on the branch dropdown and select Create pull request. This takes you to the Azure DevOps Git repository, where you can create pull requests, do code reviews, and merge changes into the collaboration branch (master) once the pull request has been approved.

  6. After you have merged changes to the collaboration branch (master), click on Publish to publish your code changes from the master branch to Azure Data Factory. Contact the support team through the Slack channel if you receive an error when trying to Publish.

Data Factory Service

When Data Factory is not integrated with source control, your workflows are stored directly in the Data Factory service. You cannot save partial changes; your only option is Publish all, which overwrites the current state of the Data Factory with your changes and makes them visible to everyone.

Data Factory Workspace

Ingest and Transform Data with ADF

Integration Runtimes

AutoResolveIntegrationRuntime

Do not use. Please use the canadaCentralIR-4nodesDataFlow or selfHostedCovidIaaSVnet runtimes instead.

The auto-resolve runtime is created by default with the Data Factory instance and resolves to the Azure data centre closest to the data, which may violate data residency policies.
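
To keep activity traffic on an approved runtime, reference the runtime explicitly when defining a linked service. The sketch below uses the azure-mgmt-datafactory Python SDK with placeholder names to pin a Data Lake linked service to canadaCentralIR-4nodesDataFlow via the connect_via property.

    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient
    from azure.mgmt.datafactory.models import (
        AzureBlobFSLinkedService,
        IntegrationRuntimeReference,
        LinkedServiceResource,
    )

    client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

    # Pin the linked service to the shared Canada Central runtime instead of
    # letting the auto-resolve runtime choose a region
    properties = AzureBlobFSLinkedService(
        url="https://<storage-account>.dfs.core.windows.net",
        connect_via=IntegrationRuntimeReference(
            reference_name="canadaCentralIR-4nodesDataFlow"
        ),
    )
    client.linked_services.create_or_update(
        "<resource-group>", "<factory-name>", "DataLakePinned",
        LinkedServiceResource(properties=properties),
    )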

canadaCentralIR-4nodesDataFlow

This runtime is shared by all users and is always running.

Can Access:
  • Internal Data Lake
  • External Storage Account
  • External Data Sources (Internet)
Cannot Access:
  • Azure SQL Database

selfHostedCovidIaaSVnet

Located inside the CAE virtual network (VNet).

Can Access:
  • Internal Data Lake
  • SQL Server
Cannot Access:
  • External Storage Account
  • External Data Sources (Internet)

Example: How to connect to Johns Hopkins Data

  1. There is an example workflow that shows how to ingest data from GitHub using a Data Factory Pipeline.

    Johns Hopkins Pipeline

  2. Data can be filtered from within Data Factory.

    Transform Ingested Data

  3. Alternatively, data can be pulled from GitHub using code in a Databricks notebook (a minimal sketch follows this list).

    git pull
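
A minimal sketch of such a notebook cell, using pandas to read a raw CSV directly from the public Johns Hopkins CSSE repository (the URL and column name below reflect that repository's published layout):

    import pandas as pd

    # Raw time-series CSV from the Johns Hopkins CSSE COVID-19 GitHub repository
    url = (
        "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/"
        "csse_covid_19_data/csse_covid_19_time_series/"
        "time_series_covid19_confirmed_global.csv"
    )
    df = pd.read_csv(url)

    # Filter the ingested data, mirroring the filter step in the pipeline above
    canada = df[df["Country/Region"] == "Canada"]
    print(canada.head())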

Microsoft Documentation

YouTube Videos