Azure Synapse
Getting Started
Start and Stop Dedicated SQL Pool
- Click the Integrate tab.
- Under Pipelines, click either Start Dedicated SQL Pool or Pause Dedicated SQL Pool. Then click the trigger button to open a menu, and select Trigger now. On the next screen, click OK.
Home
The Home tab is where you start when you first open Azure Synapse Studio.
From here, you can access shortcuts for common tasks such as creating SQL scripts or notebooks by clicking the New dropdown menu button. Recently opened resources are also displayed.
Data
The Data tab is where you can explore everything in your database and linked datasets.
Under the Workspace tab, you can explore the dedicated SQL pool database and any Spark databases.
Under the Linked tab, you can explore external objects (e.g. Data Lake accounts) and create integration datasets from linked external data sources (e.g. Data Lake, Blob Storage, web services) to be used in pipelines.
How to Bring in Data from Linked Services
Note: This example shows how to get data from Data Lake, although there are many source types available.
- Click the + button to add a new resource, then click Integration Dataset.
- Select Azure Data Lake Storage Gen2 (you may need to search for this), then click Continue.
- Select the format type, then click Continue.
- Enter a name, then click the drop-down menu under Linked service and select your data lake.
- Under Connect via integration runtime, ensure that interactive authoring is enabled. If it is not, click the edit button to enable it, then click Apply.
- Set additional properties as appropriate, then click OK.
How to Explore Data in the Data Lake
Browse to your dataset file (CSV, Parquet, JSON, Avro, etc.) and right-click it. A menu opens with options to preview the data or to create resources such as SQL scripts and notebooks.
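For example, choosing Select TOP 100 rows on a Parquet file generates a serverless SQL query that reads the file with OPENROWSET. A minimal sketch of such a query; the storage account, container, and path below are hypothetical placeholders:

```sql
-- Preview the first 100 rows of a Parquet file in the data lake.
-- The URL is a placeholder; substitute your own account, container, and path.
SELECT TOP 100 *
FROM OPENROWSET(
    BULK 'https://mydatalake.dfs.core.windows.net/mycontainer/data/sales.parquet',
    FORMAT = 'PARQUET'
) AS [result];
```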
How to Explore the Dedicated SQL Pool
Under the Workspace tab, you can explore databases much as in SQL Server Management Studio. Right-click any table, hover over New SQL script, and click Select TOP 100 rows to create a new query. You can then view the results as either a table or a chart.
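The generated script is an ordinary T-SQL query along these lines; the schema and table name here are hypothetical:

```sql
-- Query generated by the Select TOP 100 rows shortcut (names are placeholders).
SELECT TOP (100) *
FROM [dbo].[SalesOrders];
```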
Importing Data to the Dedicated SQL Pool
To import data to the dedicated SQL pool, you can either:
- create a pipeline with a Copy data activity (most efficient for large datasets), or
- use the Bulk Load Wizard (see the example below).
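The Bulk Load Wizard generates a COPY statement, which you can also write yourself in a SQL script. A minimal sketch, assuming a CSV file with a header row; the table name and storage URL are placeholders, and authentication options depend on how your storage access is set up:

```sql
-- Bulk load a CSV file from the data lake into a dedicated SQL pool table.
-- Names and URL are placeholders; adjust the WITH options to match your file.
COPY INTO [dbo].[SalesOrders]
FROM 'https://mydatalake.dfs.core.windows.net/mycontainer/data/sales.csv'
WITH (
    FILE_TYPE = 'CSV',
    FIELDTERMINATOR = ',',
    FIRSTROW = 2  -- skip the header row
);
```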
Develop
From here, you can create and save resources such as SQL scripts, notebooks, and Power BI reports.
To add a new resource, click the + button. A dropdown menu will open.
To make your changes visible to others, you need to click the Publish button.
SQL Scripts
Be sure to connect to your dedicated SQL pool to run SQL scripts.
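Once connected, scripts run directly against that pool. As a quick sketch of a script you might save here, a simple table definition; all names are hypothetical:

```sql
-- A minimal example script for a dedicated SQL pool; names are placeholders.
CREATE TABLE [dbo].[SalesOrders]
(
    [OrderId]   INT            NOT NULL,
    [OrderDate] DATE           NOT NULL,
    [Amount]    DECIMAL(18, 2) NULL
)
WITH
(
    DISTRIBUTION = ROUND_ROBIN,   -- default distribution for dedicated pools
    CLUSTERED COLUMNSTORE INDEX   -- default index type for dedicated pools
);
```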
Notebooks
To run notebook cells, you first need to select your Apache Spark pool.
To change languages for a single cell, you can use the following magic commands: %%pyspark, %%spark, %%csharp, %%sql. You can also change the default language using the Language dropdown menu.
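For example, in a notebook whose default language is PySpark, a single cell can run Spark SQL with the %%sql magic. A minimal sketch against a hypothetical Spark database and table:

```sql
%%sql
-- Query a Spark database table from a notebook cell (names are placeholders).
SELECT *
FROM salesdb.orders
LIMIT 10
```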
Dataflows
To add a source to a dataflow, click the + button under Source Settings, then select Azure Data Lake Storage Gen2 (you may need to search for this). Click Continue, select the data format, then on the next page, select your Linked Service.
Power BI Reports
You can view and create Power BI reports directly in Azure Synapse. Please contact the Collaborative Analytics Environment support team to validate that a linked service is set up.
Integrate
This is where you can create pipelines for ingesting, preparing, and transforming all of your data, as in Azure Data Factory.
Example: Copy Data from External Blob to Data Lake
- Click the + button to add a new resource, then click Pipeline.
- Under Move & transform, drag and drop Copy data into the window.
- Click on the Source tab, then click New to add the source dataset (where you want to copy the data from).
- Select Azure Blob Storage, then select the format type (CSV, Parquet, JSON, etc). Set any additional properties if relevant, then click OK.
- Click Sink, then click New to set the sink dataset (where you want the data to be copied to). Choose Azure Data Lake Storage Gen2, then select the format type. Under Linked service, choose your data lake and ensure that interactive authoring is enabled (see How to Bring in Data from Linked Services under Data for more information).
Debugging and Running Pipelines
To run a pipeline in debug mode, click the Debug button at the top of the pipeline window. Results will appear in the Output tab.
To run a pipeline without debugging, click the Add trigger button, then Trigger now.
When you are ready to publish your pipelines, click the Validate all button, then click the Publish all button. Note that this publishes everything you currently have open (pipelines, SQL scripts, notebooks, etc.) for all users to see.
Monitor
From the Monitor tab, you can monitor live pipeline runs (the inputs and outputs of each activity and any errors) and view historical pipeline runs, trigger runs, SQL requests, etc.
Manage
This is where you can:
- Add new SQL or Apache Spark pools
- Add new linked services
- Grant others access to the workspace
- Set up Git integration