The Advanced Analytics Workspace

Open source and made for you!

The AAW is an open-source platform specifically crafted for data scientists, analysts, and researchers proficient in open-source tools and coding.

The Advanced Analytics Workspace (AAW) is a comprehensive, open-source environment built to support the varied needs of data scientists, and it is flexible enough to accommodate different workflows. More information about the AAW and Data Analytics Services (DAS) can be found on the DAS Portal.

Warning

Many of the links on https://www.statcan.gc.ca/data-analytics-services/aaw are broken.

Getting Started

  • StatCan Users: Access the Kubeflow Dashboard to get started.
  • External Users and Collaborators: Fill out the DAS Onboarding Form to tell us about your project needs. Once completed, a DAS representative will contact you to discuss the next steps and begin the onboarding process. Note: External users need a StatCan Cloud account granted by the business sponsor.

Creating Kubeflow Notebook Servers

Follow these steps to create your first notebook server:

  1. Log in to Kubeflow;
  2. Click Notebooks from the sidebar on the left (you may need to select a namespace from the Select namespace dropdown menu in the upper left-hand corner);
  3. Click the + New Notebook button (upper right-hand corner);
  4. Follow the instructions here to configure the notebook server.

Need help creating a notebook server?

We have a Slideshow with instructions on how to create a notebook server.

Kubeflow Documentation

The AAW is based on Kubeflow, a comprehensive open-source solution for deploying and managing end-to-end ML workflows. Kubeflow simplifies the creation and management of customizable compute environments with user-controlled resource provisioning (custom CPU, GPU, RAM and storage). For more information on Kubeflow, please visit:

Kubeflow Videos

Videos on Kubeflow have been developed by Google:

Working with Your Data

Once your notebook server has been created, you may want to import data or access shared data from cloud storage. Instructions on how to add storage to your notebook server can be found on the documentation page for storage.

Protected Data

If your project requires protected data:

  • Cloud storage buckets will be created for you when your project is onboarded.
  • To access protected data, open the buckets folder; see the documentation on Azure Blob Storage for details.
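
Once the buckets folder is visible from your notebook server, its contents can be browsed like any other directory. The sketch below is only an illustration: the function name list_bucket_files and the example path are hypothetical, and the actual location of the buckets folder should be taken from the storage documentation.

```python
from pathlib import Path


def list_bucket_files(buckets_root):
    """Return the sorted relative paths of all files under buckets_root.

    buckets_root is a hypothetical location; substitute the real path of
    the buckets folder on your notebook server.
    """
    root = Path(buckets_root).expanduser()
    if not root.is_dir():
        # The folder is missing or not mounted: report no files.
        return []
    # Walk the folder recursively and keep regular files only.
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )
```

For instance, list_bucket_files("~/buckets") would return an empty list if no such folder exists, and otherwise a list of paths relative to that folder.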

Unprotected Data

To upload data to your notebook server (onto a Data Volume, for instance), use the JupyterLab web interface. The official JupyterLab documentation has a section on uploading and downloading files.

Working in JupyterLab

Kubeflow creates and manages notebook servers running JupyterLab, which is the main interface in which you'll be doing your data science work.

Virtual Environments

When conducting data science experiments, it is good practice to use Python and/or conda virtual environments to manage your project dependencies. It is common to create a dedicated environment for each project or, in some cases, separate environments for different aspects of your work (for instance, one environment for general projects and another tailored for GPU-accelerated deep learning tasks).
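
As a minimal sketch of the per-project approach, the snippet below creates a Python virtual environment with the standard-library venv module. The function name create_project_env and the example path are hypothetical; conda environments are created with conda's own tooling and are not covered here.

```python
import venv
from pathlib import Path


def create_project_env(env_dir, with_pip=True):
    """Create an isolated Python virtual environment at env_dir.

    Returns the path to pyvenv.cfg, the file that marks the directory
    as a virtual environment.
    """
    env_path = Path(env_dir).expanduser()
    # with_pip=True bootstraps pip into the new environment.
    builder = venv.EnvBuilder(with_pip=with_pip)
    builder.create(env_path)
    return env_path / "pyvenv.cfg"
```

For example, create_project_env("~/envs/my-project") would create an environment that can then be activated in a terminal with `source ~/envs/my-project/bin/activate`.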

Virtual Environments and the Launcher

If you frequently switch between environments and want a more convenient way to access them within JupyterLab, you can follow these instructions.
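
The linked instructions are not reproduced here. One common way to make an environment appear in the JupyterLab Launcher is to register it as a Jupyter kernel, which comes down to placing a kernel.json file in a kernels directory. The sketch below writes such a file by hand; the function name write_kernel_spec and the example names and paths are hypothetical, and it assumes the ipykernel package is already installed in the target environment.

```python
import json
from pathlib import Path


def write_kernel_spec(kernels_dir, name, python_executable, display_name):
    """Write a minimal Jupyter kernel spec that starts ipykernel
    with the given Python interpreter.

    Returns the path of the kernel.json file that was written.
    """
    spec_dir = Path(kernels_dir).expanduser() / name
    spec_dir.mkdir(parents=True, exist_ok=True)
    spec = {
        # Jupyter substitutes {connection_file} when it starts the kernel.
        "argv": [
            str(python_executable),
            "-m",
            "ipykernel_launcher",
            "-f",
            "{connection_file}",
        ],
        "display_name": display_name,
        "language": "python",
    }
    spec_file = spec_dir / "kernel.json"
    spec_file.write_text(json.dumps(spec, indent=2))
    return spec_file
```

On Linux, the per-user kernels directory is typically ~/.local/share/jupyter/kernels. In practice, running `python -m ipykernel install --user --name my-project` from inside the activated environment produces an equivalent file without writing it manually.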

JupyterLab Documentation

Example IPython Notebooks

You can download these notebooks and upload them to your notebook server. These notebooks can also be run from Visual Studio Code if you prefer.

  1. Visual Python: Simplifying Data Analysis for Python Learners
  2. YData Profiling: Streamlining Data Analysis
  3. Draw Data: Creating Synthetic Datasets with Ease
  4. D-Tale: A Seamless Data Exploration Tool for Python
  5. Mito Sheet: Excel-Like Spreadsheets in JupyterLab
  6. PyGWalker: Simplifying Exploratory Data Analysis with Python
  7. ReRun: Fast and Powerful Multimodal Data Visualization
  8. SweetViz: Streamlining EDA with Elegant Visualizations

Need Help?

Join our vibrant community! Connect with AAW developers and fellow users, ask questions, and share experiences all on the Slack Support Channel.

For comprehensive documentation and guidance, refer to the:

Demos and Contributions

For in-depth demos, personalized assistance, or to contribute to the AAW community, reach out to us on the Slack Support Channel. You can contribute to the platform's development and report issues or feature requests on GitHub.

External Learning Resources

Some of the AAW developers are also data scientists, so we have a lot of material to share on data science tooling and best practices. Below are some useful and interesting data science learning resources:

Data Science Resources (R and Python)

Python Language Resources

R Language Resources