
Jupyter Notebooks

Jupyter Notebooks provide a powerful and interactive environment for geospatial analysis. With Jupyter Notebooks, users can combine code, visualizations, and explanatory text in a single document. This allows for a seamless workflow where users can explore, analyze, and visualize geospatial data using Python or other programming languages. Jupyter Notebooks are particularly useful for iterative analysis, as they enable users to run and modify code cells in a flexible and interactive manner.

PyCharm

PyCharm is an integrated development environment (IDE) specifically designed for Python development. Geospatial analysts can utilize PyCharm to efficiently write, debug, and test their geospatial analysis scripts and applications. PyCharm provides a feature-rich environment with advanced code editing capabilities, such as code completion, syntax highlighting, and code navigation, which enhance productivity. PyCharm's debugging and profiling tools assist in identifying and resolving issues, ensuring the reliability of geospatial analysis workflows.

Visual Studio Code (VSCode)

Visual Studio Code, often referred to as VSCode, is a lightweight and versatile code editor that supports various programming languages, including Python. It offers a flexible and customizable user interface, allowing analysts to arrange their workspace according to their preferences. Its integrated terminal and debugging capabilities make it convenient for executing and troubleshooting geospatial analysis scripts.

Python

Geospatial analysis mainly uses Python for several reasons:

  1. Rich ecosystem of geospatial libraries: Python has a vast collection of specialized geospatial libraries such as GeoPandas, Shapely, Fiona, and PySAL, which provide powerful tools for handling geospatial data, performing spatial operations, and conducting advanced geospatial analysis.

  2. Integration with other data science and analysis libraries: Python's popularity in the data science community and its extensive ecosystem of data analysis libraries such as NumPy, Pandas, and Matplotlib make it an ideal choice for geospatial analysis. Python allows seamless integration of geospatial analysis with other data processing and visualization tasks.

  3. Flexibility and versatility: Python is a versatile programming language known for its flexibility. It allows users to combine geospatial analysis with other functionalities, such as machine learning, statistical analysis, and web development. Python's flexibility enables the creation of custom workflows and tailored solutions for specific geospatial analysis needs.

  4. Ease of use and readability: Python is renowned for its readability and user-friendly syntax. Its clear and concise code structure makes it easier for both beginners and experienced programmers to understand and write geospatial analysis scripts and workflows. Python's readability contributes to better collaboration and maintainability of geospatial projects.

  5. Active community support: Python benefits from a large and active community of geospatial analysts, developers, and researchers who contribute to the development and improvement of geospatial libraries and tools. The availability of extensive documentation, tutorials, and online resources makes it easier for users to learn, troubleshoot, and get assistance when working on geospatial analysis projects.

While Python is widely used in geospatial analysis, it is important to note that other programming languages like R, Java, and C++ also have their own geospatial libraries and ecosystems. The choice of programming language ultimately depends on specific project requirements, personal preferences, and existing expertise.

Virtual Environments

Best Practice: Working with Conda Virtual Environments

Using virtual environments with Conda can help you create consistent, reproducible, and isolated environments for your projects, which can save time and prevent issues caused by conflicting dependencies or system-level changes.

For example, ArcGIS products function best with Python 3.7. By creating a separate ArcGIS environment, you can install Python 3.7 without causing conflicts with a more recent release such as Python 3.10.

Create a new environment

conda create --name <env_name> <package>
example:
conda create -n geoENV python=3.7

Activate your environment
conda activate <env_name>
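
Once the environment is activated, a quick check from Python can confirm that the interpreter in use is the one you intended. The following is a minimal sketch for illustration only; the helper name describe_interpreter is hypothetical and is not part of Conda.

```python
import sys


def describe_interpreter():
    """Return the running interpreter's version and environment prefix."""
    version = "{}.{}.{}".format(*sys.version_info[:3])
    # sys.prefix points at the root of the active environment
    # (for Conda, typically a folder under .../envs/<env_name>).
    return {"version": version, "prefix": sys.prefix}


info = describe_interpreter()
print("Python", info["version"], "running from", info["prefix"])
```

If the printed version or path is not what you expect, the environment was probably not activated in the session that launched Python.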

Installing package(s)

Common Geo-Specific Packages available

This list is not exhaustive, but here are some of the common packages:

- GeoPandas
- Shapely
- Fiona
- GDAL/OGR
- PyProj
- Cartopy
- Rasterio
- Geoplot
- Basemap
- Bokeh
- PySAL
- Spatial Pandas
- NetworkX
- PyShp
- TileStache
- GdalUtils
- Scipy
- PyTopo
- Geopy
- Plotly

Conda

DAaaS utilizes Artifactory for package and library management:

https://jfrog.aaw.cloud.statcan.ca/artifactory/conda-forge-remote

To use:

Miniforge (conda) has been preconfigured to use the DAS Artifactory.
You should not need to specify the channel. If this fails, we have included examples on direct connections after the simple examples:

conda install [package]

For specific versions

conda install geopandas
conda install matplotlib=3.7.0

Connecting directly to the artifactory channel:

conda install -c https://jfrog.aaw.cloud.statcan.ca/artifactory/conda-forge-remote/ [package]
conda install -c https://jfrog.aaw.cloud.statcan.ca/artifactory/conda-forge-remote/ [package=X.X...]

Confirm your package installation

conda list
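
The same check can be made from inside Python, which is handy at the top of a notebook. The sketch below assumes Python 3.8 or later (for importlib.metadata) and that the packages ship standard distribution metadata, which most but not necessarily all Conda packages do. The helper name installed_versions is hypothetical.

```python
from importlib import metadata


def installed_versions(package_names):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Example: check a few packages from the list above.
for name, version in installed_versions(["geopandas", "shapely", "rasterio"]).items():
    print(f"{name}: {version or 'not installed'}")
```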

Conda Cheat Sheet

Link to full Conda cheat sheet

PIP

PIP has also been preconfigured to use the DAS artifactory custom index:

pip install [package] 
pip list

If that fails and you need to specify the index url:

pip install --index-url https://jfrog.aaw.cloud.statcan.ca/artifactory/api/pypi/pypi-remote/simple <package-name>

Some Basic Examples

Connect to GAE ArcGIS Portal (Enterprise)

Your project group will be provided with a Client ID upon onboarding, which is used to connect to the ArcGIS Enterprise Portal. Paste the Client ID between the quotation marks.

from arcgis.gis import GIS
gis = GIS("https://geoanalyticsdev.cloud.statcan.ca/portal", client_id=' ')
print("Successfully logged in as: " + gis.properties.user.username)

This opens a pop-up window for authentication, then provides a key to enter into the IDE.
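
To avoid pasting the Client ID into every notebook, one option is to read it from an environment variable. This is only a sketch: the variable name GAE_CLIENT_ID and the helper names get_client_id and connect are assumptions made for illustration, not part of the platform or the ArcGIS API.

```python
import os


def get_client_id(env_var="GAE_CLIENT_ID"):
    """Read the Client ID from an environment variable (the name is an assumption)."""
    client_id = os.environ.get(env_var, "").strip()
    if not client_id:
        raise RuntimeError(f"Set the {env_var} environment variable to your Client ID.")
    return client_id


def connect(portal_url="https://geoanalyticsdev.cloud.statcan.ca/portal"):
    """Start the interactive login against the portal using the stored Client ID."""
    # Imported here so get_client_id() can be used even where arcgis is not installed.
    from arcgis.gis import GIS

    return GIS(portal_url, client_id=get_client_id())
```

Calling connect() then behaves like the example above, including the authentication pop-up.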

Convert a WFS into pandas DataFrame
import geopandas as gpd

# Set WFS URL and layer name
wfs_url = 'https://mywfs.com/wfs'
layer_name = 'my_layer'

# Read WFS into a geopandas dataframe
gdf = gpd.read_file(wfs_url, layer=layer_name)

# Convert geopandas dataframe to pandas dataframe
df = gdf.drop(columns='geometry')

# Preview the dataframe
print(df.head())
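
Some WFS servers respond better to an explicit GetFeature request than to a bare service URL. The sketch below builds such a request with the standard WFS 2.0.0 key-value parameters. It assumes the server can return GeoJSON, which not every server does, and the helper name build_wfs_getfeature_url is hypothetical.

```python
from urllib.parse import urlencode


def build_wfs_getfeature_url(base_url, layer_name, max_features=None):
    """Build a WFS 2.0.0 GetFeature URL that requests GeoJSON output."""
    params = {
        "service": "WFS",
        "version": "2.0.0",
        "request": "GetFeature",
        "typeNames": layer_name,
        # Assumption: the server offers GeoJSON; check its GetCapabilities response.
        "outputFormat": "application/json",
    }
    if max_features is not None:
        params["count"] = max_features  # WFS 2.0.0 name for the feature limit
    return f"{base_url}?{urlencode(params)}"


url = build_wfs_getfeature_url("https://mywfs.com/wfs", "my_layer", max_features=100)
print(url)
```

The resulting URL can be passed to gpd.read_file(url) in place of the service URL and layer name.
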
Shapefile to GeoDataFrame (Spatial DataFrame)
import geopandas as gpd

# Define the path to the shapefile
shapefile_path = 'path/to/your/shapefile.shp'

# Use geopandas to read the shapefile into a GeoDataFrame
gdf = gpd.read_file(shapefile_path)

# Print the GeoDataFrame
print(gdf)
Export a GeoDataFrame to ArcGIS Enterprise
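from arcgis.gis import GIS
from arcgis.features import GeoAccessor  # registers the .spatial accessor on pandas
import geopandas as gpd
import pandas as pd

# Define the URL of your ArcGIS Enterprise portal
portal_url = 'https://geoanalytics.cloud.statcan.ca/portal/'

# Create a connection to your portal
gis = GIS(portal_url, client_id='')

# Load the data to publish (any GeoDataFrame will do)
gdf = gpd.read_file('path/to/your/shapefile.shp')

# Convert the GeoDataFrame to a Spatially Enabled DataFrame,
# which is the type import_data() expects
sdf = pd.DataFrame.spatial.from_geodataframe(gdf)

# Define the name of the feature layer to be created
layer_name = 'your_layer_name'

# Publish the data to your portal as a feature layer
feature_layer = gis.content.import_data(sdf, title=layer_name)

# Print the URL of the feature layer
print(feature_layer.url)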
Join CSV to SHP(as sdf) then Export to ArcGIS Enterprise

import pandas as pd
from arcgis.gis import GIS
from arcgis.features import GeoAccessor  # registers the .spatial accessor on pandas

# Load the CSV file into a Pandas dataframe
csv_df = pd.read_csv('path/to/csv_file.csv')

# Load the spatial data into a Spatially Enabled DataFrame (SEDF).
# The older SpatialDataFrame class is deprecated in the ArcGIS API for Python.
sdf = pd.DataFrame.spatial.from_featureclass('path/to/spatial_data.shp')

# Join the CSV dataframe to the spatial dataframe based on a common field
joined_sdf = sdf.merge(csv_df, on='common_field')

# Publish the joined dataframe to ArcGIS Enterprise as a hosted feature layer
gis = GIS('https://geoanalytics.cloud.statcan.ca/portal/', client_id='')
joined_item = joined_sdf.spatial.to_featurelayer(title='Joined Data', gis=gis, tags='Data')
print(joined_item.url)

This code first loads a CSV file into a Pandas dataframe using the pd.read_csv() function. It then loads a spatial dataset into a Spatially Enabled DataFrame using the spatial.from_featureclass() method of the ArcGIS API for Python. The two dataframes are then joined on a common field using the merge() method.

Finally, the joined dataframe is published to ArcGIS Enterprise as a hosted feature layer using the spatial.to_featurelayer() method. Note that you will need to replace the example paths and server URL with the actual paths and URL for your data and ArcGIS Enterprise instance.
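
By default, merge() performs an inner join, so spatial records with no matching CSV row are dropped without any warning. The sketch below uses two small stand-in tables (invented for illustration) to show one way of checking the join before publishing.

```python
import pandas as pd

# Stand-ins for the attribute table of the spatial data and for the CSV.
spatial_attrs = pd.DataFrame(
    {"common_field": ["A", "B", "C"], "name": ["North", "South", "East"]}
)
csv_df = pd.DataFrame(
    {"common_field": ["A", "B", "D"], "population": [100, 250, 75]}
)

# A left join keeps every spatial record. indicator=True adds a "_merge" column,
# and validate="1:1" raises an error if either side has duplicate keys.
joined = spatial_attrs.merge(
    csv_df, on="common_field", how="left", indicator=True, validate="1:1"
)

unmatched = joined.loc[joined["_merge"] == "left_only", "common_field"].tolist()
print("Rows without a CSV match:", unmatched)
```

The same arguments can be passed when merging a spatial dataframe, since it is a regular Pandas dataframe with a geometry column.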

GeoCode a dataframe using OSM API

import requests
import pandas as pd

def geocode_address(address):
    """
    Geocode a single address using the OpenStreetMap API
    """
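    url = "https://nominatim.openstreetmap.org/search"
    params = {
        "q": address,
        "format": "json"
    }
    # Nominatim's usage policy requires a User-Agent that identifies your application;
    # replace this placeholder with your own
    headers = {"User-Agent": "my-geocoding-script"}
    response = requests.get(url, params=params, headers=headers, timeout=10)
    if response.ok:
        results = response.json()
        if len(results) > 0:
            return results[0]
    return None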

def geocode_dataframe(df, address_column):
    """
    Geocode a Pandas dataframe using the OpenStreetMap API
    """
    # Create a new dataframe to store the geocoding results
    geocoded_df = pd.DataFrame(columns=["latitude", "longitude"])

    # Loop through each row in the original dataframe
    for index, row in df.iterrows():
        # Get the address from the specified column
        address = row[address_column]
        # Geocode the address using the OpenStreetMap API
        result = geocode_address(address)
        if result:
            # Add the latitude and longitude to the new dataframe
            geocoded_df.loc[index] = [result["lat"], result["lon"]]
        else:
            # If geocoding failed, add NaN values to the new dataframe
            geocoded_df.loc[index] = [float("NaN"), float("NaN")]

    # Add the new columns to the original dataframe
    df["latitude"] = geocoded_df["latitude"]
    df["longitude"] = geocoded_df["longitude"]

    return df

To use this code, call the geocode_dataframe function with your Pandas dataframe and the name of the column that contains the address data. This adds two new columns to the dataframe, "latitude" and "longitude", which contain the geocoded coordinates for each address.

Note that this code uses the requests library to make HTTP requests to the OpenStreetMap API, so make sure it is installed in your environment before running the code. At this time, the OSM API is blocked by the firewall.
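
The public Nominatim service asks clients to send no more than one request per second, so a loop over a large dataframe should pause between calls and avoid repeating identical lookups. The sketch below shows one way to do that. The helper geocode_with_throttle and the stand-in fake_geocoder are hypothetical names; in practice you would pass a real geocoding function such as geocode_address.

```python
import time


def geocode_with_throttle(addresses, geocoder, min_interval=1.0, sleep=time.sleep):
    """Geocode each unique address once, pausing between calls."""
    cache = {}
    for address in addresses:
        if address in cache:
            continue  # repeated address: reuse the earlier result
        if cache:
            sleep(min_interval)  # wait before every call after the first
        cache[address] = geocoder(address)
    return [cache[address] for address in addresses]


# Demonstration with a stand-in geocoder so the example runs offline.
def fake_geocoder(address):
    return {"lat": "45.4", "lon": "-75.7"} if "Ottawa" in address else None


results = geocode_with_throttle(
    ["Ottawa, ON", "Nowhere", "Ottawa, ON"], fake_geocoder, min_interval=0
)
print(results)
```

Nominatim returns "lat" and "lon" as strings, so convert them with float() before using them in calculations.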

Raster Analysis with GDAL

from osgeo import gdal
import numpy as np

# Open the raster file
raster_ds = gdal.Open('path/to/raster.tif')

# Read the raster band into a NumPy array
raster_band = raster_ds.GetRasterBand(1)
raster_array = raster_band.ReadAsArray()

# Perform some analysis on the raster data
# For example, calculate the mean pixel value
mean_value = np.mean(raster_array)

# Print the result
print('Mean pixel value: {}'.format(mean_value))
This code opens a raster file using the gdal.Open() method and reads the first band of the raster into a NumPy array using the ReadAsArray() method of the gdal.Band object. It then performs some analysis on the raster data, in this case calculating the mean pixel value using the np.mean() function from NumPy. Finally, it prints the result to the console.

You can modify this code to perform other types of analysis on the raster data, such as calculating the minimum, maximum, or standard deviation of the pixel values, or performing calculations between multiple bands. GDAL provides a wide range of functions and tools for working with raster data, so the possibilities are nearly endless.
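
As a sketch of those variations, the example below computes several statistics and a simple two-band calculation. It uses small synthetic arrays in place of data read with ReadAsArray(), and the NoData value of -9999 is an assumption; with a real raster you can obtain it from raster_band.GetNoDataValue().

```python
import numpy as np

# Synthetic stand-ins for two bands read with ReadAsArray().
red = np.array([[0.10, 0.20], [0.30, -9999.0]])
nir = np.array([[0.50, 0.60], [0.30, -9999.0]])
nodata = -9999.0  # assumption: the raster's NoData value

# Mask NoData pixels so they do not distort the statistics.
red_m = np.ma.masked_equal(red, nodata)
nir_m = np.ma.masked_equal(nir, nodata)

stats = {
    "min": float(red_m.min()),
    "max": float(red_m.max()),
    "std": float(red_m.std()),
}

# Band math: a normalized difference between the two bands.
ndvi = (nir_m - red_m) / (nir_m + red_m)

print(stats)
print(ndvi)
```

Without the mask, the NoData pixels would be treated as real values and would pull the minimum and standard deviation far from the true figures.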

Learn More

Learn more about Artifactory

Learn more about Conda

Connecting to Spatial Data - GAE Enterprise Portal

The ArcGIS Enterprise Portal can be accessed in either the AAW or CAE using the API, from any service which leverages the Python programming language.

For example, you can use Jupyter Notebooks in AAW, or Databricks, Data Factory, and similar services in CAE.

Connecting to GAE Portal using ArcGIS API
  1. Install packages:

    conda install -c esri arcgis
    

    or using Artifactory

    conda install -c https://jfrog.aaw.cloud.statcan.ca/artifactory/api/conda/esri-remote arcgis
    
  2. Import the necessary libraries that you will need in the Notebook.

    from arcgis.gis import GIS
    from arcgis.gis import Item
    

  3. Access the Portal. Your project group will be provided with a Client ID upon onboarding. Paste the Client ID between the quotation marks: client_id='######'.

    gis = GIS("https://geoanalytics.cloud.statcan.ca/portal", client_id=' ')
    print("Successfully logged in as: " + gis.properties.user.username)
    
    • The output will redirect you to a login Portal.
    • Use the StatCan Azure Login option, and your Cloud ID
    • After successful login, you will receive a code to sign in using SAML.
    • Paste this code into the output.


Search for your Content

search() method

The search() method is used to retrieve a collection of items that match specific search criteria. It allows you to search for items based on various parameters such as keywords, item types, owners, tags, groups, and more. The search() method returns a list of items that match the specified search criteria. This method is useful when you want to retrieve multiple items that meet certain conditions.

There are multiple ways to search for content depending on the amount of metadata you have filled out for your item. Learn more about .search method here

Search all of your items in the Portal
# Get the currently logged-in user
me = gis.users.me

# Retrieve all the items owned by the user
my_content = me.items()

# Print the collection of user's items
my_content
Search content by name
# Search for items by title
search_results = gis.content.search(query="Your Title")

# Iterate over the search results
for item in search_results:
    print(f"Title: {item.title}, ID: {item.id}")
Search content by tag
# Search for items by tag (search() has no tags argument, so the tag goes in the query string)
search_results = gis.content.search(query="tags:your_tag")

# Iterate over the search results
for item in search_results:
    print(f"Title: {item.title}, ID: {item.id}")
Search content by group
# Specify the ID of the group you want to search within
group_id = "your_group_id"

# Search for items within the specified group (the group filter goes in the query string)
search_results = gis.content.search(query=f"group:{group_id}")

# Iterate over the search results
for item in search_results:
    print(f"Title: {item.title}, ID: {item.id}")
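
The title, tag, and group filters shown above can also be combined in a single query string. The helper below is a hypothetical convenience function, not part of the ArcGIS API; it only assembles text using the field:value syntax that the portal search accepts.

```python
def build_search_query(title=None, tags=None, owner=None, group_id=None, item_type=None):
    """Combine optional filters into one ArcGIS search query string."""
    clauses = []
    if title:
        clauses.append(f'title:"{title}"')
    for tag in tags or []:
        clauses.append(f'tags:"{tag}"')
    if owner:
        clauses.append(f"owner:{owner}")
    if group_id:
        clauses.append(f"group:{group_id}")
    if item_type:
        clauses.append(f'type:"{item_type}"')
    return " AND ".join(clauses)


query = build_search_query(
    title="Census Tracts", tags=["boundaries"], item_type="Feature Service"
)
print(query)
```

The result can be passed as gis.content.search(query=query, max_items=50). Note that search() returns at most 10 items unless max_items is raised.
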
Get Content (fetch/retrieve)

get() method

The get() method is used to retrieve a specific item by its unique item ID. You provide the item ID as an argument to the get() method, and it returns the item with that particular ID. This method is useful when you already know the exact item ID and want to retrieve that specific item.

The most efficient way to retrieve content is by using the item's ID:

# Retrieve a specific item by its ID
item_id = "your_item_id"
item = gis.content.get(item_id)

It is also possible to get content from a list created during a search (as above). However, this can become convoluted when you run multiple searches, so take care that each index refers to the list you intend.

#from list of search results 
item1 = gis.content.get(my_content[5].id) #[5] = index number of search 
display(item1)

Working with Spatial Data

The choice between ArcGIS and open-source tools for spatial data depends on your specific needs and available resources. ArcGIS offers specialized functionality and support, while open-source tools are free and customizable. Consider your requirements and available expertise to make an informed decision. In some cases, a hybrid approach may be suitable, where you can leverage the strengths of both ArcGIS and open-source tools depending on the task at hand.

Convert Feature Service to Spatially Enabled DataFrame (ArcGIS API)

Conversion of an ArcGIS feature layer into a Pandas DataFrame with spatial capabilities using the pd.DataFrame.spatial.from_layer() method.

import pandas as pd
from arcgis.features import GeoAccessor  # registers the .spatial accessor on pandas

# Get the feature service item
item = gis.content.get(item_id)

# Access the feature layer within the feature service
feature_layer = item.layers[0]

#Convert
sdf = pd.DataFrame.spatial.from_layer(feature_layer)

Convert Feature Service to GeoDataFrame (open source)

Conversion of an ArcGIS feature layer into a GeoPandas GeoDataFrame by querying its features and passing them to the gpd.GeoDataFrame.from_features() method.

import json
import geopandas as gpd

# Assuming you have the item ID of the feature service
item_id = "your_item_id"

# Get the feature service item
item = gis.content.get(item_id)

# Access the feature layer within the feature service
feature_layer = item.layers[0]

# Query all features in WGS 84 and get them as a GeoJSON string
# (from_features() expects GeoJSON-like features, not ArcGIS Feature objects)
geojson = feature_layer.query(out_sr=4326).to_geojson

# Convert the GeoJSON features to a GeoDataFrame
gdf = gpd.GeoDataFrame.from_features(json.loads(geojson)["features"], crs="EPSG:4326")
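
For context, from_features() expects GeoJSON-like features, while a feature service natively describes features in Esri JSON. The sketch below illustrates the difference for the simplest case, a point. The function name is hypothetical, and a real conversion also has to handle lines, polygons, and the spatial reference.

```python
def esri_point_to_geojson(esri_feature):
    """Convert one Esri JSON point feature to a GeoJSON Feature dictionary."""
    geom = esri_feature["geometry"]
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [geom["x"], geom["y"]]},
        "properties": dict(esri_feature.get("attributes", {})),
    }


esri_feature = {
    "geometry": {"x": -75.7, "y": 45.4},
    "attributes": {"OBJECTID": 1, "name": "Ottawa"},
}
geojson_feature = esri_point_to_geojson(esri_feature)
print(geojson_feature)
```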

Publish a Spatial DataFrame as a Feature Service to GAE Portal
#sdf = your spatial dataframe

item_properties = {'title': '<title name>', 'tags': '<tag>', 'description': '<this is my item description>'}
published_item = gis.content.import_data(sdf, item_properties=item_properties)
published_item.publish()

# Retrieve the item ID and URL of the published feature service
item_id = published_item.id
feature_service_url = published_item.url

# Print the item ID and URL
print("Item ID:", item_id)
print("Feature Service URL:", feature_service_url)

Visualize Your Data on an Interactive Map

To visualize the map widget within different Python-based tools, you may need to leverage tool-specific display functions and/or widgets. For example, in Databricks use the %python magic command to switch to Python mode before creating and displaying the map widget.

ArcGIS Map Module
from IPython.display import display

# Retrieve the feature service item
item = gis.content.get("feature_service_item_id")

# Create a map widget from the existing GIS connection
map_widget = gis.map()

# Add the feature service layer to the map
# (in arcgis 2.4 and later, use map_widget.content.add(item.layers[0]) instead)
map_widget.add_layer(item.layers[0])

# Display the map widget
display(map_widget)
Matplotlib Library
import geopandas as gpd
import matplotlib.pyplot as plt

# Convert the spatial dataframe to a GeoDataFrame (if needed)
# (assumes sdf has a 'geometry' column of shapely geometries)
gdf = gpd.GeoDataFrame(sdf)

# Create a figure and axis
fig, ax = plt.subplots()

# Plot the GeoDataFrame
gdf.plot(ax=ax)

# Display the plot
plt.show()
ipyleaflet Library
from ipyleaflet import Map, GeoData

# Assuming you have a GeoDataFrame called 'gdf'

# Create a map
m = Map(center=(gdf.geometry.centroid.y.mean(), gdf.geometry.centroid.x.mean()), zoom=10)

# Create a GeoData layer from the GeoDataFrame
geo_data = GeoData(geo_dataframe=gdf)

# Add the GeoData layer to the map
m.add_layer(geo_data)

# Display the map
m
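
Computing centroids on data stored in a geographic coordinate system makes GeoPandas emit a warning, so an alternative is to center the map on the middle of the bounding box. The sketch below assumes the bounds are in longitude and latitude (EPSG:4326) and uses a hypothetical helper name. ipyleaflet expects the center as (latitude, longitude), which is the reverse of the x, y order used by the bounds.

```python
def center_from_bounds(bounds):
    """Return (lat, lon) for the middle of a (minx, miny, maxx, maxy) box."""
    minx, miny, maxx, maxy = bounds
    return ((miny + maxy) / 2, (minx + maxx) / 2)


# With GeoPandas this would be center_from_bounds(gdf.total_bounds).
center = center_from_bounds((-76.0, 45.0, -75.0, 46.0))
print(center)
```

The result can be passed directly as Map(center=center, zoom=10).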

Creating Visuals in Notebooks (python)

When creating map visuals in a notebook, you can choose between using the proprietary ArcGIS API for Python, open-source Python libraries like GeoPandas and Matplotlib, or adopting a hybrid approach that combines both. The ArcGIS API offers extensive geospatial capabilities and integration with ArcGIS products, while open-source libraries provide flexibility and community support. The hybrid approach allows you to leverage the strengths of both options based on your specific needs and preferences.

Using Open Source Methods
  1. Install the required Python packages for geospatial analysis, such as geopandas, folium, or matplotlib.
  2. Import the necessary modules in your Python notebook.
  3. Read or import the geospatial data into a suitable data structure.
  4. Visualize the data using the chosen package's mapping functions or classes.
  5. Customize the map properties, such as colors, symbology, or basemaps.
  6. Add additional layers or annotations to the map, if needed.
  7. Export the map to a desired format or display it in the notebook.
  8. Save the map or share it with others as an image or interactive HTML file.

Example code using geopandas and matplotlib:

import geopandas as gpd
import matplotlib.pyplot as plt

# Read the geospatial data
data = gpd.read_file("path_to_shapefile.shp")

# Visualize the data on a map
data.plot()

# Customize the map properties
plt.title("My Geospatial Map")
plt.xlabel("Longitude")
plt.ylabel("Latitude")

# Export the map
plt.savefig("path_to_output_file.png", dpi=300)

# Display the map in the notebook
plt.show()
Using ArcGIS API Method

Using ArcGIS Paid Product

  1. Import the necessary ArcGIS modules in your Python notebook.
  2. Connect to your ArcGIS account or portal using appropriate credentials.
  3. Create a map object using the arcgis.mapping module.
  4. Add desired layers or data to the map.
  5. Customize the map properties, such as extent, scale, and symbology.
  6. Optionally, add labels, legends, or other cartographic elements.
  7. Export the map to a desired format or display it in the notebook.
  8. Save the map or share it with others through the ArcGIS platform.

Example code:

from arcgis.gis import GIS

# Connect to your ArcGIS account or portal
gis = GIS("https://geoanalyticsdev.cloud.statcan.ca/portal", client_id=' ')
# Authentication pop-up will open

# Create a new map widget
# (the calls below follow the MapView widget in arcgis versions before 2.4)
m = gis.map()

# Add layers or data to the map
layer1 = gis.content.get("item_id_of_layer1").layers[0]
layer2 = gis.content.get("item_id_of_layer2").layers[0]
m.add_layer(layer1)
m.add_layer(layer2)

# Customize map properties
m.zoom_to_layer(layer1)
m.legend = True

# Export the map as a standalone HTML page
m.export_to_html("path_to_output_file.html")

# Display the map in the notebook
m
Using ArcPy
  1. Ensure you are using the ArcGIS version of Jupyter notebook for ease of use. Go to Start > ArcGIS > Jupyter Notebook. This will negate the need to modify your system environment and paths.
  2. Import the necessary ArcPy modules in your Python notebook.
  3. Connect to your ArcGIS account or portal using appropriate credentials.
  4. Set the workspace to the location of your geospatial data.
  5. Open a map document with the arcpy.mapping module (ArcMap) or a project with the arcpy.mp module (ArcGIS Pro).
  6. Add desired layers or data to the map.
  7. Customize the map properties, such as extent, scale, and symbology.
  8. Optionally, add labels, legends, or other cartographic elements.
  9. Export the map to a desired format or display it in the notebook.
  10. Save the map or share it with others through the ArcGIS platform.

Example code:

import arcpy

# arcpy.mapping exists only in ArcMap (Python 2.7); ArcGIS Pro uses arcpy.mp instead

# Set the workspace to the location of your geospatial data
arcpy.env.workspace = "path_to_workspace"

# Open an existing ArcGIS Pro project and get its first map
aprx = arcpy.mp.ArcGISProject("path_to_project.aprx")
m = aprx.listMaps()[0]

# Add layers or data to the map
m.addDataFromPath("path_to_layer1")
m.addDataFromPath("path_to_layer2")

# Customize map properties: zoom the layout's map frame to the first layer
# (assumes the first layout contains a map frame that shows this map)
layout = aprx.listLayouts()[0]
map_frame = layout.listElements("MAPFRAME_ELEMENT")[0]
map_frame.camera.setExtent(map_frame.getLayerExtent(m.listLayers()[0]))

# Export the layout
layout.exportToJPEG("path_to_output_file.jpg", resolution=300)

# Save the changes to a copy of the project
aprx.saveACopy("path_to_project_copy.aprx")

Learn more about ArcPy


Common Open Source Visualization packages include:

  • Matplotlib
  • Seaborn
  • Plotly
  • Folium
  • GeoPandas
  • Bokeh
  • Basemap
  • Cartopy
  • Geoplot
  • PySAL

These libraries provide different levels of functionality and customization options, so you can choose the one that best fits your needs and preferences.