Jupyter Notebooks
Jupyter Notebooks provide a powerful and interactive environment for geospatial analysis. With Jupyter Notebooks, users can combine code, visualizations, and explanatory text in a single document. This allows for a seamless workflow where users can explore, analyze, and visualize geospatial data using Python or other programming languages. Jupyter Notebooks are particularly useful for iterative analysis, as they enable users to run and modify code cells in a flexible and interactive manner.
PyCharm
PyCharm is an integrated development environment (IDE) specifically designed for Python development. Geospatial analysts can utilize PyCharm to efficiently write, debug, and test their geospatial analysis scripts and applications. PyCharm provides a feature-rich environment with advanced code editing capabilities, such as code completion, syntax highlighting, and code navigation, which enhance productivity. PyCharm's debugging and profiling tools assist in identifying and resolving issues, ensuring the reliability of geospatial analysis workflows.
Visual Studio Code (VSCode)
Visual Studio Code, often referred to as VSCode, is a lightweight and versatile code editor that supports various programming languages, including Python. VSCode offers a flexible and customizable user interface, allowing analysts to arrange their workspace according to their preferences. Its integrated terminal and debugging capabilities make it convenient for executing and troubleshooting geospatial analysis scripts.
Python
Geospatial analysis mainly uses Python for several reasons:
- Rich ecosystem of geospatial libraries: Python has a vast collection of specialized geospatial libraries such as GeoPandas, Shapely, Fiona, and PySAL, which provide powerful tools for handling geospatial data, performing spatial operations, and conducting advanced geospatial analysis.
- Integration with other data science and analysis libraries: Python's popularity in the data science community and its extensive ecosystem of data analysis libraries such as NumPy, Pandas, and Matplotlib make it an ideal choice for geospatial analysis. Python allows seamless integration of geospatial analysis with other data processing and visualization tasks.
- Flexibility and versatility: Python is a versatile programming language known for its flexibility. It allows users to combine geospatial analysis with other functionalities, such as machine learning, statistical analysis, and web development. Python's flexibility enables the creation of custom workflows and tailored solutions for specific geospatial analysis needs.
- Ease of use and readability: Python is renowned for its readability and user-friendly syntax. Its clear and concise code structure makes it easier for both beginners and experienced programmers to understand and write geospatial analysis scripts and workflows. Python's readability contributes to better collaboration and maintainability of geospatial projects.
- Active community support: Python benefits from a large and active community of geospatial analysts, developers, and researchers who contribute to the development and improvement of geospatial libraries and tools. The availability of extensive documentation, tutorials, and online resources makes it easier for users to learn, troubleshoot, and get assistance when working on geospatial analysis projects.
While Python is widely used in geospatial analysis, it is important to note that other programming languages like R, Java, and C++ also have their own geospatial libraries and ecosystems. The choice of programming language ultimately depends on specific project requirements, personal preferences, and existing expertise.
Virtual Environments
Best Practice: Working with Conda Virtual Environments
Using virtual environments with Conda can help you create consistent, reproducible, and isolated environments for your projects, which can save time and prevent issues caused by conflicting dependencies or system-level changes.
For example, ArcGIS products function best with Python 3.7. By creating a separate ArcGIS environment, you can install Python 3.7 there without causing conflicts with a newer Python 3.10 installation used elsewhere.
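Once an environment is activated, it can help to confirm from inside the notebook or script which interpreter is actually running. The following is a minimal sketch using only the standard library; the helper names `describe_environment` and `check_python_version` are illustrative and not part of any platform tooling.

```python
import os
import sys


def describe_environment():
    """Return basic facts about the interpreter running this code."""
    return {
        # Major.minor version of the running interpreter, e.g. "3.7"
        "python_version": f"{sys.version_info.major}.{sys.version_info.minor}",
        # Root folder of the active environment
        "prefix": sys.prefix,
        # Conda sets CONDA_DEFAULT_ENV when an environment is activated;
        # the value is None when no Conda environment is active.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


def check_python_version(required):
    """Return True if the interpreter matches a (major, minor) tuple."""
    return tuple(sys.version_info[:2]) == tuple(required)


info = describe_environment()
print(f"Python {info['python_version']} at {info['prefix']} (conda env: {info['conda_env']})")
```

For instance, `check_python_version((3, 7))` could be called at the top of an ArcGIS notebook to catch a wrong kernel early.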
Create a new environment
example:Installing package(s)
Common Geo-Specific Packages available
This list is not exhaustive, but here are some of the common packages:
- GeoPandas
- Shapely
- Fiona
- GDAL/OGR
- PyProj
- Cartopy
- Rasterio
- Geoplot
- Basemap
- Bokeh
- PySAL
- Spatial Pandas
- NetworkX
- PyShp
- TileStache
- GdalUtils
- Scipy
- PyTopo
- Geopy
- Plotly
Conda
DAaaS utilizes Artifactory for package and library management:
To use:
Miniforge (conda) has been preconfigured to use the DAS Artifactory.
You should not need to specify the channel.
If this fails, we have included examples on direct connections after the simple examples:
For specific versions
Connecting directly to the artifactory channel:
conda install -c https://jfrog.aaw.cloud.statcan.ca/artifactory/conda-forge-remote/ [package]
conda install -c https://jfrog.aaw.cloud.statcan.ca/artifactory/conda-forge-remote/ [package=X.X...]
Confirm your package installation
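One way to confirm an installation from within Python is to look up the installed version of each package. This is a sketch that assumes Python 3.8 or newer (for `importlib.metadata`); the package names are examples only, and the lookup uses distribution names, which can differ from import names.

```python
from importlib import metadata


def installed_versions(package_names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Example package names; replace them with the packages you installed.
report = installed_versions(["geopandas", "shapely", "fiona"])
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```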
PIP
PIP has also been preconfigured to use the DAS Artifactory custom index:
If that fails and you need to specify the index url:
Some Basic Examples
Connect to GAE ArcGIS Portal (Enterprise)
Your project group will be provided with a Client ID upon onboarding, which will be used to connect to the ArcGIS Enterprise Portal. Paste the Client ID between the quotation marks.
This will trigger a pop-up window to authenticate, then provide you with a key to enter into the IDE.
Convert a WFS into pandas DataFrame
import geopandas as gpd
# Set WFS URL and layer name
wfs_url = 'https://mywfs.com/wfs'
layer_name = 'my_layer'
# Read WFS into a geopandas dataframe
gdf = gpd.read_file(wfs_url, layer=layer_name)
# Convert geopandas dataframe to pandas dataframe
df = gdf.drop(columns='geometry')
# Preview the dataframe
print(df.head())
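Some WFS servers are easier to read when the request is spelled out explicitly as a GetFeature URL. The sketch below builds such a URL with the standard library; it assumes the server speaks WFS 2.0.0 and can return GeoJSON, which is not guaranteed for every service, and `build_wfs_getfeature_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode


def build_wfs_getfeature_url(base_url, layer_name, max_features=None):
    """Build a WFS 2.0.0 GetFeature URL that requests GeoJSON output."""
    params = {
        "service": "WFS",
        "version": "2.0.0",
        "request": "GetFeature",
        "typeNames": layer_name,             # WFS 1.x servers use "typeName"
        "outputFormat": "application/json",  # assumption: the server supports GeoJSON
    }
    if max_features is not None:
        params["count"] = max_features       # WFS 1.x servers use "maxFeatures"
    return f"{base_url}?{urlencode(params)}"


url = build_wfs_getfeature_url("https://mywfs.com/wfs", "my_layer", max_features=100)
print(url)
```

The resulting URL can then be passed to `gpd.read_file(url)` in place of the bare service address.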
Shapefile to GeoDataFrame (Spatial DataFrame)
Export a GeoDataFrame to ArcGIS Enterprise
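from arcgis.gis import GIS
from arcgis.features import GeoAccessor
import geopandas as gpd
# Define the URL of your ArcGIS Enterprise portal
portal_url = 'https://geoanalytics.cloud.statcan.ca/portal/'
# Create a connection to your portal
gis = GIS(portal_url, client_id='')
# Load the data to publish (example path; replace with your own data)
gdf = gpd.read_file('path/to/shapefile.shp')
# Convert the GeoDataFrame to a spatially enabled DataFrame for the ArcGIS API
sdf = GeoAccessor.from_geodataframe(gdf)
# Define the name of the feature layer to be created
layer_name = 'your_layer_name'
# Publish the data to your portal as a feature layer
feature_layer = gis.content.import_data(sdf, title=layer_name)
# Print the URL of the feature layer
print(feature_layer.url)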
Join CSV to SHP (as sdf) then Export to ArcGIS Enterprise
import pandas as pd
from arcgis.gis import GIS
from arcgis.features import GeoAccessor
# Load the CSV file into a Pandas dataframe
csv_df = pd.read_csv('path/to/csv_file.csv')
# Load the spatial data into a spatially enabled DataFrame using ArcGIS API for Python
sdf = pd.DataFrame.spatial.from_featureclass('path/to/spatial_data.shp')
# Join the CSV dataframe to the spatial dataframe based on a common field
joined_sdf = sdf.merge(csv_df, on='common_field')
# Export the joined spatial dataframe to ArcGIS Enterprise using the ArcGIS API for Python
gis = GIS('https://geoanalytics.cloud.statcan.ca/portal/', client_id='')
joined_fc = joined_sdf.spatial.to_featureclass(location='path/to/output.gdb', overwrite=True)
joined_item = gis.content.add({'type': 'Feature Service', 'title': 'Joined Data', 'tags': 'Data'}, data=joined_fc)
joined_item.publish()
The joined spatially enabled DataFrame is exported to a feature class with its spatial.to_featureclass() method, then published to ArcGIS Enterprise with the gis.content.add() and publish() methods of the ArcGIS API for Python. Replace the example paths and portal URL with the actual values for your data and ArcGIS Enterprise instance.
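Before exporting, it can be worth checking how well the join keys actually matched. The sketch below uses small made-up tables in place of the shapefile attributes and the CSV; the column names and values are hypothetical. Unlike the default inner join, a left join with `indicator=True` keeps every spatial record and flags the ones that found no CSV match.

```python
import pandas as pd

# Hypothetical stand-ins for the shapefile attribute table and the CSV.
spatial_attrs = pd.DataFrame(
    {"common_field": ["A", "B", "C"], "name": ["Alpha", "Beta", "Gamma"]}
)
csv_df = pd.DataFrame(
    {"common_field": ["A", "B", "D"], "population": [100, 250, 75]}
)

# how="left" keeps all spatial rows; indicator=True adds a "_merge" column;
# validate="one_to_one" raises an error if a key is duplicated on either side.
joined = spatial_attrs.merge(
    csv_df, on="common_field", how="left", indicator=True, validate="one_to_one"
)

# Keys present in the spatial data but missing from the CSV
unmatched = joined.loc[joined["_merge"] == "left_only", "common_field"].tolist()
print(f"{len(joined)} rows joined, unmatched keys: {unmatched}")
```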
GeoCode a dataframe using OSM API
import requests
import pandas as pd

def geocode_address(address):
    """
    Geocode a single address using the OpenStreetMap Nominatim API
    """
    url = "https://nominatim.openstreetmap.org/search"
    params = {
        "q": address,
        "format": "json"
    }
    # Nominatim's usage policy asks for an identifying User-Agent; replace this placeholder
    headers = {"User-Agent": "my-geocoding-script"}
    response = requests.get(url, params=params, headers=headers, timeout=10)
    if response.ok:
        results = response.json()
        if len(results) > 0:
            return results[0]
    return None

def geocode_dataframe(df, address_column):
    """
    Geocode a Pandas dataframe using the OpenStreetMap Nominatim API
    """
    # Create a new dataframe to store the geocoding results
    geocoded_df = pd.DataFrame(columns=["latitude", "longitude"])
    # Loop through each row in the original dataframe
    for index, row in df.iterrows():
        # Get the address from the specified column
        address = row[address_column]
        # Geocode the address using the OpenStreetMap API
        result = geocode_address(address)
        if result:
            # Add the latitude and longitude (returned as strings) to the new dataframe
            geocoded_df.loc[index] = [float(result["lat"]), float(result["lon"])]
        else:
            # If geocoding failed, add NaN values to the new dataframe
            geocoded_df.loc[index] = [float("NaN"), float("NaN")]
    # Add the new columns to the original dataframe
    df["latitude"] = geocoded_df["latitude"]
    df["longitude"] = geocoded_df["longitude"]
    return df
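The public Nominatim service limits clients to about one request per second, so looping over a large dataframe may need throttling. The following is a sketch of one possible wrapper that caches repeated addresses and spaces out real requests; `make_throttled_geocoder` is a hypothetical helper, and the demonstration uses a stand-in geocoder so that no network call is made.

```python
import time


def make_throttled_geocoder(geocode_func, min_interval=1.0,
                            sleep=time.sleep, clock=time.monotonic):
    """Wrap a geocoding function with a cache and a minimum delay between calls."""
    cache = {}
    last_call = [None]  # time of the most recent real request

    def throttled(address):
        if address in cache:
            return cache[address]  # repeated address: no new request
        if last_call[0] is not None:
            wait = min_interval - (clock() - last_call[0])
            if wait > 0:
                sleep(wait)
        result = geocode_func(address)
        last_call[0] = clock()
        cache[address] = result
        return result

    return throttled


# Demonstration with a stand-in geocoder instead of the real API.
def fake_geocode(address):
    return {"lat": "45.42", "lon": "-75.69"} if "Ottawa" in address else None


geocode = make_throttled_geocoder(fake_geocode, min_interval=0.0)
print(geocode("Ottawa, ON"))
print(geocode("Unknown place"))
```

In practice the wrapper would be built around the real function, for example `make_throttled_geocoder(geocode_address)`, and the wrapped function used inside the dataframe loop.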
Raster Analysis with GDAL
from osgeo import gdal
import numpy as np
# Open the raster file
raster_ds = gdal.Open('path/to/raster.tif')
# Read the raster band into a NumPy array
raster_band = raster_ds.GetRasterBand(1)
raster_array = raster_band.ReadAsArray()
# Perform some analysis on the raster data
# For example, calculate the mean pixel value
mean_value = np.mean(raster_array)
# Print the result
print('Mean pixel value: {}'.format(mean_value))
You can modify this code to perform other types of analysis on the raster data, such as calculating the minimum, maximum, or standard deviation of the pixel values, or performing calculations between multiple bands. GDAL provides a wide range of functions and tools for working with raster data, so the possibilities are nearly endless.
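As a sketch of those other analyses, the example below computes summary statistics and a simple two-band calculation with NumPy. The arrays are small synthetic stand-ins for what `ReadAsArray()` returns, and the NoData value of -9999 is an assumption; with real data it would come from `band.GetNoDataValue()`.

```python
import numpy as np

# Synthetic 2x3 bands standing in for arrays returned by ReadAsArray().
nodata = -9999.0  # hypothetical NoData value
red = np.array([[0.10, 0.20, nodata], [0.30, 0.25, 0.15]])
nir = np.array([[0.50, 0.60, 0.40], [0.30, 0.75, nodata]])

# Mask NoData pixels so they do not distort the statistics.
red_masked = np.ma.masked_equal(red, nodata)
nir_masked = np.ma.masked_equal(nir, nodata)

stats = {
    "min": float(red_masked.min()),
    "max": float(red_masked.max()),
    "mean": float(red_masked.mean()),
    "std": float(red_masked.std()),
}
print(stats)

# Calculation between two bands: NDVI = (NIR - Red) / (NIR + Red).
# A pixel masked in either band stays masked in the result.
ndvi = (nir_masked - red_masked) / (nir_masked + red_masked)
print(ndvi.filled(np.nan))
```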
Learn More
Learn more about Artifactory
Connecting to Spatial Data - GAE Enterprise Portal
The ArcGIS Enterprise Portal can be accessed from either AAW or CAE through the API, from any service that supports the Python programming language.
For example, you can use Jupyter Notebooks in AAW, or Databricks, DataFactory, and similar services in CAE.
Connecting to GAE Portal using ArcGIS API
- Install packages, either directly or using Artifactory.
- Import the necessary libraries that you will need in the Notebook.
- Access the Portal. Your project group will be provided with a Client ID upon onboarding. Paste the Client ID between the quotation marks: client_id='######'
- The output will redirect you to a login Portal.
- Use the StatCan Azure Login option, and your Cloud ID
- After successful login, you will receive a code to sign in using SAML.
- Paste this code into the output.
Search for your Content
search() method
The search() method retrieves a collection of items that match specific search criteria, such as keywords, item types, owners, tags, or groups. It returns a list of the matching items, which is useful when you want to retrieve multiple items that meet certain conditions.
There are multiple ways to search for content, depending on how much metadata you have filled out for your item. See the ArcGIS API for Python documentation to learn more about the search() method.
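As a rough sketch of how such searches might be expressed, the helpers below build query strings from the portal's `owner`, `title`, and `tags` search fields and pass them to `gis.content.search()`. The helper names and placeholder values are illustrative, and `gis` is assumed to be an already connected `GIS` object.

```python
def build_query(owner=None, title=None, tag=None):
    """Combine optional search criteria into one portal query string."""
    parts = []
    if owner:
        parts.append(f"owner:{owner}")
    if title:
        parts.append(f'title:"{title}"')
    if tag:
        parts.append(f'tags:"{tag}"')
    return " AND ".join(parts)


def search_content(gis, max_items=50, **criteria):
    """Run a content search on a connected GIS object; return (title, id) pairs."""
    items = gis.content.search(query=build_query(**criteria), max_items=max_items)
    return [(item.title, item.id) for item in items]


# Query strings for three common searches (values are placeholders):
print(build_query(owner="your_username"))   # all of your items
print(build_query(title="your_item_name"))  # content by name
print(build_query(tag="your_tag"))          # content by tag
```

With a live connection, a call such as `search_content(gis, owner=gis.users.me.username)` would list the items you own.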
Search all of your items in the Portal
Search content by name
Search content by tag
Search content by group
# Specify the ID of the group you want to search within
group_id = "your_group_id"
# Search for items shared with the specified group
search_results = gis.content.search(query=f"group:{group_id}")
# Iterate over the search results
for item in search_results:
    print(f"Title: {item.title}, ID: {item.id}")
Get Content (fetch/retrieve)
get() method
The get() method is used to retrieve a specific item by its unique item ID. You provide the item ID as an argument to the get() method, and it returns the item with that particular ID. This method is useful when you already know the exact item ID and want to retrieve that specific item.
The most efficient way to retrieve content is by using the item's ID:
It is also possible to get content from a list created during a search (as above). However, this can become convoluted when conducting multiple searches, so be careful with your syntax.
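Both retrieval styles can be wrapped in small helpers so that a missing item fails loudly. This is a sketch only: `get_item_or_raise` and `first_result` are hypothetical names, and `gis` is assumed to be a connected `GIS` object whose `content.get()` returns `None` when an item cannot be found.

```python
def get_item_or_raise(gis, item_id):
    """Fetch a portal item by ID, raising a clear error if it cannot be found."""
    item = gis.content.get(item_id)
    if item is None:
        # get() returns None when the ID does not exist or is not accessible to you
        raise LookupError(f"No accessible item with ID {item_id!r}")
    return item


def first_result(search_results):
    """Return the first item of a search result list, or None if the list is empty."""
    return search_results[0] if search_results else None
```

A typical call would look like `item = get_item_or_raise(gis, "your_item_id")`, or `item = first_result(gis.content.search("your_item_name"))` when only a name is known.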
Working with Spatial Data
The choice between ArcGIS and open-source tools for spatial data depends on your specific needs and available resources. ArcGIS offers specialized functionality and support, while open-source tools are free and customizable. Consider your requirements and available expertise to make an informed decision. In some cases, a hybrid approach may be suitable, where you can leverage the strengths of both ArcGIS and open-source tools depending on the task at hand.
Convert Feature Service to Spatially Enabled DataFrame (open source)
Conversion of an ArcGIS feature layer into a Pandas DataFrame with spatial capabilities using the pd.DataFrame.spatial.from_layer() method.
Convert Feature Service to GeoDataFrame (open source)
Conversion of an ArcGIS feature layer into a GeoPandas GeoDataFrame by querying the layer's features and passing them to the gpd.GeoDataFrame.from_features() method.
import geopandas as gpd
# Assuming you have the item ID of the feature service
item_id = "your_item_id"
# Get the feature service item
item = gis.content.get(item_id)
# Access the feature layer within the feature service
feature_layer = item.layers[0]
# Query the feature layer to retrieve all features
features = feature_layer.query().features
# Convert the features to a GeoDataFrame
gdf = gpd.GeoDataFrame.from_features(features)
Publish a Spatial DataFrame as a Feature Service to GAE Portal
#sdf = your spatial dataframe
item_properties = {'title': '<title name>', 'tags': '<tag>', 'description': '<this is my item description>'}
published_item = gis.content.import_data(sdf, item_properties=item_properties)
published_item.publish()
# Retrieve the item ID and URL of the published feature service
item_id = published_item.id
feature_service_url = published_item.url
# Print the item ID and URL
print("Item ID:", item_id)
print("Feature Service URL:", feature_service_url)
Visualize Your Data on an Interactive Map
To display the map widget in different Python-based tools, you may need to use tool-specific display functions and/or widgets. For example, in Databricks, use the %python magic command to switch to Python mode before creating and displaying the map widget.
ArcGIS Map Module
# MapView is provided by releases of the ArcGIS API for Python that ship the arcgis.widgets module
from arcgis.widgets import MapView
from IPython.display import display
# Retrieve the feature service item
item = gis.content.get("feature_service_item_id")
# Create a map widget
map_widget = MapView()
# Add the feature service layer to the map
map_widget.add_layer(item.layers[0])
# Display the map widget (some tools, such as Databricks, may need their own display function)
display(map_widget)
Matplotlib Library
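A minimal static map with Matplotlib might look like the sketch below. The polygon coordinates are made up and stand in for real geometries; with a GeoDataFrame, `gdf.plot(ax=ax)` would draw onto the same axes instead of the manual loop. The output location in the temporary folder is also just an example.

```python
import os
import tempfile

import matplotlib.pyplot as plt

# Hypothetical polygon rings as (x, y) vertices, standing in for real geometries.
polygons = {
    "Zone A": [(0, 0), (2, 0), (2, 1), (0, 1)],
    "Zone B": [(2, 0), (3, 0), (3, 2), (2, 2)],
}

fig, ax = plt.subplots(figsize=(6, 4))
for name, ring in polygons.items():
    xs, ys = zip(*ring)
    # Draw each ring as a filled, outlined polygon with a legend entry
    ax.fill(xs, ys, alpha=0.5, edgecolor="black", label=name)

ax.set_title("Example zones")
ax.set_xlabel("X")
ax.set_ylabel("Y")
ax.set_aspect("equal")  # keep shapes undistorted
ax.legend()

# Save the figure to an image file (example location)
out_path = os.path.join(tempfile.gettempdir(), "zones.png")
fig.savefig(out_path, dpi=150)
```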
ipyleaflet Library
from ipyleaflet import Map, GeoData
# Assuming you have a GeoDataFrame called 'gdf'
# Create a map
m = Map(center=(gdf.geometry.centroid.y.mean(), gdf.geometry.centroid.x.mean()), zoom=10)
# Create a GeoData layer from the GeoDataFrame
geo_data = GeoData(geo_dataframe=gdf)
# Add the GeoData layer to the map
m.add_layer(geo_data)
# Display the map
m
Creating Visuals in Notebooks (python)
When creating map visuals in a notebook, you can choose between using the proprietary ArcGIS API for Python, open-source Python libraries like GeoPandas and Matplotlib, or adopting a hybrid approach that combines both. The ArcGIS API offers extensive geospatial capabilities and integration with ArcGIS products, while open-source libraries provide flexibility and community support. The hybrid approach allows you to leverage the strengths of both options based on your specific needs and preferences.
Using Open Source Methods
- Install the required Python packages for geospatial analysis, such as geopandas, folium, or matplotlib.
- Import the necessary modules in your Python notebook.
- Read or import the geospatial data into a suitable data structure.
- Visualize the data using the chosen package's mapping functions or classes.
- Customize the map properties, such as colors, symbology, or basemaps.
- Add additional layers or annotations to the map, if needed.
- Export the map to a desired format or display it in the notebook.
- Save the map or share it with others as an image or interactive HTML file.
Example code using geopandas and matplotlib:
import geopandas as gpd
import matplotlib.pyplot as plt
# Read the geospatial data
data = gpd.read_file("path_to_shapefile.shp")
# Visualize the data on a map
data.plot()
# Customize the map properties
plt.title("My Geospatial Map")
plt.xlabel("Longitude")
plt.ylabel("Latitude")
# Export the map
plt.savefig("path_to_output_file.png", dpi=300)
# Display the map in the notebook
plt.show()
Using ArcGIS API Method
Using ArcGIS Paid Product
- Import the necessary ArcGIS modules in your Python notebook.
- Connect to your ArcGIS account or portal using appropriate credentials.
- Create a map object using the arcgis.mapping module.
- Add desired layers or data to the map.
- Customize the map properties, such as extent, scale, and symbology.
- Optionally, add labels, legends, or other cartographic elements.
- Export the map to a desired format or display it in the notebook.
- Save the map or share it with others through the ArcGIS platform.
Example code:
from arcgis.gis import GIS
# Connect to your ArcGIS account or portal
gis = GIS("https://geoanalyticsdev.cloud.statcan.ca/portal", client_id=' ')
# Authentication pop-up will open
# Create a new map widget
# (gis.map() and the calls below follow the MapView widget API; newer releases may differ)
m = gis.map()
# Add layers to the map; add_layer expects portal items or layer objects rather than file paths
# (the item IDs below are placeholders)
layer1 = gis.content.get("item_id_of_layer1")
layer2 = gis.content.get("item_id_of_layer2")
m.add_layer(layer1)
m.add_layer(layer2)
# Customize map properties
m.zoom_to_layer(layer1)
m.legend = True
# Display the map in the notebook
m
# Export the map: run this in a later cell, once the widget has rendered
m.take_screenshot(file_path="path_to_output_file.png")
Using ArcPy
- Ensure you are using the ArcGIS version of Jupyter notebook for ease of use. Go to Start > ArcGIS > Jupyter Notebook. This will negate the need to modify your system environment and paths.
- Import the necessary ArcPy modules in your Python notebook.
- Connect to your ArcGIS account or portal using appropriate credentials.
- Set the workspace to the location of your geospatial data.
- Create a map document object using the arcpy.mapping module (ArcMap). In ArcGIS Pro, the equivalent module is arcpy.mp.
- Add desired layers or data to the map.
- Customize the map properties, such as extent, scale, and symbology.
- Optionally, add labels, legends, or other cartographic elements.
- Export the map to a desired format or display it in the notebook.
- Save the map or share it with others through the ArcGIS platform.
Example code:
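import arcpy
# arcpy.mapping is the ArcMap (Python 2.7) module; ArcGIS Pro uses arcpy.mp instead
import arcpy.mapping as mapping
from IPython.display import Image
# Set the workspace to the location of your geospatial data
arcpy.env.workspace = "path_to_workspace"
# Open an existing map document (MapDocument requires a path to an .mxd file; placeholder shown)
mxd = mapping.MapDocument("path_to_map_document.mxd")
# Add layers or data to the map
df = mapping.ListDataFrames(mxd)[0]
layer1 = mapping.Layer("path_to_layer1")
layer2 = mapping.Layer("path_to_layer2")
mapping.AddLayer(df, layer1)
mapping.AddLayer(df, layer2)
# Customize map properties
# With no features selected, this zooms to the full extent of all layers
df.zoomToSelectedFeatures()
# Set the legend title (assumes the layout already contains a legend element)
legend = mapping.ListLayoutElements(mxd, "LEGEND_ELEMENT")[0]
legend.title = "Legend"
mxd.title = "My Map"
# Export the map
mapping.ExportToJPEG(mxd, "path_to_output_file.jpg", resolution=300)
# Display the exported map image in the notebook
Image("path_to_output_file.jpg")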
Common Open Source Visualization packages include:
- Matplotlib
- Seaborn
- Plotly
- Folium
- GeoPandas
- Bokeh
- Basemap
- Cartopy
- Geoplot
- PySAL
These libraries provide different levels of functionality and customization options, so you can choose the one that best fits your needs and preferences.