Python
Geospatial analysis mainly uses Python for several reasons:
-
Rich ecosystem of geospatial libraries: Python has a vast collection of specialized geospatial libraries such as GeoPandas, Shapely, Fiona, and PySAL, which provide powerful tools for handling geospatial data, performing spatial operations, and conducting advanced geospatial analysis.
-
Integration with other data science and analysis libraries: Python's popularity in the data science community and its extensive ecosystem of data analysis libraries such as NumPy, Pandas, and Matplotlib make it an ideal choice for geospatial analysis. Python allows seamless integration of geospatial analysis with other data processing and visualization tasks.
-
Flexibility and versatility: Python is a versatile programming language known for its flexibility. It allows users to combine geospatial analysis with other functionalities, such as machine learning, statistical analysis, and web development. Python's flexibility enables the creation of custom workflows and tailored solutions for specific geospatial analysis needs.
-
Ease of use and readability: Python is renowned for its readability and user-friendly syntax. Its clear and concise code structure makes it easier for both beginners and experienced programmers to understand and write geospatial analysis scripts and workflows. Python's readability contributes to better collaboration and maintainability of geospatial projects.
-
Active community support: Python benefits from a large and active community of geospatial analysts, developers, and researchers who contribute to the development and improvement of geospatial libraries and tools. The availability of extensive documentation, tutorials, and online resources makes it easier for users to learn, troubleshoot, and get assistance when working on geospatial analysis projects.
While Python is widely used in geospatial analysis, it is important to note that other programming languages like R, Java, and C++ also have their own geospatial libraries and ecosystems. The choice of programming language ultimately depends on specific project requirements, personal preferences, and existing expertise.
Virtual Environments
Best Practice: Working with Conda Virtual Environments
Using virtual environments with Conda can help you create consistent, reproducible, and isolated environments for your projects, which can save time and prevent issues caused by conflicting dependencies or system-level changes.
For example, ArcGIS products function best with python 3.7. By creating a separate ArcGIS environment you can install 3.7 without causing conflicts with the modern python 3.10 release.
Create a new environment
example:Installing package(s)
Common Geo-Specific Packages available
This list is not exhaustive, but here are some of the common packages:
- GeoPandas
- Shapely
- Fiona
- GDAL/OGR
- PyProj
- Cartopy
- Rasterio
- Geoplot
- Basemap
- Bokeh
- PySAL
- Spatial Pandas
- NetworkX
- PyShp
- TileStache
- GdalUtils
- Scipy
- PyTopo
- Geopy
- Plotly
Conda
DAaaS utilizes Artifactory for package and library management:
To use:
Miniforge (conda) has been preconfigured to use the DAS Artifactiry.
You should not need to specify the channel.
If this fails, we have included examples on direct connections after the simple examples:
For specific versions
Connecting directly to the artifactory channel:
conda install -c https://jfrog.aaw.cloud.statcan.ca/artifactory/conda-forge-remote/ [package]
conda install -c https://jfrog.aaw.cloud.statcan.ca/artifactory/conda-forge-remote/ [package=X.X...]
Confirm your package installation
Conda Cheat Sheet
Link to full Conda cheat sheet
PIP
PIP has also been preconfigured to use the DAS artifactory custom index:
If that fails and you need to specify the index url:
Some Basic Examples
Connect to GAE ArcGIS Portal (Enterprise)
Your project group will be provided with a Client ID upon onboarding which will be used to connect to the ArcGIS Enterprise Portal. Paste the Client ID in-between the quotations
This will trigger a pop-up window to authenticate, then provide you with a key to enter into the IDEConvert a WFS into pandas DataFrame
import geopandas as gpd
# Set WFS URL and layer name
wfs_url = 'https://mywfs.com/wfs'
layer_name = 'my_layer'
# Read WFS into a geopandas dataframe
gdf = gpd.read_file(wfs_url, layer=layer_name)
# Convert geopandas dataframe to pandas dataframe
df = gdf.drop(columns='geometry')
# Preview the dataframe
print(df.head())
Shapefile to GeoDataFrame (Spatial DataFrame)
Export a GeoDataFrame to ArcGIS Enterprise
from arcgis.gis import GIS
import geopandas as gpd
# Define the URL of your ArcGIS Enterprise portal
portal_url = 'https://geoanalytics.cloud.statcan.ca/portal/'
# Create a connection to your portal
gis = GIS(portal_url, client_id='')
# Define the name of the feature layer to be created
layer_name = 'your_layer_name'
# Publish the GeoDataFrame to your portal as a feature layer
feature_layer = gis.content.import_data(gdf, title=layer_name)
# Print the URL of the feature layer
print(feature_layer.url)
Join CSV to SHP(as sdf) then Export to ArcGIS Enterprise
import pandas as pd
from arcgis.gis import GIS
from arcgis.features import SpatialDataFrame
# Load the CSV file into a Pandas dataframe
csv_df = pd.read_csv('path/to/csv_file.csv')
# Load the spatial data into a SpatialDataFrame using ArcGIS API for Python
sdf = SpatialDataFrame.from_featureclass('path/to/spatial_data.shp')
# Join the CSV dataframe to the spatial dataframe based on a common field
joined_sdf = sdf.merge(csv_df, on='common_field')
# Export the joined spatial dataframe to ArcGIS Enterprise using the ArcGIS API for Python
gis = GIS('https://geoanalytics.cloud.statcan.ca/portal/', client_id='')
joined_fc = joined_sdf.spatial.to_featureclass(location='path/to/output.gdb', overwrite=True)
joined_item = gis.content.add({'type': 'Feature Service', 'title': 'Joined Data', 'tags': 'Data'}, data=joined_fc)
joined_item.publish()
Finally, the joined SpatialDataFrame is exported to a feature class using the spatial.to_featureclass() method of the SpatialDataFrame, and then published to ArcGIS Enterprise using the gis.content.add() and publish() methods of the ArcGIS API for Python. Note that you will need to replace the example paths and server URL with the actual paths and URL for your data and ArcGIS Enterprise instance.
GeoCode a dataframe using OSM API
import requests
import pandas as pd
def geocode_address(address):
"""
Geocode a single address using the OpenStreetMap API
"""
url = "https://nominatim.openstreetmap.org/search"
params = {
"q": address,
"format": "json"
}
response = requests.get(url, params=params)
if response.ok:
results = response.json()
if len(results) > 0:
return results[0]
return None
def geocode_dataframe(df, address_column):
"""
Geocode a Pandas dataframe using the OpenStreetMap API
"""
# Create a new dataframe to store the geocoding results
geocoded_df = pd.DataFrame(columns=["latitude", "longitude"])
# Loop through each row in the original dataframe
for index, row in df.iterrows():
# Get the address from the specified column
address = row[address_column]
# Geocode the address using the OpenStreetMap API
result = geocode_address(address)
if result:
# Add the latitude and longitude to the new dataframe
geocoded_df.loc[index] = [result["lat"], result["lon"]]
else:
# If geocoding failed, add NaN values to the new dataframe
geocoded_df.loc[index] = [float("NaN"), float("NaN")]
# Add the new columns to the original dataframe
df["latitude"] = geocoded_df["latitude"]
df["longitude"] = geocoded_df["longitude"]
return df
Raster Analysis with GDAL
from osgeo import gdal
import numpy as np
# Open the raster file
raster_ds = gdal.Open('path/to/raster.tif')
# Read the raster band into a NumPy array
raster_band = raster_ds.GetRasterBand(1)
raster_array = raster_band.ReadAsArray()
# Perform some analysis on the raster data
# For example, calculate the mean pixel value
mean_value = np.mean(raster_array)
# Print the result
print('Mean pixel value: {}'.format(mean_value))
You can modify this code to perform other types of analysis on the raster data, such as calculating the minimum, maximum, or standard deviation of the pixel values, or performing calculations between multiple bands. GDAL provides a wide range of functions and tools for working with raster data, so the possibilities are nearly endless.