banffprocessor.util package#
Submodules#
banffprocessor.util.case_insensitive_enum_meta module#
Metaclass for a case-insensitive version of an Enum.
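As a rough illustration of the idea, and not the package's actual implementation, case-insensitive member lookup can be sketched by overriding `__getitem__` on an `EnumMeta` subclass. The names `CaseFoldingEnumMeta` and `Colour` are hypothetical and exist only for this example.

```python
from enum import Enum, EnumMeta


class CaseFoldingEnumMeta(EnumMeta):
    """Hypothetical metaclass: look up members by name, ignoring case."""

    def __getitem__(cls, name):
        if isinstance(name, str):
            folded = name.casefold()
            for member_name, member in cls.__members__.items():
                if member_name.casefold() == folded:
                    return member
        # Fall back to the standard lookup, which raises KeyError if absent.
        return super().__getitem__(name)


class Colour(Enum, metaclass=CaseFoldingEnumMeta):
    RED = 1
    GREEN = 2


print(Colour["red"] is Colour.RED)
```

Only name-based lookup (`Colour["red"]`) is affected in this sketch; lookup by value (`Colour(1)`) behaves as in a plain Enum.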
banffprocessor.util.dataset module#
Model for a Banff Processor working dataset.
- class banffprocessor.util.dataset.Dataset(name: str, ds: Table, dbconn: DuckDBPyConnection | None = None)[source]#
Bases: object
Model for a Banff Processor working dataset.
- property dbconn: DuckDBPyConnection | None#
Return a connection to the database being used.
- property ds: Table#
Return the dataset as an Arrow Table.
- property ds_curr_output: Table#
Return the dataset output by the current proc as an Arrow table.
- property ds_filtered: Table#
Return the filtered version of the dataset as an Arrow table.
- name: str#
- banffprocessor.util.dataset.add_single_value_column(table: Table, column_name: str, value: Any, dtype: DataType | None = None) Table [source]#
Add a new column to table in which every row contains the same value. The new column has the same length as table.
- Parameters:
table (pa.Table) – The table to append the new column to
column_name (str) – The name for the new column
value (Any) – The value to use for each row of the new column
dtype (pa.DataType, optional) – The PyArrow DataType to use for the column, defaults to None
- Returns:
table with the new column appended to it
- Return type:
pa.Table
- banffprocessor.util.dataset.copy_table(to_copy: Table) Table [source]#
Return a copy of a PyArrow Table.
- Parameters:
to_copy (pa.Table) – The table to make a copy of
- Returns:
A new table containing the data and metadata of to_copy
- Return type:
pa.Table
- banffprocessor.util.dataset.get_dataset_name_alias(name: str) str | None [source]#
Get the alias name of the given dataset name.
Casefolds name and returns the alias of that dataset name, if one exists. If no alias exists, None is returned.
- Parameters:
name (str) – The name of the dataset to get the aliased name of
- Returns:
The alias of the dataset name, or None if no alias exists
- Return type:
str | None
- banffprocessor.util.dataset.get_dataset_real_name(name: str) str [source]#
Get the real name of the given dataset name.
Casefolds name and returns the real dataset name if name is an alias. If name is not an alias, the casefolded name is returned.
- Parameters:
name (str) – The name of the dataset to get the proper name of
- Returns:
name casefolded, or the proper dataset name if name is an alias
- Return type:
str
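The two lookups described for get_dataset_name_alias and get_dataset_real_name can be pictured as a pair of dictionary lookups on the casefolded name. The sketch below assumes a simple alias table; the dataset names in it, and the helpers `real_name` and `alias_name`, are hypothetical and do not reflect the package's actual alias list.

```python
from __future__ import annotations

# Hypothetical alias table (alias -> real name); not the package's real data.
ALIAS_TO_REAL = {"main_input": "input_data"}
REAL_TO_ALIAS = {real: alias for alias, real in ALIAS_TO_REAL.items()}


def real_name(name: str) -> str:
    """Return the real name for an alias, else the casefolded name itself."""
    key = name.casefold()
    return ALIAS_TO_REAL.get(key, key)


def alias_name(name: str) -> str | None:
    """Return the alias registered for a real name, or None if there is none."""
    return REAL_TO_ALIAS.get(name.casefold())


print(real_name("Main_Input"))
print(alias_name("other"))
```

Casefolding before the lookup is what makes both functions insensitive to the capitalization of the supplied name.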
banffprocessor.util.metadata_excel_to_xml module#
A program to convert a Banff Excel spreadsheet to XML files.
Statistics Canada 2024 K Williamson
- banffprocessor.util.metadata_excel_to_xml.convert_excel_to_xml(in_file: str, out_dir: str | None = None) None [source]#
Create XML files from an Excel file that was created using the Banff Processor Metadata Template.
To facilitate the creation of Banff Processor metadata, users can create the metadata in Excel using the Banff Processor metadata template. This function reads the Excel file indicated by the in_file parameter and writes XML files to the directory indicated by out_dir.
- banffprocessor.util.metadata_excel_to_xml.get_args(args: list | str | None = None) ArgumentParser [source]#
Create an argument parser.
Example args -> ["my_filename.xlsx", "-o", "/my/out/folder", "-l", "fr"]
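To show how an argument list of that shape is consumed, here is a minimal `argparse` sketch. Only the positional filename and the `-o` and `-l` flags come from the example above; the function name `build_parser`, the destination names `filename`, `outdir` and `lang`, and the default values are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser accepting the example argument list."""
    parser = argparse.ArgumentParser(
        description="Convert a metadata workbook to XML (illustrative only)."
    )
    parser.add_argument("filename", help="Path to the Excel workbook")
    # Destination names and defaults below are assumed, not taken from the package.
    parser.add_argument("-o", dest="outdir", default=None, help="Output folder")
    parser.add_argument("-l", dest="lang", default="en", help="Language code")
    return parser


parsed = build_parser().parse_args(
    ["my_filename.xlsx", "-o", "/my/out/folder", "-l", "fr"]
)
print(parsed.filename, parsed.outdir, parsed.lang)
```

Passing an explicit list to `parse_args` rather than reading `sys.argv` makes the parser easy to exercise from tests or other Python code.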