banffprocessor.metadata package#

Subpackages#

Submodules#

banffprocessor.metadata.metaobjects module#

class banffprocessor.metadata.metaobjects.MetaObjects(metadata_folder: str | Path | None = None, job_id: str | None = None, dbconn: DuckDBPyConnection | None = None)[source]#

Bases: object

Container class for collections of metadata objects.

add_objects_of_single_type(objects: list[MetadataClass]) None[source]#

Add a list of metadata objects, all of the same type, to the MetaObjects collection. The list can later be retrieved using that type.

Only one list per object type can be added; if a second is added it will overwrite the original list stored under that type.

Parameters:

objects (list[src.banffprocessor.metadata.MetadataClass]) – The list of metadata objects to load

Raises:
  • ValueError – If objects is empty or None

  • TypeError – If objects contains objects of more than one type
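The type-keyed storage described above can be pictured with a small stand-alone sketch. TypeKeyedStore is a hypothetical stand-in and is not part of banffprocessor; it models only the documented contract (one list per type, overwrite on a second add, ValueError for empty input, TypeError for mixed types), not the real implementation.

```python
from __future__ import annotations


class TypeKeyedStore:
    """Hypothetical stand-in that mimics the documented contract only."""

    def __init__(self) -> None:
        # One list of objects per concrete type.
        self._by_type: dict[type, list[object]] = {}

    def add_objects_of_single_type(self, objects: list[object] | None) -> None:
        if not objects:
            raise ValueError("objects must be a non-empty list")
        first_type = type(objects[0])
        if any(type(obj) is not first_type for obj in objects):
            raise TypeError("objects must all be of the same type")
        # A second list of the same type replaces the first one.
        self._by_type[first_type] = list(objects)

    def get_objects_of_type(self, cls: type) -> list[object]:
        # An empty list is returned when nothing is stored under cls.
        return self._by_type.get(cls, [])
```

In this sketch, adding [1, 2] and then [3] leaves only [3] stored under int, which mirrors the overwrite rule stated above.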

check_constraints() None[source]#

Perform the check_constraints method on each metadata class type loaded to this object.

cleanup_metadata() None[source]#

Perform the cleanup() method on each metadata class type loaded to this object.

property dbconn: DuckDBPyConnection | None#

The currently connected database used to store metadata objects.

Returns:

The DuckDBPyConnection currently being used to store metadata objects.

Return type:

duckdb.DuckDBPyConnection | None

display_load_summary() None[source]#

Display a summary of the Metadata files that were loaded to memory.

get_algorithm(algorithmname: str) Algorithms | None[source]#

Get and return the src.banffprocessor.metadata.Algorithms object associated with the specified algorithmname.

Parameters:

algorithmname – The algorithmname of the src.banffprocessor.metadata.Algorithms object to retrieve

Returns:

The src.banffprocessor.metadata.Algorithms object with the specified algorithmname

Return type:

src.banffprocessor.metadata.Algorithms | None

get_edits_string(editgroupid: str) str[source]#

Get and return a string containing the list of edits in the src.banffprocessor.metadata.Edits objects associated with the specified editgroupid.

Each edit is prefixed with its modifier (if present) followed by a colon and space, and the resulting edit strings are joined with a semi-colon and space. If no edits are found under the editgroupid, an empty string is returned.

e.g. “PASS: a > b; FAIL: c + d <= e; f - g = h;”

Parameters:

editgroupid (str) – the ID to filter the src.banffprocessor.metadata.Editgroups on

Returns:

The semi-colon separated list of formed edits as a single string, empty if no edits were found

Return type:

str
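The string format described for get_edits_string can be reproduced with a short stand-alone sketch. EditStub and format_edits are hypothetical names, and the assumption that each record carries an already-formed edit string plus an optional modifier is made only for illustration; the real Edits class may store its fields differently.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class EditStub:
    """Hypothetical record: a formed edit string and an optional modifier."""

    edit: str
    modifier: str | None = None


def format_edits(edits: list[EditStub]) -> str:
    """Join edits as '<modifier>: <edit>;' separated by single spaces."""
    parts = []
    for item in edits:
        # The modifier prefix is only added when a modifier is present.
        prefix = f"{item.modifier}: " if item.modifier else ""
        parts.append(f"{prefix}{item.edit};")
    return " ".join(parts)
```

Called with the three edits from the example above (PASS, FAIL, and one without a modifier), this sketch returns "PASS: a > b; FAIL: c + d <= e; f - g = h;", and it returns an empty string for an empty list.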

get_estimators(estid: str) list[Estimators][source]#

Get and return the list of src.banffprocessor.metadata.Estimators objects associated with the specified estid and sorted by their seqno.

If no estimators are found under the estid, an empty list is returned.

Parameters:

estid (str) – the ID to filter the src.banffprocessor.metadata.Estimators on

Returns:

A list of src.banffprocessor.metadata.Estimators objects or an empty list if no src.banffprocessor.metadata.Estimators are found under the estid

Return type:

list[src.banffprocessor.metadata.Estimators]

get_expression(exprid: str) str[source]#

Get the expression string associated with the specified exprid.

Parameters:

exprid (str) – The identifier of the Expression to get.

Returns:

The expressions field value of the Expression object fetched.

Return type:

str

get_job_steps(jobid: str | None) list[Jobs][source]#

Get and return the list of Jobs objects with jobid, sorted in ascending order of their seqno.

If no objects are found under the jobid an empty list is returned.

Parameters:

jobid (str | None) – The jobid of the Jobs objects to fetch

Returns:

A list of the src.banffprocessor.metadata.Jobs objects with jobid jobid

Return type:

list[src.banffprocessor.metadata.Jobs]

get_objects_of_type(cls: type[MetadataClass]) list[MetadataClass] | dict[str, Any][source]#

Get the list of metadata objects of type cls.

If no objects are found, an empty list is returned.

Parameters:

cls (type[src.banffprocessor.metadata.MetadataClass]) – The class reference of the object type to fetch

Returns:

A list of all type cls objects found in this MetaObjects object or a special dictionary if objects are of type src.banffprocessor.metadata.ProcessControls

Return type:

list[src.banffprocessor.metadata.MetadataClass] | dict[str, Any]

get_process_controls(controlid: str) dict[str, dict[ProcessControlType, list[ProcessControls]]][source]#

Get and return a mapping of targetfile names to a dict of parameter values to their list of src.banffprocessor.metadata.ProcessControls objects associated with the specified controlid.

The lists are sorted on the enum value of the parameter field to ensure that a regular list traversal will always pass over controls in the order that they should be applied to the target_file. If no process controls are found under the controlid, an empty dict is returned.

e.g. {
    "indata": {
        ProcessControlType.ROW_FILTER: [ProcessControls1, ProcessControls2],
        ProcessControlType.COLUMN_FILTER: [ProcessControls3, ProcessControls4],
    },
}

Parameters:

controlid (str) – the ID to filter the src.banffprocessor.metadata.ProcessControls on

Returns:

A dict of target file names mapped to dicts of src.banffprocessor.metadata.ProcessControlType mapped to lists of src.banffprocessor.metadata.ProcessControls of that type for that targetfile, or an empty dict if no records are found under the controlid

Return type:

dict[str, dict[src.banffprocessor.metadata.ProcessControlType, list[src.banffprocessor.metadata.ProcessControls]]]
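The nested mapping returned by get_process_controls can be illustrated with a stand-alone sketch. ControlTypeStub, ControlStub and group_controls are hypothetical; the enum values and the record fields are assumptions chosen to match the description, and ordering by enum value is modelled here by sorting the records before grouping.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class ControlTypeStub(Enum):
    """Hypothetical enum; the numeric values are assumed."""

    ROW_FILTER = 1
    COLUMN_FILTER = 2


@dataclass
class ControlStub:
    """Hypothetical process control record."""

    controlid: str
    targetfile: str
    parameter: ControlTypeStub
    value: str


def group_controls(
    controls: list[ControlStub], controlid: str
) -> dict[str, dict[ControlTypeStub, list[ControlStub]]]:
    """Group matching controls by targetfile, then by parameter type."""
    grouped: dict[str, dict[ControlTypeStub, list[ControlStub]]] = {}
    matching = [c for c in controls if c.controlid == controlid]
    # Sorting on the enum value first means each inner dict is filled
    # in the order the controls are meant to be applied.
    for control in sorted(matching, key=lambda c: c.parameter.value):
        by_type = grouped.setdefault(control.targetfile, {})
        by_type.setdefault(control.parameter, []).append(control)
    return grouped
```

An unknown controlid yields an empty dict in this sketch, consistent with the behaviour described above.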

get_process_outputs(process: str) list[str][source]#

Get and return the list of output_name strings for process. If no objects are found under process an empty list is returned.

Parameters:

process (str) – The name of the process value to retrieve records of

Returns:

A list of the output_name attributes of the banffprocessor.metadata.ProcessOutputs objects with process process

Return type:

list[str]

get_specs_obj(cls: type[MetadataClass], specid: str) MetadataClass[source]#

Get and return the object of type cls with the specid specid.

Only one result should be found for the specified specid as it is effectively a primary key for its metadata table. If no object is found for the specid None is returned.

Parameters:
  • cls (type[src.banffprocessor.metadata.MetadataClass]) – The metadata object type to search for

  • specid (str) – The specid to match on objects of type cls

Raises:

MetadataConstraintError – If multiple src.banffprocessor.metadata.MetadataClass objects are found under specid

Returns:

The object with type cls and specid specid or None if not found

Return type:

src.banffprocessor.metadata.MetadataClass

get_user_vars_dict(specid: str, process: str) dict[str, str][source]#

Get Uservars objects identified by the specid and process and return a dict mapping the Uservars var to its value.

Parameters:
  • specid (str) – The specid identifying Uservars to fetch.

  • process (str) – The process value of the Uservars to fetch.

Returns:

A dictionary mapping each fetched Uservars var field to its value

Return type:

dict[str,str]
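As a rough illustration of the lookup described for get_user_vars_dict, the sketch below filters stand-in records on specid and process and builds the var-to-value mapping. UservarStub and user_vars_dict are hypothetical names; the field names follow the description above but are otherwise assumed.

```python
from dataclasses import dataclass


@dataclass
class UservarStub:
    """Hypothetical Uservars record."""

    specid: str
    process: str
    var: str
    value: str


def user_vars_dict(
    uservars: list[UservarStub], specid: str, process: str
) -> dict[str, str]:
    """Map var to value for records matching both specid and process."""
    return {
        u.var: u.value
        for u in uservars
        if u.specid == specid and u.process == process
    }
```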

get_varlist_fieldids(varid: str | None) list[str][source]#

Get and return the list of fieldids of the varlist objects associated with the specified varid, sorted on seqno.

If no varlist objects are found under the varid, an empty list is returned.

Parameters:

varid (str | None) – the ID to filter the varlists on

Returns:

A list of the fieldids of the varlist objects with varid varid, sorted on their seqno, or an empty list if no objects are found

Return type:

list[str]
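The filter-then-sort behaviour of get_varlist_fieldids can be sketched without the library. VarlistStub and varlist_fieldids are hypothetical, and the record fields (varid, seqno, fieldid) are assumed from the description above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class VarlistStub:
    """Hypothetical varlist record."""

    varid: str
    seqno: int
    fieldid: str


def varlist_fieldids(varlists: list[VarlistStub], varid: str | None) -> list[str]:
    """Return the fieldids for varid, ordered by ascending seqno."""
    matching = [v for v in varlists if v.varid == varid]
    return [v.fieldid for v in sorted(matching, key=lambda v: v.seqno)]
```

Note that the result holds only the fieldid strings, not the varlist records themselves, which matches the documented list[str] return type.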

get_weights_string(weightid: str) str[source]#

Get and return a string containing the list of weights in the src.banffprocessor.metadata.Weights objects associated with the specified weightid, sorted in descending order by weight. The string is formed by joining the formed weight strings from the objects with a semi-colon and space.

If no src.banffprocessor.metadata.Weights are found under the weightid, an empty string is returned.

e.g. “field1=9.0; field2=7.0; field3=5.0;”

Parameters:

weightid (str) – the ID to filter the src.banffprocessor.metadata.Weights on

Returns:

The semi-colon separated list of formed weights as a single string, empty if no src.banffprocessor.metadata.Weights were found

Return type:

str
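The weight string format can be reproduced with a small stand-alone sketch. WeightStub and format_weights are hypothetical, and the record fields (weightid, fieldid, weight) are assumptions based on the description above rather than the real Weights class.

```python
from dataclasses import dataclass


@dataclass
class WeightStub:
    """Hypothetical Weights record."""

    weightid: str
    fieldid: str
    weight: float


def format_weights(weights: list[WeightStub], weightid: str) -> str:
    """Join 'fieldid=weight;' items in descending order of weight."""
    matching = [w for w in weights if w.weightid == weightid]
    ordered = sorted(matching, key=lambda w: w.weight, reverse=True)
    return " ".join(f"{w.fieldid}={w.weight};" for w in ordered)
```

With weights 5.0, 9.0 and 7.0 for field3, field1 and field2, this sketch returns "field1=9.0; field2=7.0; field3=5.0;", and it returns an empty string when no record matches the weightid.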

initialize_metadata() None[source]#

Perform the initialize() method on each metadata class type loaded to this object.

job_proc_names: set[str]#

load_xml_file(metadata_file: Path, cls: type[MetadataClass]) None[source]#

Load the metadata found in metadata_file into this src.banffprocessor.metadata.MetaObjects object.

The new entry added will have a key of the cls name and value of a collection of objects of type cls.

Parameters:
  • metadata_file (pathlib.Path) – The path of the XML file to load

  • cls (type[src.banffprocessor.metadata.MetadataClass]) – The metadata object type to load the file into

Raises:

total_job_steps: int#

static validate_job_sequence(job_steps: list[Jobs], job_id: str | None = None) tuple[set[str], int][source]#

Iterate through job_steps and validate the sequence of all steps and process blocks contained/referenced in the job with job_id.

If job_id is not provided, the first job found in job_steps will be used as the starting point. Returns a set of the unique proc names contained in the job sequence and the total number of job steps across the entire job.

Parameters:
  • job_steps (list[Jobs]) – A collection of Jobs metadata objects

  • job_id (str | None, optional) – The job_id to be run and whose job steps to validate, defaults to None

Raises:
  • MetadataConstraintError – If a job step of process “JOB” has a specid pointing to a job_id that does not exist in the current Jobs metadata collection

  • MetadataConstraintError – If a cycle exists in the graph of job_steps (i.e. a step points to a process block which points back to the calling block, thus creating an infinite loop)

Returns:

A set of the unique proc names contained in the job sequence and the total number of job steps across the entire job.

Return type:

tuple[set[str], int]
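The two error conditions listed above (a "JOB" step pointing at a missing job, and a cycle between process blocks) can be illustrated with a depth-first walk. This is a sketch under stated assumptions, not the library's implementation: JobStepStub and validate_sequence are hypothetical, ValueError stands in for MetadataConstraintError, and the counting rule (every visited step counts once, "JOB" steps are not recorded as proc names) is assumed because the documentation does not spell it out.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class JobStepStub:
    """Hypothetical Jobs record."""

    jobid: str
    seqno: int
    process: str
    specid: str | None = None


def validate_sequence(
    steps: list[JobStepStub], job_id: str | None = None
) -> tuple[set[str], int]:
    """Walk the job graph depth-first, failing on missing jobs or cycles."""
    by_job: dict[str, list[JobStepStub]] = {}
    for step in steps:
        by_job.setdefault(step.jobid, []).append(step)
    if job_id is None:
        if not steps:
            return set(), 0
        job_id = steps[0].jobid  # default to the first job found

    proc_names: set[str] = set()
    total = 0

    def visit(current: str | None, stack: frozenset) -> None:
        nonlocal total
        if current in stack:
            raise ValueError(f"cycle detected at job {current!r}")
        if current not in by_job:
            raise ValueError(f"job {current!r} does not exist")
        for step in sorted(by_job[current], key=lambda s: s.seqno):
            total += 1  # assumed counting rule: every visited step counts
            if step.process.upper() == "JOB":
                # A JOB step refers to another process block via its specid.
                visit(step.specid, stack | {current})
            else:
                proc_names.add(step.process)

    visit(job_id, frozenset())
    return proc_names, total
```

Tracking the chain of jobs currently being visited, rather than every job ever seen, lets the same process block be referenced from two different places without being reported as a cycle.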

Module contents#

Contains the modules used to interact with and store Banff Processor metadata.