Banff Package SAS Migration Guide#

In this guide#

The target audience for this guide is users of SAS based Banff 2.08.002 or earlier who are migrating to Python based Banff 3.x.
It summarizes information on the changes made in version 3, including new names for parameters and tables, as well as examples of how SAS programs which call Banff procedures can be converted into equivalent Python programs.

This document is not intended for new Banff users as it does not provide an extensive overview of the Banff Procedures. Please refer to the User Guide for complete details on the use of each procedure, its parameters, and tables.

Table of contents#

Available Procedures
Procedure Parameters
- Table of Procedure Parameters and Types
- New Procedure Options
Procedure Tables
- Table of Procedure Table Identifiers
- Table Changes
Table Specification
Notable Procedure Changes
- Input Table Changes
- Output Table Changes
Other Python Runtime Options
Performance Considerations
Errors and Exceptions
Utility Functions

Available Procedures#

For a list of the available procedures, see the User Guide.

Each procedure has been converted by taking the SAS-dependent Banff 2.08.002 procedure source code and modifying it to produce an open-source based procedure which is “wrapped” in a Python package. The underlying mathematical computations remain unaltered.

Due to differences between SAS and Python, users must adapt how they specify parameters and tables; the sets of parameters and tables remains largely unchanged (although most parameter identifiers and table identifiers have changed).

Procedure Parameters#

Many parameter names have changed to better follow common Python naming conventions.

The identifiers used in SAS programs correspond to the following identifiers and Python types:

Table of Procedure Parameters and Types#

SAS Identifier	Python Identifer	Python Type	Note
`ACCEPTNEGATIVE`	`accept_negative`	`bool`
`ACCEPTZERO`	`accept_zero`	`bool`
`BETAE`	`beta_e`	`int` or `float`
`BETAI`	`beta_i`	`int` or `float`
`BOUNDSTAT`	`outlier_stats`	`bool`	MODIFIED: now relates to contents of `outstatus_detailed` table
`BY`	`by`	`str`
`CARDINALITY`	`cardinality`	`int` or `float`
`DATAEXCLVAR`	`data_excl_var`	`str`
`DECIMAL`	`decimal`	`int` or `float`
`DISPLAYLEVEL`	`display_level`	`int` or `float`
`EDITS`	`edits`	`str`
`ELIGDON`	`eligdon`	`str`
`EXPONENT`	`exponent`	`int` or `float`
`EXTREMAL`	`extremal`	`int` or `float`
`HISTEXCLVAR`	`hist_excl_var`	`str`
`ID`	`unit_id`	`str`
`IMPLY`	`imply`	`int` or `float`
`LOWERBOUND`	`lower_bound`	`int` or `float`
~~`MATCHFIELDSTAT`~~	~~`match_field_stat`~~	~~`bool`~~	DEPRECATED: use `outmatching_fields` output table instead
`MAXCARDINALITY`	`extremal`	`int` or `float`
`MAXIMPLIEDEDITS`	`imply`	`int` or `float`
`MDM`	`mdm`	`int` or `float`
`MEI`	`mei`	`int` or `float`
`METHOD`	`method`	`str`
`MII`	`mii`	`int` or `float`
`MINDONORS`	`min_donors`	`int` or `float`
`MINOBS`	`min_obs`	`int` or `float`
`MODIFIER`	`modifier`	`str`
`MRL`	`mrl`	`int` or `float`
`MUSTIMPUTE`	`must_impute`	`str`
`MUSTMATCH`	`must_match`	`str`
`N`	`n`	`int` or `float`
`NLIMIT`	`n_limit`	`int` or `float`
`NOBYSTATS`	`no_by_stats`	`bool`
`OUTLIERSTAT`	`outlier_stats`	`bool`	MODIFIED: now relates to contents of `outstatus_detailed` table
`PCENTDONORS`	`percent_donors`	`int` or `float`
`POSTEDITS`	`post_edits`	`str`
`RANDNUMVAR`	`rand_num_var`	`str`
`RANDOM`	`random`	`bool`
~~`REJECTNEGATIVE`~~	~~`reject_negative`~~	~~`bool`~~	DEPRECATED: specify `accept_negative=False` instead
~~`REJECTZERO`~~	~~`reject_zero`~~	~~`bool`~~	DEPRECATED: specify `accept_zero=False` instead
`SEED`	`seed`	`int` or `float`
`SIDE`	`side`	`str`
`SIGMA`	`sigma`	`str`
`STARTCENTILE`	`start_centile`	`int` or `float`
`TIMEPEROBS`	`time_per_obs`	`int` or `float`
`UPPERBOUND`	`upper_bound`	`int` or `float`
`VAR`	`var`	`str`
`VERIFYEDITS`	`verify_edits`	`bool`
`VERIFYSPECS`	`verify_specs`	`bool`
`WEIGHT`	`weight`	`str`
`WEIGHTS`	`weights`	`str`
`WITH`	`with_var`	`str`

Example of Parameter Specification in Python#

The following code demonstrates how to specify different Python types associated with some common parameters.

foo = banff.donorimp(
    min_donors=2,
    percent_donors=0.1,
    accept_negative=True,
    edits="""x1>=-5; 
    x1<=15; 
    x2>=30; 
    x1+x2<=50;""",
    by="province city",
    unit_id='IDENT',
    trace=True,
    # ... etc. (tables)
)

Sample SAS Parameter Speficiation

proc donorimputation
    /* etc. (tables) ... */
    mindonors=2
    pcentdonors=0.1
    acceptnegative
    edits="x1>=-5; 
    x1<=15; 
    x2>=30; 
    x1+x2<=50;"
    ;
    by province city;
    id IDENT;
run;

parameter	note
`unit_id`	A single variable name
`by`	a list of 0 or more space-separated variable names
`cardinality`	excepts a number, see user guide for advice
`time_per_obs`	expects a number, see user guide for advice
`accept_negative`	boolean: `True`-> specified, <not-specified> -> not specified
`edits`	wrap multi-line strings with triple quotes (`"""edit>=string"""`)

New Procedure Options#

New options include

exclude_where_indata
exclude_where_indata_hist
prefill_by_vars
presort
trace

`exclude_where_indata` Option#

This option excludes records based on a user-specified SQL expression. See the procedure-specific documentation for details

`exclude_where_indata_hist` Option#

This option excludes records based on a user-specified SQL expression. See the procedure-specific documentation for details

estimator

`prefill_by_vars` Option#

This option is available and enabled by default in all procedures which accept an instatus table.
See User Guide

`presort` Option#

This option is available and enabled by default in all procedures which accept input tables.
See User Guide

`trace` Option#

This option is available in all procedures and controls console log verbosity.
See User Guide

Procedure Tables#

Many table parameter names have changed to better follow common Python naming conventions.

The identifiers used in SAS programs correspond to the following identifiers in Python:

Table of Procedure Table Identifiers#

SAS identifier	Python identifer	Note
`ALGORITHM`	`inalgorithm`
`AUX`	`indata_hist`	no “AUX” option in Python, use `indata_hist` instead
`DATA`	`indata`
`DATASTATUS`	`instatus`
`DONORMAP`	`outdonormap`
`ESTIMATOR`	`inestimator`
`HIST`	`indata_hist`
`HISTSTATUS`	`instatus_hist`
`INSTATUS`	`instatus`	new: now accepted by `errorloc` procedure, see Addition of instatus to Errorloc
`OUT`	`outdata`
`OUTACCEPTABLE`	`outacceptable`
`OUTDATA`	`outdata`
`OUTEDITAPPLIC`	`outedit_applic`
`OUTEDITSTATUS`	`outedit_status`
`OUTGLOBALSTATUS`	`outglobal_status`
`OUTKEDITSSTATUS`	`outk_edits_status`
`OUTRANDOMERROR`	`outrand_err`
`OUTREDUCEDEDITS`	`outedits_reduced`
`OUTESTEF`	`outest_ef`
`OUTESTLR`	`outest_lr`
`OUTESTPARMS`	`outest_parm`
`OUTVARSROLE`	`outvars_role`
`OUTREJECT`	`outreject`
`OUTSTATUS`	`outstatus`	new: now produced by the `massimp` procedure, see Addition of outstatus to Mass Imputation
`OUTSUMMARY`	`outsummary`
`STATUS`	`instatus`
Added in Banff 3.1.1	`outstatus_detailed`	produced by `outlier`, see Changes to Outlier Outstatus
Added in Banff 3.1.1	`outmatching_fields`	produced by `donorimp`, see Changes to Donor Imputation Outstatus

Table Changes#

An overview of changes is provided in the following subsections, with links to detailed information.
For a complete list of detailed information, see Notable Procedure Changes.

Removal of `by` Variables#

by variables have been removed from many output tables. See BY Variables on Output Tables for more details.

Standardization of `outstatus` Table#

also see Output Tables > outstatus

The outstatus table is now standardized across all procedures which produce it. New output tables have been introduced to the Donor Imputation, Mass Imputation and Outlier procedures to accommodate this change.

Addition of `instatus` Table#

The Errorloc procedure now accepts an instatus table.

Table Specification#

For information on specifying input and output tables, supported formats, etc. please see the User Guide

Notable Procedure Changes#

Input Table Changes#

Addition of `instatus` to Errorloc Procedure#

Errorloc now accepts an instatus table.

The Errorloc procedure processes its instatus table somewhat differently than other procedures. To favour selecting fields flagged for imputation, for each row in instatus with a status flag of FTI, the corresponding value in indata will be treated as if it were missing.

Output Table Changes#

BY Variables on Output Tables#

In the SAS based procedures, by variables were included on many output tables essentially by default whenever by-group processing was performed. In Banff 3.1.1 however, by variables are only ever included on the following tables:

Estimator
- outacceptable
- outest_ef
- outest_lr
- outest_parm
- outrand_err
Outlier
- outsummary

Standardization of `outstatus` Tables#

All outstatus tables are now standardized and contain exactly the following columns

Column Name	Note
`<unit-id>`	column named after the `unit_id` column from `indata` table
`FIELDID`	name of the column to which the status applies
`STATUS`	the status code
`VALUE`	value of the variable to which the status applies*

*VALUE column For procedures which produce an outdata table, the value is sourced from there.
Otherwise, the value is sourced from the indata table.

Furthermore, non-status information formerly produced by the Donor Imputation and Outlier procedures has been removed. See below for more information.

Changes to Donor Imputation `outstatus`#

Donor Imputation now produces a standardized outstatus table. Data which has been removed from outstatus is available in new optional output table outmatching_fields.
This new table is disabled by default. Specify True, or any valid output option to enable it. This new table replaces the match_field_stat option, which is now deprecated.

Addition of `outstatus` to Mass Imputation procedure#

Mass Imputation produces an outstatus table with the flag IMAS.

Changes to Outlier `outstatus`#

Outlier now produces a standardized outstatus table. Data which has been removed from outstatus is available in new optional table outstatus_detailed. This new table contains the variables <unit_id>, FIELDID, OUTLIER_STATUS (formerly OUTSTATUS, not to be confused with the oustatus table’s status variable), and any variables enabled by specifying outlier_stats=True. The table is enabled by default. Specify False to disable it.

Other Python Runtime Options#

Native Language Support#

Banff produces a log which can output messages in either English or French. See setting the log language from the user guide for details.

`capture` option#

When running in Jupyter Notebooks, some log messages may be missing. Specifying capture=True in a procedure call to may fix the issue. See suppressing and troubleshooting log messages from the user guide for details.

Performance Considerations#

Certain options and table formats can be expected to deliver optimal performance. See Performance Considerations for details.

Errors and Exceptions#

Error’s are handled differently in SAS vs in Python, where they are called exceptions. See Errors and Exceptions from the user guide for details.

Utility Functions#

Working with SAS Files in Python#

The banff package provides a few useful functions for reading SAS files in Python. See Working with SAS Files in Python in the user guide for details.