Banff Package SAS Migration Guide#
In this guide#
This guide is intended for users of SAS-based Banff 2.08.002 or earlier who are migrating to Python-based Banff 3.x.
It summarizes the changes made in version 3, including new names for parameters and tables, and gives examples of how SAS programs that call Banff procedures can be converted into equivalent Python programs.
This document is not intended for new Banff users, as it does not provide an extensive overview of the Banff procedures. Please refer to the User Guide for complete details on the use of each procedure, its parameters, and tables.
Available Procedures#
For a list of the available procedures, see the User Guide.
Each procedure has been converted by taking the SAS-dependent Banff 2.08.002 procedure source code and modifying it to produce an open-source based procedure which is “wrapped” in a Python package. The underlying mathematical computations remain unaltered.
Due to differences between SAS and Python, users must adapt how they specify parameters and tables. The sets of parameters and tables remain largely unchanged, although most parameter and table identifiers have changed.
Procedure Parameters#
Many parameter names have changed to better follow common Python naming conventions.
The identifiers used in SAS programs correspond to the following identifiers and Python types:
Table of Procedure Parameters and Types#
SAS Identifier |
Python Identifier |
Python Type |
Note |
---|---|---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
MODIFIED: now relates to contents of |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
~~ |
~~ |
~~ |
DEPRECATED: use |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
MODIFIED: now relates to contents of |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
~~ |
~~ |
~~ |
DEPRECATED: specify |
~~ |
~~ |
~~ |
DEPRECATED: specify |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Example of Parameter Specification in Python#
The following code demonstrates how to specify different Python types associated with some common parameters.
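import banff

foo = banff.donorimp(
    min_donors=2,                # number
    percent_donors=0.1,          # number
    accept_negative=True,        # boolean
    edits="""x1>=-5;
        x1<=15;
        x2>=30;
        x1+x2<=50;""",           # multi-line string in triple quotes
    by="province city",          # space-separated variable names
    unit_id='IDENT',             # a single variable name
    trace=True,
    # ... etc. (tables)
)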
Sample SAS Parameter Specification
proc donorimputation
    /* etc. (tables) ... */
    mindonors=2
    pcentdonors=0.1
    acceptnegative
    edits="x1>=-5; x1<=15; x2>=30; x1+x2<=50;"
    ;
    by province city;
    id IDENT;
run;
parameter |
note |
---|---|
|
A single variable name |
|
a list of 0 or more space-separated variable names |
|
expects a number, see user guide for advice |
|
expects a number, see user guide for advice |
|
boolean: |
|
wrap multi-line strings with triple quotes ( |
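The notes above can be illustrated without calling Banff at all. The following is a minimal sketch that only builds the keyword arguments from the earlier example; the `by_vars` list and the `params` dictionary are illustrative names, not part of the Banff API, and the table arguments are left out.

```python
# Sketch only: builds the keyword arguments from the example above
# without calling Banff, so it runs even where banff is not installed.
by_vars = ["province", "city"]      # 0 or more variable names

# Multi-line strings are wrapped in triple quotes.
edits = """x1>=-5;
x1<=15;
x2>=30;
x1+x2<=50;"""

params = {
    "min_donors": 2,                # number
    "percent_donors": 0.1,          # number
    "accept_negative": True,        # boolean (a bare keyword in SAS)
    "edits": edits,
    "by": " ".join(by_vars),        # names joined into one space-separated string
    "unit_id": "IDENT",             # a single variable name
    "trace": True,
}

# Once the table arguments are added, the dictionary could be unpacked
# into the procedure call, for example: banff.donorimp(**params, <tables>)
print(params["by"])
```

Keeping the `by` variables in a Python list and joining them at the last moment makes it easy to reuse the same list elsewhere in a program.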
New Procedure Options#
New options include
`exclude_where_indata` Option#
This option excludes records based on a user-specified SQL expression. See the procedure-specific documentation for details.
`exclude_where_indata_hist` Option#
This option excludes records based on a user-specified SQL expression. See the procedure-specific documentation for details.
`prefill_by_vars` Option#
This option is available and enabled by default in all procedures which accept an `instatus` table.
See the User Guide.
`presort` Option#
This option is available and enabled by default in all procedures which accept input tables.
See the User Guide.
`trace` Option#
This option is available in all procedures and controls console log verbosity.
See the User Guide.
Procedure Tables#
Many table parameter names have changed to better follow common Python naming conventions.
The identifiers used in SAS programs correspond to the following identifiers in Python:
Table of Procedure Table Identifiers#
SAS identifier |
Python identifier |
Note |
---|---|---|
|
|
|
|
|
no “AUX” option in Python, use |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
new: now accepted by |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
new: now produced by the |
|
|
|
|
|
|
Added in Banff 3.1.1 |
|
produced by |
Added in Banff 3.1.1 |
|
produced by |
Table Changes#
An overview of changes is provided in the following subsections, with links to detailed information.
For a complete list of detailed information, see Notable Procedure Changes.
Removal of `by` Variables#
`by` variables have been removed from many output tables. See BY Variables on Output Tables for more details.
Standardization of `outstatus` Table#
See also Output Tables > outstatus.
The `outstatus` table is now standardized across all procedures which produce it. New output tables have been introduced to the Donor Imputation, Mass Imputation and Outlier procedures to accommodate this change.
Addition of `instatus` Table#
The Errorloc procedure now accepts an `instatus` table.
Table Specification#
For information on specifying input and output tables, supported formats, etc., please see the User Guide.
Notable Procedure Changes#
Input Table Changes#
Addition of `instatus` to Errorloc Procedure#
Errorloc now accepts an `instatus` table.
The Errorloc procedure processes its `instatus` table somewhat differently than other procedures. To favour selecting fields flagged for imputation, for each row in `instatus` with a status flag of `FTI`, the corresponding value in `indata` will be treated as if it were missing.
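The rule above can be pictured with a small stand-alone sketch. It is an illustration of the described behaviour, not Banff's implementation. The column names `IDENT`, `FIELDID` and `STATUS`, the sample data, and the helper `mask_fti` are all assumptions made for the example.

```python
# Illustration of the rule described above, not Banff's implementation.
# Assumed layout: each instatus row carries the unit id, FIELDID and STATUS.
indata = [
    {"IDENT": "A", "x1": 10.0, "x2": 35.0},
    {"IDENT": "B", "x1": 20.0, "x2": 31.0},
]
instatus = [
    {"IDENT": "B", "FIELDID": "x1", "STATUS": "FTI"},
    {"IDENT": "A", "FIELDID": "x2", "STATUS": "IMAS"},  # not FTI: left alone
]


def mask_fti(indata, instatus, unit_id="IDENT"):
    """Return a copy of indata where FTI-flagged values are set to None."""
    flagged = {
        (row[unit_id], row["FIELDID"])
        for row in instatus
        if row["STATUS"] == "FTI"
    }
    masked = []
    for record in indata:
        new_record = dict(record)  # copy so the input is not modified
        for field in record:
            if (record[unit_id], field) in flagged:
                new_record[field] = None  # treated as if it were missing
        masked.append(new_record)
    return masked


masked = mask_fti(indata, instatus)
print(masked)
```

Only the value of `x1` for unit `B` is masked here, because it is the only entry flagged `FTI`; rows with other status flags leave `indata` untouched.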
Output Table Changes#
BY Variables on Output Tables#
In the SAS-based procedures, `by` variables were included on many output tables essentially by default whenever by-group processing was performed. In Banff 3.1.1, however, `by` variables are only ever included on the following tables:
- Estimator
  - `outacceptable`
  - `outest_ef`
  - `outest_lr`
  - `outest_parm`
  - `outrand_err`
- Outlier
  - `outsummary`
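If a migrated program relied on `by` variables being present on an output table that no longer carries them, one option is to join them back from `indata` using the unit identifier. The sketch below shows the idea on plain Python records; the helper `attach_by_vars`, the column names and the sample data are hypothetical and not part of Banff.

```python
# Hypothetical helper: re-attach by variables to an output table by
# looking them up in indata through the unit identifier.
def attach_by_vars(table, indata, by_vars, unit_id="IDENT"):
    """Return copies of the rows in table with the by variables added."""
    by_lookup = {
        row[unit_id]: {var: row[var] for var in by_vars}
        for row in indata
    }
    # by variables come first; existing columns keep their own values
    return [{**by_lookup[row[unit_id]], **row} for row in table]


indata = [
    {"IDENT": "A", "province": "ON", "city": "Ottawa", "x1": 10.0},
    {"IDENT": "B", "province": "QC", "city": "Gatineau", "x1": 20.0},
]
outstatus = [
    {"IDENT": "B", "FIELDID": "x1", "STATUS": "FTI"},
]

with_by = attach_by_vars(outstatus, indata, ["province", "city"])
print(with_by)
```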
Standardization of `outstatus` Tables#
All `outstatus` tables are now standardized and contain exactly the following columns:
Column Name |
Note |
---|---|
|
column named after the |
|
name of the column to which the status applies |
|
the status code |
|
value of the variable to which the status applies* |
* `VALUE` column: For procedures which produce an `outdata` table, the value is sourced from there. Otherwise, the value is sourced from the `indata` table.
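The sourcing rule for the `VALUE` column can be sketched as follows. This is an illustration under stated assumptions rather than Banff's own code: the function `build_outstatus`, the `IDENT` unit identifier, the uppercase `STATUS` column name and the sample data are all hypothetical.

```python
# Illustration only: fill VALUE from outdata when it exists, else from indata.
def build_outstatus(flags, indata, outdata=None, unit_id="IDENT"):
    """flags is a list of (unit, field, status) tuples."""
    source = outdata if outdata is not None else indata
    lookup = {row[unit_id]: row for row in source}
    return [
        {
            unit_id: unit,
            "FIELDID": field,
            "STATUS": status,
            "VALUE": lookup[unit][field],
        }
        for unit, field, status in flags
    ]


indata = [{"IDENT": "A", "x1": None}]
outdata = [{"IDENT": "A", "x1": 12.5}]   # e.g. after imputation
flags = [("A", "x1", "IMAS")]

from_outdata = build_outstatus(flags, indata, outdata)
from_indata = build_outstatus(flags, indata)
print(from_outdata[0]["VALUE"], from_indata[0]["VALUE"])
```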
Furthermore, non-status information formerly produced by the Donor Imputation and Outlier procedures has been removed. See below for more information.
Changes to Donor Imputation `outstatus`#
Donor Imputation now produces a standardized `outstatus` table. Data which has been removed from `outstatus` is available in the new optional output table `outmatching_fields`.
This new table is disabled by default. Specify `True`, or any valid output option, to enable it. This new table replaces the `match_field_stat` option, which is now deprecated.
Addition of `outstatus` to Mass Imputation procedure#
Mass Imputation produces an `outstatus` table with the flag `IMAS`.
Changes to Outlier `outstatus`#
Outlier now produces a standardized `outstatus` table. Data which has been removed from `outstatus` is available in the new optional table `outstatus_detailed`. This new table contains the variables `<unit_id>`, `FIELDID`, `OUTLIER_STATUS` (formerly `OUTSTATUS`, not to be confused with the `outstatus` table’s `status` variable), and any variables enabled by specifying `outlier_stats=True`. The table is enabled by default. Specify `False` to disable it.
Other Python Runtime Options#
Native Language Support#
Banff produces a log which can output messages in either English or French. See setting the log language in the user guide for details.
`capture` option#
When running in Jupyter Notebooks, some log messages may be missing. Specifying `capture=True` in a procedure call may fix the issue. See suppressing and troubleshooting log messages in the user guide for details.
Performance Considerations#
Certain options and table formats can be expected to deliver optimal performance. See Performance Considerations for details.
Errors and Exceptions#
Errors are handled differently in SAS than in Python, where they are called exceptions. See Errors and Exceptions in the user guide for details.
Utility Functions#
Working with SAS Files in Python#
The banff package provides a few useful functions for reading SAS files in Python. See Working with SAS Files in Python in the user guide for details.