Table of Contents#
Introduction#
The Banff procedures generate a number of output tables, which can be categorized as follows:
Standard output tables:
outdata
, containing the results of imputation (for procedures performing treatment)outstatus
, containing status flags (for procedures performing selection or treatment)
Diagnostic files, containing useful information specific to the procedure
Note that in many tables, <unit_id> appears as a column header. This is a placeholder; the actual column header is the name of the variable by the unit_id
parameter.
List of output tables#
Procedure |
Output Tables |
---|---|
verifyedits |
none - outputs printed to log |
editstats |
|
outlier |
|
errorloc |
|
deterministic |
|
donorimp |
|
estimator |
|
prorate |
|
massimp |
|
Output tables common to multiple procedures#
Procedure |
outdata |
outstatus |
outdonormap |
outreject |
---|---|---|---|---|
outlier |
X |
|||
errorloc |
X |
X |
||
deterministic |
X |
X |
||
donorimp |
X |
X |
X |
|
estimator |
X |
X |
||
prorate |
X |
X |
X |
|
massimp |
X |
X |
X |
Standard output tables#
The Banff procedures are designed to be run sequentially as part of an SDE process flow. The outputs from one procedure often act as inputs for subsequent procedures, and the statistical data that is the target of the SDE process is updated continuously throughout the process. Two standard outputs that are common to multiple procedures are outdata
and outstatus
. Please see the Interaction between procedures section of the user guide for more details.
outdata#
Procedures performing treatment functions (deterministic
, donorimp
, estimator
, prorate
and massimp
) produce the outdata
table, output statistical data (i.e., microdata) that includes the result of the treatment function. This includes both imputed values (e.g., imputed from donorimp
) and modified values (e.g., prorated values from prorate
). Some important notes about outdata
:
The
outdata
table is typically not a complete copy ofindata
but only contains rows and columns affected by the procedure. For example, ifindata
includes 2000 rows and 25 columns, but only 500 rows and 10 columns are affected by a procedure, thenoutdata
will only include 500 rows and 10 columns. To continue the SDE process, users should manually update theindata
file with the new information fromoutdata
. (Note: The Banff team is looking into automatically updatingindata
with the results ofoutdata
in a future release.)The
outdata
table always contains variables specified by theunit_id
andby
parameters.If no records are successfully imputed or altered by the procedure, then
outdata
is empty. No error will occur.
outstatus#
Selection and imputation flags are always associated with individual values on indata
. Because indata
is tabular, each observation can be associated with a specific record (row) and field (column). Records are identified by the user-specified unique record identifier unit_id
, while fields are referenced by their variable name. Status flags are stored in the outstatus
file with the following columns:
Column |
Description |
---|---|
Record identifier (i.e., row) to which the status flag applies. (The actual column header is the name of the variable by the |
|
|
Field identifier (i.e., column) to which the status flag applies. |
|
Status flag such as “FTI”, “FTE”, or “IDN”. |
|
Value of the variable when the status code was generated. For procedures performing selection ( |
All procedures performing selection or treatment functions (i.e., all but verifyedits
and editstats
) automatically produce outstatus
containing selection or imputation flags. Some procedures also read status files as inputs (instatus
); these may be required, depending on the procedure.
editstats#
outedits_reduced#
Contains the minimal set of edits. The EDITID
value is created by the procedure to cross-reference the other procedure outputs.
Column |
Description |
---|---|
EDITID |
Identification number of the edit belonging to the minimal set. |
EDIT_EQUATION |
The formulation of the edit. |
outedit_status#
Number of records which pass, miss, or fail each edit.
Column |
Description |
---|---|
EDITID |
Identification number of the edit belonging to the minimal set. |
OBS_PASSED |
Number of observations that passed the edit. |
OBS_MISSED |
Number of observations that have one or more missing variables involved in the edit. |
OBS_FAILED |
Number of observations that failed the edit because of one or more non-missing values. |
outk_edits_status#
Number of records which pass, miss, or fail a given number of edits.
Column |
Description |
---|---|
K_EDITS |
« k » cumulated edits that belong to the minimal set. |
OBS_PASSED |
Number of observations that passed « k » edits. |
OBS_MISSED |
Number of observations that have one or more missing variables involved in « k » edits. |
OBS_FAILED |
Number of observations that failed « k » edits because of one or more non-missing values. |
outglobal_status#
Number of records which pass, miss, or fail the overall record status.
Column |
Description |
---|---|
OBS_PASSED |
Number of observations with |
OBS_MISSED |
Number of observations with |
OBS_FAILED |
Number of observations with |
OBS_TOTAL |
Total number of observations. |
outedit_applic#
Number of times each variable was involved in an edit which passed, missed or failed.
Column |
Description |
---|---|
FIELDID |
Name of the variable. |
EDIT_APPLIC_PASSED |
Number of times the variable has « inherited » a |
EDIT_APPLIC_MISSED |
Number of times the variable has « inherited » a |
EDIT_APPLIC_FAILED |
Number of times the variable has « inherited » a |
EDIT_APPLIC_NOTINVOLVED |
Number of edits not involving the variable multiplied by the number of observations. |
EDITS_INVOLVED |
Number of edits involving the variable. |
outvars_role#
Number of times each variable contributed to the overall record status.
Column |
Description |
---|---|
FIELDID |
Name of the variable. |
OBS_PASSED |
Number of times a variable has an incidence on the |
OBS_MISSED |
Number of times a variable has an incidence on the |
OBS_FAILED |
Number of times a variable has an incidence on the |
OBS_NOT_APPLICABLE |
Number of times a variable has no incidence on the |
outlier#
outstatus_detailed#
Contains the outlier_status
of each selected outlier. Additional information is displayed if outlier_stats=True
is specified when the procedure is called.
Column |
Description |
---|---|
<unit_id> |
Record identifier. |
FIELDID |
Field identifier. |
OUTLIER_STATUS |
Detailed status of the outliers that has four possible values: |
METHOD |
Outlier detection method. |
CURRENT_VALUE |
Value of the outlier from |
WEIGHT |
Weight applied to |
HIST_AUX |
Name of the variable paired with the corresponding |
HIST_AUX_VALUE |
Value of the variable paired with the corresponding |
EFFECT |
Effect value calculated for the record and compared with the interval boundaries as part of outlier detection steps. |
GAP |
The gap between this record and the previous one with a different value (SG method). |
IMP_SIGMAGAP |
Imputation sigma-gap calculated as |
EXCL_SIGMAGAP |
Exclusion sigma-gap calculated as |
IMP_BND_L |
HB Method: Upper bound of left imputation interval. |
EXCL_BND_L |
HB Method: Upper bound of left exclusion interval. |
EXCL_BND_R |
HB Method: Lower bound of right exclusion interval. |
IMP_BND_R |
HB Method: Lower bound of right imputation interval. |
outsummary#
Summary information about outlier detection, including observation counts, status counts, and statistical outputs (such as acceptance interval bounds) calculated by the procedure.
Column |
Description |
---|---|
NObs |
Number of observations from |
NValid |
Number of valid observations. |
NRejected_NoMatch |
Number of observations rejected because an observation matching on |
FIELDID |
Field identifier. If outlier detection is applied to a ratio or historical trend, |
NUsed |
Number of observations used for outlier detection. |
NRejected |
Total number of observations rejected, for the current by-group and |
NRejected_VarMissing |
Number of observations rejected because the variable of interest is missing. |
NRejected_VarZero |
Number of observations rejected because the variable of interest is zero (only applicable if |
NRejected_VarNegative |
Number of observations rejected because the variable of interest is negative (only applicable if |
NFTI |
Number of observations flagged for imputation (i.e. with an |
NFTE |
Number of observations flagged for exclusion (i.e. with an |
IMP_BND_L |
HB Method: Upper bound of left imputation interval. |
EXCL_BND_L |
HB Method: Upper bound of left exclusion interval. |
EXCL_BND_R |
HB Method: Lower bound of right imputation interval. |
IMP_BND_R |
HB Method: Lower bound of right imputation interval. |
AuxVarID |
Auxiliary variable used as the denominator when outlier detection is used on a ratio or historical trend. |
NRejected_AuxVarMissing |
Number of observations rejected because auxiliary variable is missing. |
NRejected_AuxVarZero |
Number of observations rejected because auxiliary variable is zero. |
NRejected_AuxVarNegative |
Number of observations rejected because auxiliary variable is negative. |
Q1 |
First quartile (HB Method). |
M |
Median (HB Method). |
Q3 |
Third quartile (HB Method). |
DEVIATION |
Median absolute deviation if |
IMP_SIGMAGAP |
Imputation sigma-gap calculated as beta_i multiplied by the deviation (SG method). |
EXCL_SIGMAGAP |
Exclusion sigma-gap calculated as beta_e multiplied by the deviation (SG method). |
errorloc#
outreject#
Records for which the error localization problem could not be solved.
Column |
Description |
---|---|
<unit_id> |
Record identifier. |
NAME_ERROR |
Can take on one of two possible values: |
donorimp#
outmatching_fields#
Statuses of matching fields used for distance calculations. These can be different for each recipient.
Column |
Description |
---|---|
<unit_id> |
Recipient identifier. |
FIELDID |
Field identifier. |
STATUS |
Status code that can take on one of four possible values: |
outdonormap#
Mapping of recipient/donor pairs.
Column |
Description |
---|---|
RECIPIENT |
Recipient identifier. |
DONOR |
Donor identifier. |
NUMBER_OF_ATTEMPTS |
Number of donors tried before the recipient passed the postedits. |
DONORLIMIT |
Reproduces the value of the |
estimator#
outacceptable#
List of acceptable observations used in the calculation of model parameters for imputation.
Column |
Description |
---|---|
ESTIMID |
Identification number for the estimator defined based on the inestimator input table order (begins with number 0). |
ALGORITHMNAME |
Name of the algorithm (pre-defined in procedure or custom defined). |
<unit_id> |
Record identifier. |
outest_ef#
Report on calculation of averages (estimator functions).
Column |
Description |
---|---|
ESTIMID |
Identification number for the estimator defined based on the estimator input table order (begins with number 0). |
ALGORITHMNAME |
Name of the algorithm (pre-defined in procedure or custom defined). |
FIELDID |
Field identifier for which an average is calculated. |
PERIOD |
Current (C) or historical (H) period specific for the FIELDID variable. |
AVERAGE_VALUE |
Average value for the FIELDID variable. |
COUNT |
Number of acceptable observations used to calculate the average. The number of acceptable observations is the same in the calculation of all averages present in the formula of one estimator. |
outest_lr#
Report on calculation of « beta » coefficients (linear regression estimators).
Column |
Description |
---|---|
ESTIMID |
Identification number of the linear regression estimator based on the estimator input table order (begins with number 0). |
ALGORITHMNAME |
Name of the algorithm (pre-defined in procedure or custom defined). |
FIELDID |
name of the variable or regressor for which a beta coefficient is calculated. |
EXPONENT |
Regressor exponent. |
PERIOD |
Current (C) or historical (H) period specific to the regressor. |
BETA_VALUE |
Value of « beta » coefficient associated with the regressor. |
COUNT |
Number of acceptable observations used to calculate the « beta » coefficients The number of acceptable observations is the same for all « beta » coefficients present in the formula of one estimator identification. |
outest_parm#
Imputation statistics.
Column |
Description |
---|---|
ESTIMID |
Identification number for the estimator defined based on the estimator input table order (begins with number 0). |
ALGORITHMNAME |
Name of the algorithm (pre-defined in procedure or custom defined). |
FIELDID |
Field identifier. |
FTI |
Count of FTI flags on |
IMP |
Count of I– flags on |
DIVISIONBYZERO |
Number of failed imputations because calculation implies a division by 0. |
NEGATIVE |
Number of imputations discarded because imputed value is negative. (Not applicable if |
outrand_err#
Random error report.
Column |
Description |
---|---|
ESTIMID |
Identification number for the estimator defined based on the estimator input table order (begins with number 0). |
ALGORITHMNAME |
Name of the algorithm (pre-defined in procedure or custom defined). |
RECIPIENT |
Recipient identifier. |
DONOR |
Donor identifier. |
FIELDID |
Field identifier. |
RESIDUAL |
Calculated as difference between observed and estimated value, taken from donor record. |
RANDOMERROR |
If algorithm type is LR and variance is used then it is equal to: RESIDUAL* sqrt ((recipient variance ^ exponent) / (donor variance ^ exponent)), otherwise it is the same as RESIDUAL. |
ORIGINALVALUE |
Reported value of the variable before imputation. |
IMPUTEDVALUE |
Value of variable after imputation. |
prorate#
outreject#
Records for which prorating could not be performed.
Column |
Description |
---|---|
<unit_id> |
Record identifier. |
NAME_ERROR |
Can take on one of seven possible values: |
massimp#
outdonormap#
Mapping of recipient/donor pairs.
Column |
Description |
---|---|
RECIPIENT |
Recipient identifier. |
DONOR |
Donor identifier. |
NUMBER_OF_ATTEMPTS |
Can take on one of two possible values: (1) if the donor was selected by the nearest neighbour method or (0) if it was selected randomly. |
DONORLIMIT |
Reproduces the value of |