Prorating#
Execution: banff.prorate()
SDE function types: Review, Selection, Treatment
Input status flags: I– (optional)
Output status flags: IPR
Description#
Prorates and rounds records to satisfy user-specified edits.
Unlike other Banff procedures, the edits for this procedure follow specific criteria: only equalities are permitted, and the set of edits must form a hierarchical structure that sums to a grand-total. For example:
$$ \begin{aligned} subtotal1 + subtotal2 &= grandtotal \\ a + b + c &= subtotal1 \\ d + e + f &= subtotal2 \end{aligned} $$
Each individual edit must consist of a set of components $x_i$ that sum to a total $y$, i.e., of the form $x_1 + \dots + x_n = y$. Inequalities and constants are not permitted. For each individual edit equation that is not satisfied, one of the two prorating algorithms (basic or scaling) is applied to rake the components to match the total. The procedure takes a top-down approach, beginning with the grand-total (which is never changed) and adjusting components as necessary, until the full set of edits is satisfied. Missing values are not prorated; they are set to zero during the procedure and reset to missing afterwards. Values of zero are never altered.
Additional features:
- Automatic rounding to the desired number of decimal places.
- Optional bounds to constrain the relative change of values during prorating.
- Control over which variables are eligible for prorating.
- Option to limit prorating to original or previously imputed values, either globally or for individual variables.
- Weights to adjust the relative change of individual variables.
For a full mathematical description of the procedure methods, with examples, please see the Functional Description.
Input and output tables#
Descriptions of input and output tables are given below. Banff supports a number of input and output formats; please see the Banff User Guide for more information.
Input table | Description |
---|---|
indata | Input statistical data. Mandatory. |
instatus | Input status file containing I– status flags. |

Output table | Description |
---|---|
outdata | Output statistical table containing imputed data. |
outstatus | Output status file identifying imputed fields with IPR status flags, and their values after imputation. |
outreject | Output table containing records that failed prorating. |
For details on the content of output tables, please see the Output Tables document.
Parameters#
Parameter | Python type | Description |
---|---|---|
unit_id | str | Identify key variable (unit identifier) on indata. Mandatory. |
edits | str | List of edits that the prorating procedure must satisfy. Mandatory. |
decimal | int | Number of decimals used in the rounding algorithm (between 0 and 9). Default=0. |
method | str | Prorating method (“SCALING” or “BASIC”). Default = “BASIC”. |
modifier | str | Global modifier (“ALWAYS”, “IMPUTED”, “ORIGINAL”) to control which values are prorated. Default = “ALWAYS” |
lower_bound | float | Lower bound on the relative change of the variables. Default = 0. |
upper_bound | float | Upper bound on the relative change of the variables. |
verify_edits | bool | Verify the consistency of the edits without performing any prorating. Default=False. |
accept_negative | bool | Treat negative values as valid. Default=False. |
by | str | Variable(s) used to partition indata into by-groups for independent processing. |
prefill_by_vars | bool | Add by-group variable(s) to input status file to improve performance. Default=True. |
presort | bool | Sort input tables before processing, according to procedure requirements. Default=True. |
no_by_stats | bool | Reduce log output by suppressing by-group specific messages. Default=False. |
Notes#
Edit syntax and restrictions#
Unlike other Banff procedures, the prorate procedure imposes its own syntax and restrictions on the edits parameter. Individual edits must be written in the form
$$ w_1 x_1 : m_1 + … + w_n x_n : m_n = y $$
where $x_i$ and $y$ are variables on indata, $w_i$ are weights and $m_i$ are modifiers. The $x_i$ variables are referred to as components and must appear on the left-hand side of the equation. The $y$ variable is referred to as the total and must appear on the right-hand side of the equation. Individual edits are always separated by a semicolon.
The set of edits must have a hierarchical structure. That is, there must be one variable, the grand-total, that shows up as a total $y$ in one equation, but not as a component $x_i$ in any others. The components that contribute to the grand-total may act as sub-totals for other equations, but only once; double-decompositions are not permitted. There is no limit to the number of levels in the hierarchy specified by the user. These restrictions can be summarized with the following rules:
- Grand-total: Exactly one variable must appear as a total but never as a component. This variable is never altered by the procedure.
- Sub-totals: All other variables that appear as a total in one equation must appear as a component in another equation.
- In the complete set of edits, each variable can appear at most two times: once as a component $x_i$ and once as a total $y$.
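The three rules above can be checked mechanically. The following is a minimal sketch, not Banff's own validation: the function name `check_hierarchy` and the representation of edits as `(components, total)` pairs are assumptions made for illustration, and the check covers only the listed rules (for example, it does not detect disconnected cycles).

```python
from collections import Counter


def check_hierarchy(edits):
    """Return the grand-total name, or raise ValueError if a rule is broken.

    `edits` is a list of (components, total) pairs (a hypothetical
    representation, not the Banff edits string itself).
    """
    component_counts = Counter(c for comps, _ in edits for c in comps)
    total_counts = Counter(total for _, total in edits)

    # Each variable may appear at most once as a component and once as a total.
    for name, n in component_counts.items():
        if n > 1:
            raise ValueError(f"{name} appears as a component {n} times")
    for name, n in total_counts.items():
        if n > 1:
            raise ValueError(f"{name} appears as a total {n} times")

    # Exactly one total must never appear as a component: the grand-total.
    # Every other total is then a sub-total by construction.
    grand_totals = [t for t in total_counts if t not in component_counts]
    if len(grand_totals) != 1:
        raise ValueError(f"expected exactly one grand-total, found {grand_totals}")
    return grand_totals[0]


edits = [
    (["subtotal1", "subtotal2"], "grandtotal"),
    (["a", "b", "c"], "subtotal1"),
    (["d", "e", "f"], "subtotal2"),
]
grand_total = check_hierarchy(edits)
print(grand_total)
```

A set of edits with two unrelated totals, such as `[(["a", "b"], "t1"), (["c", "d"], "t2")]`, raises `ValueError` because two variables qualify as grand-total.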
Weights $w_i$ are optional. Each weight must be a positive number; if no weight is assigned to a variable, a default weight of one is applied. Weights can be used to control the relative amount of change in each component due to prorating, with the change being inversely proportional to the assigned weight. Weights are only assigned to the components, not the total.
Modifier codes $m_i$ are optional. Similar to weights, they can be assigned to individual variables, specified after the variable name and separated by a colon. Modifiers determine which variables are eligible for prorating. They behave in the same way as the global parameter modifier. The permitted codes are:
- “A” (Always): The variable can always be prorated. (Default approach.)
- “N” (Never): The variable is never prorated.
- “I” (Imputed): Only previously imputed values are prorated. Previously imputed values are identified by an imputation code I– on the instatus file, except for IDE which is treated as an original value.
- “O” (Original): Only original values are prorated. This includes values with status codes of IDE, FTI, and FTE, as well as any values without a status code on instatus.
If modifier codes “I” or “O” are used, then instatus must be specified, since these codes depend on the status flags. Note that modifiers specified in the edits overrule the global modifier specified by the modifier parameter.
Example:
prorate_call = banff.prorate(
    edits="""
        sub1 + sub2 + sub3:N = grandtotal;
        2a + b = sub1;
        c:I + d:I + e:I + f:I = sub2;
        2g:O + 3h:A = sub3;
    """,
    ... # other parameters
)
In the above example:
- Variable sub3 is never prorated.
- Variable a has a weight of 2.
- Variables c, d, e, f are only prorated if the values were previously imputed.
- Variable g has a weight of 2 and only original values are prorated.
- Variable h has a weight of 3 and is always prorated, whether the value is original or imputed.
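To make the reading of weights and modifiers concrete, here is a minimal sketch of how such an edits string could be split into terms. It is not Banff's parser: the names `TERM`, `parse_edit` and `parse_edits` are hypothetical, and the grammar (an optional numeric weight written directly before the variable name, an optional one-letter modifier after a colon) is an assumption based only on the example above.

```python
import re

# Assumed term grammar: [weight] name [":" modifier]
TERM = re.compile(
    r"^\s*(?P<weight>\d+(?:\.\d+)?)?\s*"
    r"(?P<name>[A-Za-z_]\w*)\s*"
    r"(?::\s*(?P<mod>[ANIO]))?\s*$"
)


def parse_edit(edit):
    """Parse one edit into ([(name, weight, modifier), ...], total)."""
    lhs, rhs = edit.split("=")
    components = []
    for term in lhs.split("+"):
        m = TERM.match(term)
        if m is None:
            raise ValueError(f"cannot parse term: {term!r}")
        # A missing weight defaults to one; a missing modifier is left as None,
        # meaning the global modifier applies.
        weight = float(m.group("weight")) if m.group("weight") else 1.0
        components.append((m.group("name"), weight, m.group("mod")))
    return components, rhs.strip()


def parse_edits(edits):
    """Parse a semicolon-separated edits string, skipping empty pieces."""
    return [parse_edit(e) for e in edits.split(";") if e.strip()]


example = """
    sub1 + sub2 + sub3:N = grandtotal;
    2a + b = sub1;
    c:I + d:I + e:I + f:I = sub2;
    2g:O + 3h:A = sub3;
"""
parsed = parse_edits(example)
for components, total in parsed:
    print(total, components)
```

Each component comes back as a `(name, weight, modifier)` tuple, so `2g:O` is read as variable `g` with weight 2.0 and modifier `"O"`.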
Top-down prorating procedure#
The prorate procedure begins with the grand-total. If the components of the grand-total equation do not sum to the grand-total, the prorating procedure is performed, adjusting the component values as necessary. (The grand-total never changes). This process then repeats for any of the components that are sub-totals, i.e., component variables that appear as totals in other edit equations. Note that in this process, each value is only prorated once, when it appears as a component.
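The top-down order can be illustrated with a short sketch. This is not the basic or scaling algorithm: it assumes a plain proportional factor and ignores weights, modifiers, bounds and rounding. The function name `prorate_top_down` and the dictionary layout (total name mapped to its component names) are hypothetical.

```python
def prorate_top_down(record, edits, grand_total):
    """Rake components to their totals, starting from the grand-total.

    `edits` maps each total to its list of components. Zero and missing
    (None) values are left unchanged, as described in this document.
    """
    result = dict(record)
    pending = [grand_total]
    while pending:
        total = pending.pop(0)
        components = edits.get(total)
        if components is None:
            continue  # not a total in any edit, so nothing to rake
        current = sum(result[c] or 0 for c in components)
        if current != result[total]:
            if current == 0:
                raise ValueError(f"cannot prorate components of {total}: sum is zero")
            factor = result[total] / current  # assumed proportional adjustment
            for c in components:
                if result[c]:  # skips zeros and missing values
                    result[c] = result[c] * factor
        # Components that are sub-totals are processed after being adjusted.
        pending.extend(components)
    return result


edits = {
    "grandtotal": ["subtotal1", "subtotal2"],
    "subtotal1": ["a", "b", "c"],
    "subtotal2": ["d", "e", "f"],
}
record = {
    "grandtotal": 100, "subtotal1": 30, "subtotal2": 50,
    "a": 10, "b": 10, "c": 10, "d": 20, "e": 20, "f": 0,
}
result = prorate_top_down(record, edits, "grandtotal")
print(result)
```

In this example the sub-totals are first raised to 37.5 and 62.5 so that they sum to 100, and only then are `a` to `f` adjusted against the new sub-totals. The grand-total and the zero value `f` are untouched.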
Rounding#
The prorate procedure includes a rounding algorithm to adjust all fields to the correct number of decimal places while satisfying the edits. Users can specify the desired number of decimal places using the decimal parameter; the default value is zero (i.e., whole numbers). The value of decimal must be an integer within [0,9], and must be equal to or greater than the actual number of decimals found on the total.
Note that the order of component variables in the edit equations can affect the results of the rounding algorithm, when the amount of required prorating cannot be evenly distributed amongst the components.
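One way such order dependence can arise is shown in the sketch below. It uses a largest-remainder rule, which is an assumption chosen for illustration and not necessarily the rounding algorithm Banff implements (see the Functional Description for that). The function name `round_to_total` is hypothetical, and the values are assumed to already sum to the total before rounding.

```python
from decimal import Decimal, ROUND_FLOOR


def round_to_total(values, total, decimal=0):
    """Round `values` to `decimal` places so that they still sum to `total`."""
    unit = Decimal(1).scaleb(-decimal)  # e.g. 0.01 when decimal=2
    exact = [Decimal(str(v)) for v in values]
    floored = [v.quantize(unit, rounding=ROUND_FLOOR) for v in exact]
    # Number of rounding units still missing after rounding everything down.
    shortfall = int((Decimal(str(total)) - sum(floored)) / unit)
    # Largest remainders first; the sort is stable, so ties go to the
    # component listed earlier.
    order = sorted(
        range(len(exact)), key=lambda i: exact[i] - floored[i], reverse=True
    )
    for i in order[:shortfall]:
        floored[i] += unit
    return floored


# Two components of 2.5 cannot both round to whole numbers and still sum to 5,
# so whichever is listed first receives the extra unit.
rounded = round_to_total([2.5, 2.5], 5)
print(rounded)
```

Under this rule, writing the edit as `a + b = y` would give the extra unit to `a`, while `b + a = y` would give it to `b`.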
When there are very large summation components (i.e., larger than 9 digits with or without decimals), the procedure is not able to accurately prorate them to the total with a precision of 9 decimals. In such a case, the observation will be rejected and listed with a decimal error in the outreject table. To avoid this, one can reduce the value of decimal, for example by setting decimal=8 or lower.
Rejected records#
Sometimes prorating cannot be successfully performed on an individual record, in which case the prorate procedure will output the record to the outreject table, alongside the reason for the failed prorating attempt. Reasons include:
- The prorated values are outside the user-specified bounds.
- The scaling value k is outside the interval [-1,1]. (Only for method="scaling")
- The factor k cannot be calculated because the weighted sum of proratable columns is zero.
- The record includes negative values and accept_negative=True isn’t specified.
- No values are eligible for prorating due to the “ORIGINAL” or “IMPUTED” modifier.
- The user has specified fewer decimal places than exist in the adjusted total.