Replication of the G-Series 2.0 SAS\(^\circledR\) TSRAKING procedure (PROC TSRAKING). See the G-Series 2.0 documentation for details (Statistics Canada 2016).
This function will restore cross-sectional aggregation constraints in a system of time series. The aggregation constraints may come from a 1 or 2-dimensional table. Optionally, temporal constraints can also be preserved.
tsraking()
is usually called in practice through tsraking_driver()
in order to reconcile
all periods of the time series system in a single function call.
Usage
tsraking(
data_df,
metadata_df,
alterability_df = NULL,
alterSeries = 1,
alterTotal1 = 0,
alterTotal2 = 0,
alterAnnual = 0,
tolV = 0.001,
tolP = NA,
warnNegResult = TRUE,
tolN = -0.001,
id = NULL,
verbose = FALSE,
# New in G-Series 3.0
Vmat_option = 1,
warnNegInput = TRUE,
quiet = FALSE
)
Arguments
- data_df
(mandatory)
Data frame (object of class "data.frame") that contains the time series data to be reconciled. It must minimally contain variables corresponding to the component series and cross-sectional control totals specified in the metadata data frame (argument
metadata_df
). If more than one observation (period) is provided, the sum of the provided component series values will also be preserved as part of implicit temporal constraints.- metadata_df
(mandatory)
Data frame (object of class "data.frame") that describes the cross-sectional aggregation constraints (additivity rules) for the raking problem. Two character variables must be included in the metadata data frame:
series
andtotal1
. Two variables are optional:total2
(character) andalterAnnual
(numeric). The values of variableseries
represent the variable names of the component series in the input time series data frame (argumentdata_df
). Similarly, the values of variablestotal1
andtotal2
represent the variable names of the 1st and 2nd dimension cross-sectional control totals in the input time series data frame. VariablealterAnnual
contains the alterability coefficient for the temporal constraint associated to each component series. When specified, the latter will override the default alterability coefficient specified with argumentalterAnnual
.- alterability_df
(optional)
Data frame (object of class "data.frame"), or
NULL
, that contains the alterability coefficients variables. They must correspond to a component series or a cross-sectional control total, that is, a variable with the same name must exist in the input time series data frame (argumentdata_df
). The values of these alterability coefficients will override the default alterability coefficients specified with argumentsalterSeries
,alterTotal1
andalterTotal2
. When the input time series data frame contains several observations and the alterability coefficients data frame contains only one, the alterability coefficients are used (repeated) for all observations of the input time series data frame. Alternatively, the alterability coefficients data frame may contain as many observations as the input time series data frame.Default value is
alterability_df = NULL
(default alterability coefficients).- alterSeries
(optional)
Nonnegative real number specifying the default alterability coefficient for the component series values. It will apply to component series for which alterability coefficients have not already been specified in the alterability coefficients data frame (argument
alterability_df
).Default value is
alterSeries = 1.0
(nonbinding component series values).- alterTotal1
(optional)
Nonnegative real number specifying the default alterability coefficient for the 1st dimension cross-sectional control totals. It will apply to cross-sectional control totals for which alterability coefficients have not already been specified in the alterability coefficients data frame (argument
alterability_df
).Default value is
alterTotal1 = 0.0
(binding 1st dimension cross-sectional control totals)- alterTotal2
(optional)
Nonnegative real number specifying the default alterability coefficient for the 2nd dimension cross-sectional control totals. It will apply to cross-sectional control totals for which alterability coefficients have not already been specified in the alterability coefficients data frame (argument
alterability_df
).Default value is
alterTotal2 = 0.0
(binding 2nd dimension cross-sectional control totals).- alterAnnual
(optional)
Nonnegative real number specifying the default alterability coefficient for the component series temporal constraints (e.g., annual totals). It will apply to component series for which alterability coefficients have not already been specified in the metadata data frame (argument
metadata_df
).Default value is
alterAnnual = 0.0
(binding temporal control totals).- tolV, tolP
(optional)
Nonnegative real number, or
NA
, specifying the tolerance, in absolute value or percentage, to be used when performing the ultimate test in the case of binding totals (alterability coefficient of \(0.0\) for temporal or cross-sectional control totals). The test compares the input binding control totals with the ones calculated from the reconciled (output) component series. ArgumentstolV
andtolP
cannot be both specified together (one must be specified while the other must beNA
).Example: to set a tolerance of 10 units, specify
tolV = 10, tolP = NA
; to set a tolerance of 1%, specifytolV = NA, tolP = 0.01
.Default values are
tolV = 0.001
andtolP = NA
.- warnNegResult
(optional)
Logical argument specifying whether a warning message is generated when a negative value created by the function in the reconciled (output) series is smaller than the threshold specified by argument
tolN
.Default value is
warnNegResult = TRUE
.- tolN
(optional)
Negative real number specifying the threshold for the identification of negative values. A value is considered negative when it is smaller than this threshold.
Default value is
tolN = -0.001
.- id
(optional)
String vector (minimum length of 1), or
NULL
, specifying the name of additional variables to be transferred from the input time series data frame (argumentdata_df
) to the output time series data frame, the object returned by the function (see section Value). By default, the output series data frame only contains the variables listed in the metadata data frame (argumentmetadata_df
).Default value is
id = NULL
.- verbose
(optional)
Logical argument specifying whether information on intermediate steps with execution time (real time, not CPU time) should be displayed. Note that specifying argument
quiet = TRUE
would nullify argumentverbose
.Default value is
verbose = FALSE
.- Vmat_option
(optional)
Specification of the option for the variance matrices (\(V_e\) and \(V_\epsilon\); see section Details):
Value Description 1
Use vectors \(x\) and \(g\) in the variance matrices. 2
Use vectors \(|x|\) and \(|g|\) in the variance matrices. See Ferland (2016) and subsection Arguments
Vmat_option
andwarnNegInput
in section Details for more information.Default value is
Vmat_option = 1
.- warnNegInput
(optional)
Logical argument specifying whether a warning message is generated when a negative value smaller than the threshold specified by argument
tolN
is found in the input time series data frame (argumentdata_df
).Default value is
warnNegInput = TRUE
.- quiet
(optional)
Logical argument specifying whether or not to display only essential information such as warnings and errors. Specifying
quiet = TRUE
would also nullify argumentverbose
and is equivalent to wrapping yourtsraking()
call withsuppressMessages()
.Default value is
quiet = FALSE
.
Value
The function returns a data frame containing the reconciled component series, reconciled cross-sectional control
totals and variables specified with argument id
. Note that the "data.frame" object can be explicitly coerced to
another type of object with the appropriate as*()
function (e.g., tibble::as_tibble()
would coerce it to a tibble).
Details
This function returns the generalized least squared solution of a specific, simple variant of the general regression-based raking model proposed by Dagum and Cholette (Dagum and Cholette 2006). The model, in matrix form, is: $$\displaystyle \begin{bmatrix} x \\ g \end{bmatrix} = \begin{bmatrix} I \\ G \end{bmatrix} \theta + \begin{bmatrix} e \\ \varepsilon \end{bmatrix} $$ where
\(x\) is the vector of the initial component series values.
\(\theta\) is the vector of the final (reconciled) component series values.
\(e \sim \left( 0, V_e \right)\) is the vector of the measurement errors of \(x\) with covariance matrix \(V_e = \mathrm{diag} \left( c_x x \right)\), or \(V_e = \mathrm{diag} \left( \left| c_x x \right| \right)\) when argument
Vmat_option = 2
, where \(c_x\) is the vector of the alterability coefficients of \(x\).\(g\) is the vector of the initial control totals, including the component series temporal totals (when applicable).
\(\varepsilon \sim (0, V_\varepsilon)\) is the vector of the measurement errors of \(g\) with covariance matrix \(V_\varepsilon = \mathrm{diag} \left( c_g g \right)\), or \(V_\varepsilon = \mathrm{diag} \left( \left| c_g g \right| \right)\) when argument
Vmat_option = 2
, where \(c_g\) is the vector of the alterability coefficients of \(g\).\(G\) is the matrix of aggregation constraints, including the implicit temporal constraints (when applicable).
The generalized least squared solution is: $$\displaystyle \hat{\theta} = x + V_e G^{\mathrm{T}} \left( G V_e G^{\mathrm{T}} + V_\varepsilon \right)^+ \left( g - G x \right) $$ where \(A^{+}\) designates the Moore-Penrose inverse of matrix \(A\).
tsraking()
solves a single raking problem, i.e., either a single period of the time series system, or a single
temporal group (e.g., all periods of a given year) when temporal total preservation is required. Several call to
tsraking()
are therefore necessary in order to reconcile all the periods of the time series system.
tsraking_driver()
can achieve this in a single call: it conveniently determines the required set of raking
problems to be solved and internally generates the individual calls to tsraking()
.
Alterability Coefficients
Alterability coefficients \(c_x\) and \(c_g\) conceptually represent the measurement errors associated with the input component series values \(x\) and control totals \(g\) respectively. They are nonnegative real numbers which, in practice, specify the extent to which an initial value can be modified in relation to other values. Alterability coefficients of \(0.0\) define fixed (binding) values while alterability coefficients greater than \(0.0\) define free (nonbinding) values. Increasing the alterability coefficient of an intial value results in more changes for that value in the reconciled (output) data and, conversely, less changes when decreasing the alterability coefficient. The default alterability coefficients are \(1.0\) for the component series values and \(0.0\) for the cross-sectional control totals and, when applicable, the component series temporal totals. These default alterability coefficients result in a proportional allocation of the discrepancies to the component series. Setting the component series alterability coefficients to the inverse of the component series initial values would result in a uniform allocation of the discrepancies instead. Almost binding totals can be obtained in practice by specifying very small (almost \(0.0\)) alterability coefficients relative to those of the (nonbinding) component series.
Temporal total preservation refers to the fact that temporal totals, when applicable, are usually kept “as close as possible” to their initial value. Pure preservation is achieved by default with binding temporal totals while the change is minimized with nonbinding temporal totals (in accordance with the set of alterability coefficients).
Arguments Vmat_option
and warnNegInput
These arguments allow for an alternative handling of negative values in the input data, similar to that of tsbalancing()
.
Their default values correspond to the G-Series 2.0 behaviour (SAS\(^\circledR\) PROC TSRAKING) for which equivalent
options are not defined. The latter was developed with "nonnegative input data only" in mind, similar to SAS\(^\circledR\)
PROC BENCHMARKING in G-Series 2.0 that did not allow negative values either with proportional benchmarking, which explains
the "suspicious use of proportional raking" warning in presence of negative values with PROC TSRAKING in G-Series 2.0 and
when warnNegInput = TRUE
(default). However, (proportional) raking in the presence of negative values generally works well
with Vmat_option = 2
and produces reasonable, intuitive solutions. E.g., while the default Vmat_option = 1
fails at
solving constraint A + B = C
with input data A = 2
, B = -2
, C = 1
and the default alterability coefficients,
Vmat_option = 2
returns the (intuitive) solution A = 2.5
, B = -1.5
, C = 1
(25% increase for A
and B
). See
Ferland (2016) for more details.
Treatment of Missing (NA
) Values
Missing values in the input time series data frame (argument data_df
) or alterability coefficients data frame
(argument alterability_df
) for any of the raking problem data (variables listed in the metadata data frame
with argument metadata_df
) will generate an error message and stop the function execution.
Comparing tsraking()
and tsbalancing()
tsraking()
is limited to one- and two-dimensional aggregation table raking problems (with temporal total preservation if required) whiletsbalancing()
handles more general balancing problems (e.g., higher dimensional raking problems, nonnegative solutions, general linear equality and inequality constraints as opposed to aggregation rules only, etc.).tsraking()
returns the generalized least squared solution of the Dagum and Cholette regression-based raking model (Dagum and Cholette 2006) whiletsbalancing()
solves the corresponding quadratic minimization problem using a numerical solver. In most cases, convergence to the minimum is achieved and thetsbalancing()
solution matches the (exact)tsraking()
least square solution. It may not be the case, however, if convergence could not be achieved after a reasonable number of iterations. Having said that, only in very rare occasions will thetsbalancing()
solution significantly differ from thetsraking()
solution.tsbalancing()
is usually faster thantsraking()
, especially for large raking problems, but is generally more sensitive to the presence of (small) inconsistencies in the input data associated to the redundant constraints of fully specified (over-specified) raking problems.tsraking()
handles these inconsistencies by using the Moore-Penrose inverse (uniform distribution among all binding totals).tsbalancing()
accommodates the specification of sparse problems in their reduced form. This is not true in the case oftsraking()
where aggregation rules must always be fully specified since a complete data cube without missing data is expected as input (every single inner-cube component series must contribute to all dimensions of the cube, i.e., to every single outer-cube marginal total series).Both tools handle negative values in the input data differently by default. While the solutions of raking problems obtained from
tsbalancing()
andtsraking()
are identical when all input data points are positive, they will differ if some data points are negative (unless argumentVmat_option = 2
is specified withtsraking()
).While both
tsbalancing()
andtsraking()
allow the preservation of temporal totals, time management is not incorporated intsraking()
. For example, the construction of the processing groups (sets of periods of each raking problem) is left to the user withtsraking()
and separate calls must be submitted for each processing group (each raking problem). That's where helper functiontsraking_driver()
comes in handy withtsraking()
.tsbalancing()
returns the same set of series as the input time series object whiletsraking()
returns the set of series involved in the raking problem plus those specified with argumentid
(which could correspond to a subset of the input series).
References
Bérubé, J. and S. Fortier (2009). "PROC TSRAKING: An in-house SAS\(^\circledR\) procedure for balancing time series". In JSM Proceedings, Business and Economic Statistics Section. Alexandria, VA: American Statistical Association.
Dagum, E. B. and P. Cholette (2006). Benchmarking, Temporal Distribution and Reconciliation Methods of Time Series. Springer-Verlag, New York, Lecture Notes in Statistics, Vol. 186.
Ferland, M. (2016). "Negative Values with PROC TSRAKING". Internal document. Statistics Canada, Ottawa, Canada.
Fortier, S. and B. Quenneville (2009). "Reconciliation and Balancing of Accounts and Time Series". In JSM Proceedings, Business and Economic Statistics Section. Alexandria, VA: American Statistical Association.
Quenneville, B. and S. Fortier (2012). "Restoring Accounting Constraints in Time Series – Methods and Software for a Statistical Agency". Economic Time Series: Modeling and Seasonality. Chapman & Hall, New York.
Statistics Canada (2016). "The TSRAKING Procedure". G-Series 2.0 User Guide. Statistics Canada, Ottawa, Canada.
Statistics Canada (2018). Theory and Application of Reconciliation (Course code 0437). Statistics Canada, Ottawa, Canada.
Examples
###########
# Example 1: Simple 1-dimensional raking problem where the values of `cars` and `vans`
# must sum up to the value of `total`.
# Problem metadata
my_metadata1 <- data.frame(series = c("cars", "vans"),
total1 = c("total", "total"))
my_metadata1
#> series total1
#> 1 cars total
#> 2 vans total
# Problem data
my_series1 <- data.frame(cars = 25, vans = 5, total = 40)
# Reconcile the data
out_raked1 <- tsraking(my_series1, my_metadata1)
#>
#>
#> --- Package gseries 3.0.2 - Improve the Coherence of Your Time Series Data ---
#> Created on June 16, 2025, at 3:11:30 PM EDT
#> URL: https://StatCan.github.io/gensol-gseries/en/
#> https://StatCan.github.io/gensol-gseries/fr/
#> Email: g-series@statcan.gc.ca
#>
#> tsraking() function:
#> data_df = my_series1
#> metadata_df = my_metadata1
#> alterability_df = NULL (default)
#> alterSeries = 1 (default)
#> alterTotal1 = 0 (default)
#> alterTotal2 = 0 (default)
#> alterAnnual = 0 (default)
#> tolV = 0.001 (default)
#> warnNegResult = TRUE (default)
#> tolN = -0.001 (default)
#> id = NULL (default)
#> verbose = FALSE (default)
#> (*)Vmat_option = 1 (default)
#> (*)warnNegInput = TRUE (default)
#> (*)quiet = FALSE (default)
#> (*) indicates new arguments in G-Series 3.0
#>
# Initial data
my_series1
#> cars vans total
#> 1 25 5 40
# Reconciled data
out_raked1
#> cars vans total
#> 1 33.33333 6.666667 40
# Check the output cross-sectional constraint
all.equal(rowSums(out_raked1[c("cars", "vans")]), out_raked1$total)
#> [1] TRUE
# Check the control total (fixed)
all.equal(my_series1$total, out_raked1$total)
#> [1] TRUE
###########
# Example 2: 2-dimensional raking problem similar to the 1st example but adding
# regional sales for the 3 prairie provinces (Alb., Sask. and Man.)
# and where the sales of vans in Sask. are non-alterable
# (alterability coefficient = 0), with `quiet = TRUE` to avoid
# displaying the function header.
# Problem metadata
my_metadata2 <- data.frame(series = c("cars_alb", "cars_sask", "cars_man",
"vans_alb", "vans_sask", "vans_man"),
total1 = c(rep("cars_total", 3),
rep("vans_total", 3)),
total2 = rep(c("alb_total", "sask_total", "man_total"), 2))
my_metadata2
#> series total1 total2
#> 1 cars_alb cars_total alb_total
#> 2 cars_sask cars_total sask_total
#> 3 cars_man cars_total man_total
#> 4 vans_alb vans_total alb_total
#> 5 vans_sask vans_total sask_total
#> 6 vans_man vans_total man_total
# Problem data
my_series2 <- data.frame(cars_alb = 12, cars_sask = 14, cars_man = 13,
vans_alb = 20, vans_sask = 20, vans_man = 24,
alb_total = 30, sask_total = 31, man_total = 32,
cars_total = 40, vans_total = 53)
# Reconciled data
out_raked2 <- tsraking(my_series2, my_metadata2,
alterability_df = data.frame(vans_sask = 0),
quiet = TRUE)
# Initial data
my_series2
#> cars_alb cars_sask cars_man vans_alb vans_sask vans_man alb_total sask_total
#> 1 12 14 13 20 20 24 30 31
#> man_total cars_total vans_total
#> 1 32 40 53
# Reconciled data
out_raked2
#> cars_alb cars_sask cars_man vans_alb vans_sask vans_man alb_total sask_total
#> 1 14.31298 11 14.68702 15.68702 20 17.31298 30 31
#> man_total cars_total vans_total
#> 1 32 40 53
# Check the output cross-sectional constraints
all.equal(rowSums(out_raked2[c("cars_alb", "cars_sask", "cars_man")]), out_raked2$cars_total)
#> [1] TRUE
all.equal(rowSums(out_raked2[c("vans_alb", "vans_sask", "vans_man")]), out_raked2$vans_total)
#> [1] TRUE
all.equal(rowSums(out_raked2[c("cars_alb", "vans_alb")]), out_raked2$alb_total)
#> [1] TRUE
all.equal(rowSums(out_raked2[c("cars_sask", "vans_sask")]), out_raked2$sask_total)
#> [1] TRUE
all.equal(rowSums(out_raked2[c("cars_man", "vans_man")]), out_raked2$man_total)
#> [1] TRUE
# Check the control totals (fixed)
tot_cols <- union(unique(my_metadata2$total1), unique(my_metadata2$total2))
all.equal(my_series2[tot_cols], out_raked2[tot_cols])
#> [1] TRUE
# Check the value of vans in Saskatchewan (fixed at 20)
all.equal(my_series2$vans_sask, out_raked2$vans_sask)
#> [1] TRUE