This function is used internally by tsraking() to build the elements of the raking problem. It can also be useful to derive the cross-sectional (marginal) totals of the raking problem manually (outside of the tsraking() context).
Usage
build_raking_problem(
  data_df,
  metadata_df,
  data_df_name = deparse1(substitute(data_df)),
  metadata_df_name = deparse1(substitute(metadata_df)),
  alterability_df = NULL,
  alterSeries = 1,
  alterTotal1 = 0,
  alterTotal2 = 0
)
Arguments
- data_df
(mandatory)
Data frame (object of class "data.frame") that contains the time series data to be reconciled. It must minimally contain variables corresponding to the component series and cross-sectional control totals specified in the metadata data frame (argument metadata_df). If more than one observation (period) is provided, the sum of the provided component series values will also be preserved as part of implicit temporal constraints.
- metadata_df
(mandatory)
Data frame (object of class "data.frame") that describes the cross-sectional aggregation constraints (additivity rules) for the raking problem. Two character variables must be included in the metadata data frame: series and total1. Two variables are optional: total2 (character) and alterAnnual (numeric). The values of variable series represent the variable names of the component series in the input time series data frame (argument data_df). Similarly, the values of variables total1 and total2 represent the variable names of the 1st and 2nd dimension cross-sectional control totals in the input time series data frame. Variable alterAnnual contains the alterability coefficient for the temporal constraint associated with each component series. When specified, it overrides the default alterability coefficient specified with argument alterAnnual of tsraking().
- data_df_name
(optional)
String containing the name (deparsed expression) of the object passed to argument data_df. Default value is data_df_name = deparse1(substitute(data_df)).
- metadata_df_name
(optional)
String containing the name (deparsed expression) of the object passed to argument metadata_df. Default value is metadata_df_name = deparse1(substitute(metadata_df)).
- alterability_df
(optional)
Data frame (object of class "data.frame"), or NULL, that contains the alterability coefficient variables. They must correspond to a component series or a cross-sectional control total, that is, a variable with the same name must exist in the input time series data frame (argument data_df). The values of these alterability coefficients will override the default alterability coefficients specified with arguments alterSeries, alterTotal1 and alterTotal2. When the input time series data frame contains several observations and the alterability coefficients data frame contains only one, the alterability coefficients are used (repeated) for all observations of the input time series data frame. Alternatively, the alterability coefficients data frame may contain as many observations as the input time series data frame. Default value is alterability_df = NULL (default alterability coefficients).
- alterSeries
(optional)
Nonnegative real number specifying the default alterability coefficient for the component series values. It will apply to component series for which alterability coefficients have not already been specified in the alterability coefficients data frame (argument alterability_df). Default value is alterSeries = 1.0 (nonbinding component series values).
- alterTotal1
(optional)
Nonnegative real number specifying the default alterability coefficient for the 1st dimension cross-sectional control totals. It will apply to cross-sectional control totals for which alterability coefficients have not already been specified in the alterability coefficients data frame (argument alterability_df). Default value is alterTotal1 = 0.0 (binding 1st dimension cross-sectional control totals).
- alterTotal2
(optional)
Nonnegative real number specifying the default alterability coefficient for the 2nd dimension cross-sectional control totals. It will apply to cross-sectional control totals for which alterability coefficients have not already been specified in the alterability coefficients data frame (argument alterability_df). Default value is alterTotal2 = 0.0 (binding 2nd dimension cross-sectional control totals).
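The precedence between the default coefficients (alterSeries, alterTotal1, alterTotal2) and the alterability coefficients data frame can be sketched in base R. The helper resolve_alter() below is hypothetical (it is not part of the package); it only mimics the documented behaviour for one group of variables: the default applies unless a same-named column exists in the alterability data frame, and a single-row alterability data frame is repeated for every period.

```r
# Hypothetical helper (NOT part of the package): mimics the documented
# precedence of alterability coefficients for one group of variables.
resolve_alter <- function(vars, n_per, default, alter_df = NULL) {
  # Start from the default coefficient for every period and variable.
  coefs <- matrix(default, nrow = n_per, ncol = length(vars),
                  dimnames = list(NULL, vars))
  if (!is.null(alter_df)) {
    # alter_df must have either 1 row (repeated) or one row per period.
    stopifnot(nrow(alter_df) %in% c(1, n_per))
    for (v in intersect(vars, names(alter_df))) {
      # Override the default with the user-supplied coefficients.
      coefs[, v] <- rep_len(alter_df[[v]], n_per)
    }
  }
  coefs
}

# Example: make component A1 binding (coefficient 0) in all 3 periods,
# while A2 keeps a default coefficient of 1 (like alterSeries = 1).
alter <- data.frame(A1 = 0)
c_x_mat <- resolve_alter(c("A1", "A2"), n_per = 3, default = 1,
                         alter_df = alter)
c_x <- as.vector(c_x_mat)  # column-major: 0 0 0 1 1 1
```

Flattening with as.vector() stacks the coefficients period by period within each series, which is the column-major layout described under Details.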
Value
A list with the elements of the raking problem (excluding the implicit temporal totals):
- x: vector of component series initial values
- c_x: vector of component series alterability coefficients
- comp_cols: vector of component series (column) names
- g: vector of cross-sectional total initial values
- c_g: vector of cross-sectional total alterability coefficients
- tot_cols: vector of cross-sectional total (column) names
- G: cross-sectional total aggregation matrix (g = G %*% x)
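For a single period, the relationship g = G %*% x can be illustrated by building G directly from tsraking()-style metadata: one row per cross-sectional total, one column per component series, with a 1 wherever a series contributes to a total. The function build_G_sketch() is hypothetical and is not claimed to be the package's implementation; its row order (total1 totals first, then total2 totals) is an assumption chosen to match the tot_cols shown in the Examples output.

```r
# Hypothetical sketch (NOT the package's implementation): build the
# single-period cross-sectional aggregation matrix from tsraking()-style
# metadata. Rows: total1 totals, then total2 totals. Columns: component
# series in metadata order.
build_G_sketch <- function(metadata_df) {
  comp_cols <- as.character(metadata_df$series)
  total1 <- as.character(metadata_df$total1)
  total2 <- if ("total2" %in% names(metadata_df)) {
    as.character(metadata_df$total2)
  } else {
    NULL
  }
  tot_cols <- c(unique(total1), unique(total2))
  G <- matrix(0, nrow = length(tot_cols), ncol = length(comp_cols),
              dimnames = list(tot_cols, comp_cols))
  for (k in seq_along(comp_cols)) {
    G[total1[k], k] <- 1                        # 1st dimension total
    if (!is.null(total2)) G[total2[k], k] <- 1  # 2nd dimension total
  }
  G
}

# Example: a 2 x 2 cube observed for a single period.
meta <- data.frame(series = c("A1", "A2", "B1", "B2"),
                   total1 = c("totA", "totA", "totB", "totB"),
                   total2 = c("tot1", "tot2", "tot1", "tot2"))
G <- build_G_sketch(meta)
x <- c(1, 2, 3, 4)          # A1, A2, B1, B2
g <- as.vector(G %*% x)     # totA, totB, tot1, tot2: 3 7 4 6
```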
Details
See tsraking() for a detailed description of time series raking problems.
When applicable, the returned raking problem elements do not include the implicit component series temporal totals (i.e., elements g and G only contain the cross-sectional totals information).
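The implicit temporal totals that are left out are simply the sums of each component series over the provided periods. The sketch below shows one way to express them in the same column-major layout as x, using the first three periods of series A1 and A2 from the Examples; the names X, n_per, n_comp and G_temporal are illustrative only, and this is not claimed to be how tsraking() builds its temporal constraints.

```r
# Illustration: temporal totals that build_raking_problem() does NOT add
# to g and G. With x stacked column by column, each temporal total is the
# sum of one block of n_per consecutive elements of x.
X <- matrix(c(12, 10, 12,   # A1, 3 periods
              14,  9,  8),  # A2, 3 periods
            nrow = 3)
x <- as.vector(X)           # column-major: 12 10 12 14 9 8
n_per <- nrow(X)
n_comp <- ncol(X)

# One row per component series, with ones over that series' block.
G_temporal <- kronecker(diag(n_comp), matrix(1, nrow = 1, ncol = n_per))
temporal_totals <- as.vector(G_temporal %*% x)   # 34 31

# Same result as summing each series (column) directly.
stopifnot(isTRUE(all.equal(temporal_totals, colSums(X))))
```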
When the input data contains multiple periods (temporal total preservation scenario), raking problem elements x, c_x, g, c_g and G are constructed column by column (in "column-major order"), corresponding to the default behaviour of R for converting objects of class "matrix" into vectors. For G, this applies to both its rows and its columns, consistent with g = G %*% x.
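Assuming x and g are both stacked column by column as described, a single-period aggregation matrix G1 extends to n_per periods as the Kronecker product kronecker(G1, diag(n_per)). This agrees with the dimensions in the Examples output (G is 30 x 36 for 5 totals and 6 series over 6 periods). The base R sketch below only illustrates the layout; the names G1, X and n_per are illustrative, and it is not claimed to be how the package builds G.

```r
# Single-period aggregation matrix: one total, tot = S1 + S2.
G1 <- matrix(c(1, 1), nrow = 1, dimnames = list("tot", c("S1", "S2")))

# Component series data: one row per period, one column per series.
X <- matrix(c(1, 2, 3,       # S1 over 3 periods
              10, 20, 30),   # S2 over 3 periods
            nrow = 3, dimnames = list(NULL, c("S1", "S2")))
n_per <- nrow(X)

x <- as.vector(X)                 # column-major: 1 2 3 10 20 30
G <- kronecker(G1, diag(n_per))   # one row per (total, period) pair
g <- as.vector(G %*% x)           # 11 22 33

# Same totals computed period by period, then flattened column-major.
stopifnot(identical(g, as.vector(X %*% t(G1))))
```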
Note: argument validation is not performed here; it is (bluntly) assumed that the function is called by tsraking(), where a thorough validation of the arguments is done.
Examples
# Derive the 5 marginal totals of a 2 x 3 two-dimensional data cube using `tsraking()`
# metadata.
my_metadata <- data.frame(series = c("A1", "A2", "A3",
                                     "B1", "B2", "B3"),
                          total1 = c(rep("totA", 3),
                                     rep("totB", 3)),
                          total2 = rep(c("tot1", "tot2", "tot3"), 2))
my_metadata
#> series total1 total2
#> 1 A1 totA tot1
#> 2 A2 totA tot2
#> 3 A3 totA tot3
#> 4 B1 totB tot1
#> 5 B2 totB tot2
#> 6 B3 totB tot3
# 6 periods of data with marginal totals set to `NA` (they MUST exist in the input data
# but can be `NA`).
my_data <- data.frame(A1 = c(12, 10, 12, 9, 15, 7),
                      B1 = c(20, 21, 15, 17, 19, 18),
                      A2 = c(14, 9, 8, 9, 11, 10),
                      B2 = c(20, 29, 20, 24, 21, 17),
                      A3 = c(13, 15, 17, 14, 16, 12),
                      B3 = c(24, 20, 30, 23, 21, 19),
                      tot1 = rep(NA, 6),
                      tot2 = rep(NA, 6),
                      tot3 = rep(NA, 6),
                      totA = rep(NA, 6),
                      totB = rep(NA, 6))
# Get the raking problem elements.
p <- build_raking_problem(my_data, my_metadata)
str(p)
#> List of 7
#> $ x : num [1:36] 12 10 12 9 15 7 14 9 8 9 ...
#> $ c_x : num [1:36] 1 1 1 1 1 1 1 1 1 1 ...
#> $ comp_cols: chr [1:6] "A1" "A2" "A3" "B1" ...
#> $ g : logi [1:30] NA NA NA NA NA NA ...
#> $ c_g : num [1:30] 0 0 0 0 0 0 0 0 0 0 ...
#> $ tot_cols : chr [1:5] "totA" "totB" "tot1" "tot2" ...
#> $ G : num [1:30, 1:36] 1 0 0 0 0 0 0 0 0 0 ...
# Calculate the 5 marginal totals for all 6 periods.
my_data[p$tot_cols] <- p$G %*% p$x
my_data
#> A1 B1 A2 B2 A3 B3 tot1 tot2 tot3 totA totB
#> 1 12 20 14 20 13 24 32 34 37 39 64
#> 2 10 21 9 29 15 20 31 38 35 34 70
#> 3 12 15 8 20 17 30 27 28 47 37 65
#> 4 9 17 9 24 14 23 26 33 37 32 64
#> 5 15 19 11 21 16 21 34 32 37 42 61
#> 6 7 18 10 17 12 19 25 27 31 29 54
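As a cross-check, the same marginal totals can be computed with base R alone by grouping the component series according to the metadata and summing them period by period. The helper marginal_totals() is hypothetical (not part of the package) and, unlike build_raking_problem(), returns only the totals, without alterability coefficients or an aggregation matrix.

```r
# Same example data and metadata as above (marginal total columns omitted).
my_metadata <- data.frame(series = c("A1", "A2", "A3", "B1", "B2", "B3"),
                          total1 = c(rep("totA", 3), rep("totB", 3)),
                          total2 = rep(c("tot1", "tot2", "tot3"), 2))
my_data <- data.frame(A1 = c(12, 10, 12, 9, 15, 7),
                      B1 = c(20, 21, 15, 17, 19, 18),
                      A2 = c(14, 9, 8, 9, 11, 10),
                      B2 = c(20, 29, 20, 24, 21, 17),
                      A3 = c(13, 15, 17, 14, 16, 12),
                      B3 = c(24, 20, 30, 23, 21, 19))

# Hypothetical helper: for each total named in metadata column `by`,
# sum its component series across columns, period by period.
marginal_totals <- function(data, meta, by) {
  groups <- split(as.character(meta$series), as.character(meta[[by]]))
  sapply(groups, function(s) rowSums(data[s]))
}

by_total1 <- marginal_totals(my_data, my_metadata, "total1")  # totA, totB
by_total2 <- marginal_totals(my_data, my_metadata, "total2")  # tot1..tot3
```

The resulting columns match the totA, totB, tot1, tot2 and tot3 values obtained above with p$G %*% p$x.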