Scheme creation

`scheme`

Scheme creation and extraction utilities.

This module provides functions to create and extract weighting schemes from various data formats. Schemes can be created from dictionaries, reference dataframes (microdata), or long-format aggregate tables. Both simple (flat) and segmented (nested) weighting schemes are supported.

`scheme_from_dict(distributions, name=None, rim_params=None)`

Create a Rim scheme from a dictionary.

Supports two formats:

Simple Scheme (Flat):

{
    "age": {"18-24": 10, "25+": 90},
    "gender": {"M": 48, "F": 52}
}

Segmented Scheme (Nested):

{
    "segment_by": "region",
    "segment_targets": {"A": 30, "B": 70},
    "segments": {
        "A": { "age": {...}, "gender": {...} },
        "B": { ... }
    }
}
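As a rough illustration of the two layouts, the sketch below tells them apart and checks that each set of targets sums to 100. The helper names `is_segmented` and `check_targets` are hypothetical and not part of this module, and the sum-to-100 rule is an assumption drawn from the example values above; the library itself may normalize or validate differently.

```python
from math import isclose

# Keys that mark the nested (segmented) layout described above.
SEGMENT_KEYS = {"segment_by", "segment_targets", "segments"}


def is_segmented(scheme):
    """Return True when the dict uses the nested (segmented) layout."""
    return SEGMENT_KEYS <= set(scheme)


def check_targets(scheme, total=100.0):
    """Return a list of problems; an empty list means the targets look consistent.

    Assumption: targets are percentages that should sum to `total`.
    """
    problems = []

    def check_flat(flat, prefix=""):
        for variable, targets in flat.items():
            found = sum(targets.values())
            if not isclose(found, total):
                problems.append(f"{prefix}{variable} sums to {found}, expected {total}")

    if is_segmented(scheme):
        seg_sum = sum(scheme["segment_targets"].values())
        if not isclose(seg_sum, total):
            problems.append(f"segment_targets sums to {seg_sum}, expected {total}")
        # Every segment that has a target also needs its own flat scheme.
        for segment in sorted(set(scheme["segment_targets"]) - set(scheme["segments"])):
            problems.append(f"segment {segment!r} has a target but no definition")
        for segment, flat in scheme["segments"].items():
            check_flat(flat, prefix=f"{segment}/")
    else:
        check_flat(scheme)
    return problems


flat = {
    "age": {"18-24": 10, "25+": 90},
    "gender": {"M": 48, "F": 52},
}
segmented = {
    "segment_by": "region",
    "segment_targets": {"A": 30, "B": 70},
    "segments": {
        "A": {"age": {"18-24": 10, "25+": 90}},
        "B": {"age": {"18-24": 20, "25+": 80}},
    },
}

print(check_targets(flat))
print(check_targets(segmented))
```

Running a check like this before calling `scheme_from_dict` can catch a mistyped target early, whatever validation the library applies afterwards.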

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `distributions` | `SchemeDict` | Dictionary definition of the scheme | required |
| `name` | `str` | Name of the scheme | `None` |
| `rim_params` | `dict` | Parameters for the Rim class | `None` |

Returns:

| Type | Description |
| --- | --- |
| `Rim` | A configured Rim object |

`scheme_dict_from_df(df, cols_weighting, col_freq, col_filter=None)`

Extract a weighting scheme dict from a reference microdata dataframe.

This is useful when you have a representative dataset (e.g., census microdata or a high-quality random sample) in which each row records one combination of all demographic features, and you want to derive targets from its observed distributions.

Expected Input Format (Microdata):

| Age | Gender | Region | Weight/Freq |
| --- | --- | --- | --- |
| 18-24 | Male | East | 1.0 |
| 25-34 | Female | East | 1.0 |
| 65+ | Male | West | 2.5 |

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `df` | `DataFrame` | The reference dataframe containing combinations of all demographic features | required |
| `cols_weighting` | `list of str` | List of columns to calculate targets for (e.g. `['Age', 'Gender']`) | required |
| `col_freq` | `str` | Column containing the weight or frequency of each row. For raw census data, this is often a column of 1s | required |
| `col_filter` | `str` | Optional column for segmentation (e.g. `'Region'`). If provided, targets are calculated within each unique value of this column | `None` |

Returns:

| Type | Description |
| --- | --- |
| `SchemeDict` | Dictionary containing the weighting scheme |
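To make the calculation concrete, here is a minimal sketch of how weighted shares could be derived from microdata like the table above. It is not the library's implementation: the function `targets_from_rows` is hypothetical, rows are plain dicts to keep the example dependency-free, and expressing targets as percentages is an assumption based on the dictionary examples earlier on this page.

```python
from collections import defaultdict


def targets_from_rows(rows, cols_weighting, col_freq):
    """Sum the frequency column per category, then convert to percentages."""
    totals = {col: defaultdict(float) for col in cols_weighting}
    grand_total = 0.0
    for row in rows:
        weight = row[col_freq]
        grand_total += weight
        for col in cols_weighting:
            totals[col][row[col]] += weight
    return {
        col: {category: 100.0 * value / grand_total for category, value in cats.items()}
        for col, cats in totals.items()
    }


# Same rows as the "Expected Input Format (Microdata)" table.
rows = [
    {"Age": "18-24", "Gender": "Male", "Region": "East", "Freq": 1.0},
    {"Age": "25-34", "Gender": "Female", "Region": "East", "Freq": 1.0},
    {"Age": "65+", "Gender": "Male", "Region": "West", "Freq": 2.5},
]

flat_scheme = targets_from_rows(rows, ["Age", "Gender"], "Freq")
for variable, targets in flat_scheme.items():
    print(variable, {k: round(v, 2) for k, v in targets.items()})
```

The row with frequency 2.5 counts two and a half times as much as the others, which is why the frequency column is summed rather than the rows being counted.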

`scheme_dict_from_long_df(df, col_variable, col_category, col_value, col_filter=None)`

Extract a weighting scheme dict from a 'Long' or 'Tidy' aggregate dataframe.

This is useful for census data where you have a table of totals rather than individual rows.

Expected Input Format:

| Variable | Category | Value | (Optional Filter/Region) |
| --- | --- | --- | --- |
| Age | 18-24 | 500 | East |
| Gender | Male | 480 | East |

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `df` | `DataFrame` | The dataframe containing aggregate targets | required |
| `col_variable` | `str` | Column name identifying the dimension (e.g. `'Age'`, `'Gender'`) | required |
| `col_category` | `str` | Column name identifying the group (e.g. `'18-24'`, `'Male'`) | required |
| `col_value` | `str` | Column name holding the target weight or count | required |
| `col_filter` | `str` | Optional column for segmentation (e.g. `'Region'`) | `None` |

Returns:

| Type | Description |
| --- | --- |
| `SchemeDict` | Dictionary containing the weighting scheme |
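The reshaping from a long table into a flat scheme dict can be sketched as follows. This is an illustration under stated assumptions rather than the library's code: `targets_from_long_rows` is a hypothetical helper, rows are plain dicts instead of a DataFrame, counts are converted to percentages within each variable, and the `25+` and `Female` rows are made up so that each variable has a complete set of categories.

```python
from collections import defaultdict


def targets_from_long_rows(rows, col_variable, col_category, col_value):
    """Group long-format totals by variable and normalize each group to 100."""
    totals = defaultdict(dict)
    for row in rows:
        per_variable = totals[row[col_variable]]
        category = row[col_category]
        # Accumulate, so repeated (variable, category) rows are added up.
        per_variable[category] = per_variable.get(category, 0.0) + float(row[col_value])

    scheme = {}
    for variable, counts in totals.items():
        variable_total = sum(counts.values())
        scheme[variable] = {
            category: 100.0 * value / variable_total
            for category, value in counts.items()
        }
    return scheme


long_rows = [
    {"Variable": "Age", "Category": "18-24", "Value": 500},
    {"Variable": "Age", "Category": "25+", "Value": 1500},      # illustrative row
    {"Variable": "Gender", "Category": "Male", "Value": 480},
    {"Variable": "Gender", "Category": "Female", "Value": 520},  # illustrative row
]

long_scheme = targets_from_long_rows(long_rows, "Variable", "Category", "Value")
print(long_scheme)
```

A filter column would add one more grouping level on top of this, producing one such flat dict per segment.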

`scheme_from_df(df, cols_weighting, col_freq, col_filter=None, name=None, rim_params=None)`

Build a Rim weighting scheme from a reference microdata dataframe.

This is useful when you have a representative dataset (e.g., census microdata or a high-quality random sample) in which each row records one combination of all demographic features, and you want to derive targets from its observed distributions.

Expected Input Format (Microdata):

| Age | Gender | Region | Weight/Freq |
| --- | --- | --- | --- |
| 18-24 | Male | East | 1.0 |
| 25-34 | Female | East | 1.0 |
| 65+ | Male | West | 2.5 |

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `df` | `DataFrame` | The reference dataframe containing combinations of all demographic features | required |
| `cols_weighting` | `list of str` | List of columns to calculate targets for (e.g. `['Age', 'Gender']`) | required |
| `col_freq` | `str` | Column containing the weight or frequency of each row. For raw census data, this is often a column of 1s | required |
| `col_filter` | `str` | Optional column for segmentation (e.g. `'Region'`). If provided, targets are calculated within each unique value of this column | `None` |
| `name` | `str` | Name of the scheme | `None` |
| `rim_params` | `dict` | Parameters for the Rim class | `None` |

Returns:

| Type | Description |
| --- | --- |
| `Rim` | A configured Rim object |
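When `col_filter` is given, targets are calculated within each unique value of that column. The sketch below shows one way such a segmented result could be assembled in the nested dictionary layout accepted by `scheme_from_dict`. The helper names are hypothetical, rows are plain dicts rather than a DataFrame, and deriving `segment_targets` from each segment's share of the total frequency is an assumption, not documented behavior.

```python
from collections import defaultdict


def _shares(rows, cols_weighting, col_freq):
    """Percentage share of each category, weighted by the frequency column."""
    total = sum(row[col_freq] for row in rows)
    result = {}
    for col in cols_weighting:
        sums = defaultdict(float)
        for row in rows:
            sums[row[col]] += row[col_freq]
        result[col] = {category: 100.0 * value / total for category, value in sums.items()}
    return result


def segmented_targets(rows, cols_weighting, col_freq, col_filter):
    """Compute targets separately inside each value of the filter column."""
    by_segment = defaultdict(list)
    for row in rows:
        by_segment[row[col_filter]].append(row)

    grand_total = sum(row[col_freq] for row in rows)
    return {
        "segment_by": col_filter,
        # Assumption: a segment's target is its share of the total frequency.
        "segment_targets": {
            segment: 100.0 * sum(r[col_freq] for r in seg_rows) / grand_total
            for segment, seg_rows in by_segment.items()
        },
        "segments": {
            segment: _shares(seg_rows, cols_weighting, col_freq)
            for segment, seg_rows in by_segment.items()
        },
    }


micro_rows = [
    {"Age": "18-24", "Gender": "Male", "Region": "East", "Freq": 1.0},
    {"Age": "25-34", "Gender": "Female", "Region": "East", "Freq": 1.0},
    {"Age": "65+", "Gender": "Male", "Region": "West", "Freq": 2.5},
]

nested = segmented_targets(micro_rows, ["Age", "Gender"], "Freq", "Region")
print(nested["segment_targets"])
print(nested["segments"]["East"])
```

Note that a segment only lists the categories that occur in its own rows, so a small reference sample can leave a segment without a target for some category.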

`scheme_from_long_df(df, col_variable, col_category, col_value, col_filter=None, name=None, rim_params=None)`

Build a Rim weighting scheme from a 'Long' or 'Tidy' aggregate dataframe.

This is useful for census data where you have a table of totals rather than individual rows.

Expected Input Format:

| Variable | Category | Value | (Optional Filter/Region) |
| --- | --- | --- | --- |
| Age | 18-24 | 500 | East |
| Gender | Male | 480 | East |

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `df` | `DataFrame` | The dataframe containing aggregate targets | required |
| `col_variable` | `str` | Column name identifying the dimension (e.g. `'Age'`, `'Gender'`) | required |
| `col_category` | `str` | Column name identifying the group (e.g. `'18-24'`, `'Male'`) | required |
| `col_value` | `str` | Column name holding the target weight or count | required |
| `col_filter` | `str` | Optional column for segmentation (e.g. `'Region'`) | `None` |
| `name` | `str` | Name of the scheme | `None` |
| `rim_params` | `dict` | Parameters for the Rim class | `None` |

Returns:

| Type | Description |
| --- | --- |
| `Rim` | A configured Rim object |
