Scheme creation

scheme

Scheme creation and extraction utilities.

This module provides functions to create and extract weighting schemes from various data formats. Schemes can be created from dictionaries, reference dataframes (microdata), or long-format aggregate tables. Both simple (flat) and segmented (nested) weighting schemes are supported.

scheme_from_dict(distributions, name=None, rim_params=None)

Create a Rim scheme from a dictionary.

Supports two formats:

  1. Simple Scheme (Flat):

    {
        "age": {"18-24": 10, "25+": 90},
        "gender": {"M": 48, "F": 52}
    }
    

  2. Segmented Scheme (Nested):

    {
        "segment_by": "region",
        "segment_targets": {"A": 30, "B": 70},
        "segments": {
            "A": { "age": {...}, "gender": {...} },
            "B": { ... }
        }
    }
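
The two layouts above can be told apart by whether the `segment_by`, `segment_targets`, and `segments` keys are present. The following is a minimal plain-Python sketch, not part of the library: the helper names `is_segmented` and `target_totals` are hypothetical, the contents of segment "B" are illustrative, and treating targets as percentages that sum to 100 is an assumption drawn from the examples rather than a documented requirement.

```python
from typing import Any, Dict

# Keys that mark the nested (segmented) layout shown above.
SEGMENT_KEYS = {"segment_by", "segment_targets", "segments"}


def is_segmented(scheme: Dict[str, Any]) -> bool:
    """Return True if the dict uses the nested (segmented) layout."""
    return SEGMENT_KEYS.issubset(scheme)


def target_totals(flat_scheme: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Sum the targets of each variable in a flat scheme."""
    return {var: sum(targets.values()) for var, targets in flat_scheme.items()}


flat = {
    "age": {"18-24": 10, "25+": 90},
    "gender": {"M": 48, "F": 52},
}

segmented = {
    "segment_by": "region",
    "segment_targets": {"A": 30, "B": 70},
    # Illustrative only: both segments reuse the same flat targets.
    "segments": {"A": flat, "B": flat},
}

print(is_segmented(flat))
print(is_segmented(segmented))
print(target_totals(flat))
```

A check like `target_totals` can catch typos in hand-written targets before the dictionary is passed to `scheme_from_dict`.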
    

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `distributions` | `SchemeDict` | Dictionary definition of the scheme | required |
| `name` | `str` | Name of the scheme | `None` |
| `rim_params` | `dict` | Parameters for the Rim class | `None` |

Returns:

| Type | Description |
| --- | --- |
| `Rim` | A configured Rim object |

scheme_dict_from_df(df, cols_weighting, col_freq, col_filter=None)

Extract a weighting scheme dict from a reference microdata dataframe.

This is useful when you have a representative dataset (e.g., Census microdata or a high-quality random sample) where every row represents the combination of all demographic features, and you want to calculate targets dynamically based on its distributions.

Expected Input Format (Microdata):

| Age | Gender | Region | Weight/Freq |
| --- | --- | --- | --- |
| 18-24 | Male | East | 1.0 |
| 25-34 | Female | East | 1.0 |
| 65+ | Male | West | 2.5 |

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `df` | `DataFrame` | The reference dataframe containing combinations of all demographic features | required |
| `cols_weighting` | list of `str` | List of columns to calculate targets for (e.g. `['Age', 'Gender']`) | required |
| `col_freq` | `str` | Column containing the weight or frequency of each row. (For raw census data, this is often a column of 1s) | required |
| `col_filter` | `str` | Optional column for segmentation (e.g. `'Region'`). If provided, targets are calculated within each unique value of this column | `None` |

Returns:

| Type | Description |
| --- | --- |
| `SchemeDict` | Dictionary containing the weighting scheme |
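
To make the idea of calculating targets from microdata concrete, here is a minimal plain-Python sketch of the underlying arithmetic. It is not the library's implementation: the function `targets_from_rows` is hypothetical, a list of dicts stands in for the dataframe, the frequency column is named `Freq` for illustration, and expressing targets as percentages is an assumption (the source does not say whether the returned dict holds percentages or raw sums).

```python
from collections import defaultdict
from typing import Dict, List

# Rows mirror the microdata example above; "Freq" is the weight column.
rows = [
    {"Age": "18-24", "Gender": "Male", "Region": "East", "Freq": 1.0},
    {"Age": "25-34", "Gender": "Female", "Region": "East", "Freq": 1.0},
    {"Age": "65+", "Gender": "Male", "Region": "West", "Freq": 2.5},
]


def targets_from_rows(
    rows: List[dict], cols_weighting: List[str], col_freq: str
) -> Dict[str, Dict[str, float]]:
    """Weighted share (in percent) of each category, per weighting column."""
    total = sum(row[col_freq] for row in rows)
    scheme: Dict[str, Dict[str, float]] = {}
    for col in cols_weighting:
        sums: Dict[str, float] = defaultdict(float)
        for row in rows:
            # Each row contributes its weight to the category it belongs to.
            sums[row[col]] += row[col_freq]
        scheme[col] = {cat: 100.0 * s / total for cat, s in sums.items()}
    return scheme


scheme = targets_from_rows(rows, ["Age", "Gender"], "Freq")
print(scheme)
```

Because the third row carries a weight of 2.5, it counts for more than half of the total (2.5 of 4.5), which is why the frequency column matters even when most rows have a weight of 1.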

scheme_dict_from_long_df(df, col_variable, col_category, col_value, col_filter=None)

Extract a weighting scheme dict from a 'Long' or 'Tidy' aggregate dataframe.

This is useful for census data where you have a table of totals rather than individual rows.

Expected Input Format:

| Variable | Category | Value | (Optional Filter/Region) |
| --- | --- | --- | --- |
| Age | 18-24 | 500 | East |
| Gender | Male | 480 | East |

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `df` | `DataFrame` | The dataframe containing aggregate targets | required |
| `col_variable` | `str` | Column name identifying the dimension (e.g. `'Age'`, `'Gender'`) | required |
| `col_category` | `str` | Column name identifying the group (e.g. `'18-24'`, `'Male'`) | required |
| `col_value` | `str` | Column name identifying the target weight/count | required |
| `col_filter` | `str` | Optional column for segmentation (e.g. `'Region'`) | `None` |

Returns:

| Type | Description |
| --- | --- |
| `SchemeDict` | Dictionary containing the weighting scheme |
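
The reshaping from a long table to a nested dictionary can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the library's code: `scheme_from_long_rows` is a hypothetical helper, a list of dicts stands in for the dataframe, the rows beyond the two shown in the table above are invented for the example, and converting totals to percentages within each variable is an assumption.

```python
from typing import Dict, List

# Long-format aggregate rows: one row per (variable, category) total.
rows = [
    {"Variable": "Age", "Category": "18-24", "Value": 500},
    {"Variable": "Age", "Category": "25+", "Value": 1500},
    {"Variable": "Gender", "Category": "Male", "Value": 480},
    {"Variable": "Gender", "Category": "Female", "Value": 520},
]


def scheme_from_long_rows(
    rows: List[dict], col_variable: str, col_category: str, col_value: str
) -> Dict[str, Dict[str, float]]:
    """Group long rows by variable and convert totals to percent shares."""
    raw: Dict[str, Dict[str, float]] = {}
    for row in rows:
        raw.setdefault(row[col_variable], {})[row[col_category]] = row[col_value]
    return {
        var: {cat: 100.0 * v / sum(cats.values()) for cat, v in cats.items()}
        for var, cats in raw.items()
    }


scheme = scheme_from_long_rows(rows, "Variable", "Category", "Value")
print(scheme)
```

Each variable is normalised on its own total, so the variables do not need to share a base: here Age sums to 2000 and Gender to 1000.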

scheme_from_df(df, cols_weighting, col_freq, col_filter=None, name=None, rim_params=None)

Extract a weighting scheme from a reference microdata dataframe.

This is useful when you have a representative dataset (e.g., Census microdata or a high-quality random sample) where every row represents the combination of all demographic features, and you want to calculate targets dynamically based on its distributions.

Expected Input Format (Microdata):

| Age | Gender | Region | Weight/Freq |
| --- | --- | --- | --- |
| 18-24 | Male | East | 1.0 |
| 25-34 | Female | East | 1.0 |
| 65+ | Male | West | 2.5 |

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `df` | `DataFrame` | The reference dataframe containing combinations of all demographic features | required |
| `cols_weighting` | list of `str` | List of columns to calculate targets for (e.g. `['Age', 'Gender']`) | required |
| `col_freq` | `str` | Column containing the weight or frequency of each row. (For raw census data, this is often a column of 1s) | required |
| `col_filter` | `str` | Optional column for segmentation (e.g. `'Region'`). If provided, targets are calculated within each unique value of this column | `None` |
| `name` | `str` | Name of the scheme | `None` |
| `rim_params` | `dict` | Parameters for the Rim class | `None` |

Returns:

| Type | Description |
| --- | --- |
| `Rim` | A configured Rim object |
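
When `col_filter` is given, targets are calculated within each unique value of that column. The sketch below shows one way this could map onto the segmented dictionary layout documented for `scheme_from_dict`. It is an assumption-laden illustration, not the library's implementation: the helpers `shares` and `segmented_scheme` are hypothetical, a list of dicts stands in for the dataframe, the weights are illustrative, and deriving `segment_targets` from each segment's share of the total weight is an assumption.

```python
from collections import defaultdict
from typing import Dict, List

# Illustrative microdata; "Freq" is the weight column, "Region" the filter.
rows = [
    {"Age": "18-24", "Gender": "Male", "Region": "East", "Freq": 1.0},
    {"Age": "25-34", "Gender": "Female", "Region": "East", "Freq": 1.0},
    {"Age": "65+", "Gender": "Male", "Region": "West", "Freq": 2.0},
]


def shares(rows: List[dict], col: str, col_freq: str) -> Dict[str, float]:
    """Weighted percent share of each category of `col`."""
    sums: Dict[str, float] = defaultdict(float)
    for row in rows:
        sums[row[col]] += row[col_freq]
    total = sum(sums.values())
    return {cat: 100.0 * s / total for cat, s in sums.items()}


def segmented_scheme(
    rows: List[dict], cols_weighting: List[str], col_freq: str, col_filter: str
) -> dict:
    """Build a nested scheme dict with targets computed inside each segment."""
    by_segment: Dict[str, List[dict]] = defaultdict(list)
    for row in rows:
        by_segment[row[col_filter]].append(row)
    return {
        "segment_by": col_filter,
        # Assumed: each segment's target is its share of the total weight.
        "segment_targets": shares(rows, col_filter, col_freq),
        "segments": {
            seg: {col: shares(seg_rows, col, col_freq) for col in cols_weighting}
            for seg, seg_rows in by_segment.items()
        },
    }


scheme = segmented_scheme(rows, ["Age", "Gender"], "Freq", "Region")
print(scheme)
```

Within a segment the shares are computed only from that segment's rows, so "West" ends up with a single Gender category at 100 even though "Male" is not 100 percent of the full dataset.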

scheme_from_long_df(df, col_variable, col_category, col_value, col_filter=None, name=None, rim_params=None)

Extract a weighting scheme from a 'Long' or 'Tidy' aggregate dataframe.

This is useful for census data where you have a table of totals rather than individual rows.

Expected Input Format:

| Variable | Category | Value | (Optional Filter/Region) |
| --- | --- | --- | --- |
| Age | 18-24 | 500 | East |
| Gender | Male | 480 | East |

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `df` | `DataFrame` | The dataframe containing aggregate targets | required |
| `col_variable` | `str` | Column name identifying the dimension (e.g. `'Age'`, `'Gender'`) | required |
| `col_category` | `str` | Column name identifying the group (e.g. `'18-24'`, `'Male'`) | required |
| `col_value` | `str` | Column name identifying the target weight/count | required |
| `col_filter` | `str` | Optional column for segmentation (e.g. `'Region'`) | `None` |
| `name` | `str` | Name of the scheme | `None` |
| `rim_params` | `dict` | Parameters for the Rim class | `None` |

Returns:

| Type | Description |
| --- | --- |
| `Rim` | A configured Rim object |