Scheme creation
scheme
Scheme creation and extraction utilities.
This module provides functions to create and extract weighting schemes from various data formats. Schemes can be created from dictionaries, reference dataframes (microdata), or long-format aggregate tables. Both simple (flat) and segmented (nested) weighting schemes are supported.
scheme_from_dict(distributions, name=None, rim_params=None)
Create a Rim scheme from a dictionary.
Supports two formats:
- Simple Scheme (Flat)
- Segmented Scheme (Nested)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `distributions` | `SchemeDict` | Dictionary definition of the scheme | required |
| `name` | `str` | Name of the scheme | `None` |
| `rim_params` | `dict` | Parameters for the Rim class | `None` |
Returns:
| Type | Description |
|---|---|
| `Rim` | A configured Rim object |
scheme_dict_from_df(df, cols_weighting, col_freq, col_filter=None)
Extract a weighting scheme dict from a reference microdata dataframe.
This is useful when you have a representative dataset (e.g., Census microdata or a high-quality random sample) in which each row is a single record carrying a value for every demographic feature, and you want to calculate targets dynamically from its distributions.
Expected Input Format (Microdata):
| Age | Gender | Region | Weight/Freq |
|---|---|---|---|
| 18-24 | Male | East | 1.0 |
| 25-34 | Female | East | 1.0 |
| 65+ | Male | West | 2.5 |
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `df` | `DataFrame` | The reference dataframe containing combinations of all demographic features | required |
| `cols_weighting` | `list of str` | List of columns to calculate targets for (e.g. `['Age', 'Gender']`) | required |
| `col_freq` | `str` | Column containing the weight or frequency of each row (for raw census data, this is often a column of 1s) | required |
| `col_filter` | `str` | Optional column for segmentation (e.g. `'Region'`). If provided, targets are calculated within each unique value of this column | `None` |
Returns:
| Type | Description |
|---|---|
| `SchemeDict` | Dictionary containing the weighting scheme |
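The calculation described above can be sketched without the library. The following is a minimal, dependency-free illustration, not this module's implementation: `targets_from_microdata` is a hypothetical helper, a list of row dicts stands in for the DataFrame, and both the output layout (segment, then variable, then category) and the use of proportions summing to 1 are assumptions. The real `SchemeDict` structure and scale may differ.

```python
from collections import defaultdict


def targets_from_microdata(rows, cols_weighting, col_freq, col_filter=None):
    """Hypothetical helper: weighted category shares for each variable."""

    def shares(subset):
        result = {}
        for col in cols_weighting:
            # Sum the row weights/frequencies per category of this variable.
            totals = defaultdict(float)
            for row in subset:
                totals[row[col]] += row[col_freq]
            grand_total = sum(totals.values())
            # Assumption: targets are expressed as proportions summing to 1.
            result[col] = {cat: value / grand_total for cat, value in totals.items()}
        return result

    if col_filter is None:
        return shares(rows)

    # Segmented case: compute the same shares within each filter value.
    segments = defaultdict(list)
    for row in rows:
        segments[row[col_filter]].append(row)
    return {segment: shares(subset) for segment, subset in segments.items()}


# The three microdata rows from the example table above.
rows = [
    {"Age": "18-24", "Gender": "Male", "Region": "East", "Freq": 1.0},
    {"Age": "25-34", "Gender": "Female", "Region": "East", "Freq": 1.0},
    {"Age": "65+", "Gender": "Male", "Region": "West", "Freq": 2.5},
]

flat = targets_from_microdata(rows, ["Age", "Gender"], "Freq")
segmented = targets_from_microdata(rows, ["Age", "Gender"], "Freq", col_filter="Region")
```

In the flat result, the `65+` row contributes 2.5 of the 4.5 total weight, so it dominates the `Age` targets. In the segmented result, each region's shares are computed only from that region's rows.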
scheme_dict_from_long_df(df, col_variable, col_category, col_value, col_filter=None)
Extract a weighting scheme dict from a 'Long' or 'Tidy' aggregate dataframe.
This is useful for census data where you have a table of totals rather than individual rows.
Expected Input Format:
| Variable | Category | Value | (Optional Filter/Region) |
|---|---|---|---|
| Age | 18-24 | 500 | East |
| Gender | Male | 480 | East |
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `df` | `DataFrame` | The dataframe containing aggregate targets | required |
| `col_variable` | `str` | Column name identifying the dimension (e.g. `'Age'`, `'Gender'`) | required |
| `col_category` | `str` | Column name identifying the group (e.g. `'18-24'`, `'Male'`) | required |
| `col_value` | `str` | Column name identifying the target weight/count | required |
| `col_filter` | `str` | Optional column for segmentation (e.g. `'Region'`) | `None` |
Returns:
| Type | Description |
|---|---|
| `SchemeDict` | Dictionary containing the weighting scheme |
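To illustrate how a long-format table of totals can be turned into per-variable targets, here is a minimal sketch in plain Python. It is not this module's implementation: `targets_from_long` is a hypothetical helper, a list of row dicts stands in for the DataFrame, and the nested output layout and normalisation to proportions are assumptions. The `25+` and `Female` rows are invented so that each variable has more than one category.

```python
from collections import defaultdict


def targets_from_long(rows, col_variable, col_category, col_value, col_filter=None):
    """Hypothetical helper: normalise aggregate totals into shares per variable."""
    # totals[segment][variable][category] -> summed value
    totals = defaultdict(lambda: defaultdict(lambda: defaultdict(float)))
    for row in rows:
        segment = row[col_filter] if col_filter else None
        totals[segment][row[col_variable]][row[col_category]] += row[col_value]

    scheme = {}
    for segment, variables in totals.items():
        scheme[segment] = {
            # Assumption: targets are proportions summing to 1 within a variable.
            variable: {cat: value / sum(cats.values()) for cat, value in cats.items()}
            for variable, cats in variables.items()
        }
    # Without a filter column there is a single, unnamed segment.
    return scheme if col_filter else scheme.get(None, {})


rows = [
    {"Variable": "Age", "Category": "18-24", "Value": 500, "Region": "East"},
    {"Variable": "Age", "Category": "25+", "Value": 500, "Region": "East"},      # invented row
    {"Variable": "Gender", "Category": "Male", "Value": 480, "Region": "East"},
    {"Variable": "Gender", "Category": "Female", "Value": 520, "Region": "East"},  # invented row
]

flat_long = targets_from_long(rows, "Variable", "Category", "Value")
segmented_long = targets_from_long(rows, "Variable", "Category", "Value", col_filter="Region")
```

Because the input already holds totals, no row-level aggregation is needed beyond summing duplicate variable/category pairs. Each variable is normalised independently, so the variables do not need to share the same grand total.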
scheme_from_df(df, cols_weighting, col_freq, col_filter=None, name=None, rim_params=None)
Extract a weighting scheme from a reference microdata dataframe.
This is useful when you have a representative dataset (e.g., Census microdata or a high-quality random sample) in which each row is a single record carrying a value for every demographic feature, and you want to calculate targets dynamically from its distributions.
Expected Input Format (Microdata):
| Age | Gender | Region | Weight/Freq |
|---|---|---|---|
| 18-24 | Male | East | 1.0 |
| 25-34 | Female | East | 1.0 |
| 65+ | Male | West | 2.5 |
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `df` | `DataFrame` | The reference dataframe containing combinations of all demographic features | required |
| `cols_weighting` | `list of str` | List of columns to calculate targets for (e.g. `['Age', 'Gender']`) | required |
| `col_freq` | `str` | Column containing the weight or frequency of each row (for raw census data, this is often a column of 1s) | required |
| `col_filter` | `str` | Optional column for segmentation (e.g. `'Region'`). If provided, targets are calculated within each unique value of this column | `None` |
| `name` | `str` | Name of the scheme | `None` |
| `rim_params` | `dict` | Parameters for the Rim class | `None` |
Returns:
| Type | Description |
|---|---|
| `Rim` | A configured Rim object |
scheme_from_long_df(df, col_variable, col_category, col_value, col_filter=None, name=None, rim_params=None)
Extract a weighting scheme from a 'Long' or 'Tidy' aggregate dataframe.
This is useful for census data where you have a table of totals rather than individual rows.
Expected Input Format:
| Variable | Category | Value | (Optional Filter/Region) |
|---|---|---|---|
| Age | 18-24 | 500 | East |
| Gender | Male | 480 | East |
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `df` | `DataFrame` | The dataframe containing aggregate targets | required |
| `col_variable` | `str` | Column name identifying the dimension (e.g. `'Age'`, `'Gender'`) | required |
| `col_category` | `str` | Column name identifying the group (e.g. `'18-24'`, `'Male'`) | required |
| `col_value` | `str` | Column name identifying the target weight/count | required |
| `col_filter` | `str` | Optional column for segmentation (e.g. `'Region'`) | `None` |
| `name` | `str` | Name of the scheme | `None` |
| `rim_params` | `dict` | Parameters for the Rim class | `None` |
Returns:
| Type | Description |
|---|---|
| `Rim` | A configured Rim object |