Changelog

0.3.2 - SHM slab allocator for wide-data workloads¶

25.02.17

When encoding objects with many sub-arrays, such as an InferenceData from a large MaxDiff model or a DataFrame with hundreds of columns, the codec layer allocated a separate SharedMemory block for every array. Each block consumes one POSIX file descriptor, and real-world workloads (500+ columns, repeated fit/predict cycles) quickly exhausted the default macOS file-descriptor limit, crashing with OSError: [Errno 24] Too many open files. (Issue #57)

Slab allocator¶

ShmPool.open_slab() / seal_slab(): New arena-style allocator that pre-allocates a single large SharedMemory block and bump-allocates sub-regions from it with 64-byte (cache-line) alignment. When the slab fills, a new one is opened automatically.
InferenceDataCodec.encode: Now estimates total byte size across all groups, opens one slab, and encodes all coordinates and data variables into it. A 300-array InferenceData now uses 1–2 fds instead of 300.
PandasDFCodec.encode (fallback columnar path): Same slab wrapping for DataFrames encoded column-by-column.
Offset-aware transport: ShmRef and ShmBlock gain an offset field so multiple sub-allocations can share one named SHM segment. The decode path (get_buf, ShmArray.from_block) slices at the correct offset, fully transparent to codec consumers.
Backward compatible: offset defaults to 0; existing ShmRefs without the field work unchanged.
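The bump allocation described above can be illustrated with a minimal sketch. It assumes nothing about the real ShmPool internals: the Slab class, its methods, and the (name, offset, nbytes) tuple are hypothetical stand-ins, and only multiprocessing.shared_memory from the standard library is used.

```python
from multiprocessing import shared_memory

ALIGN = 64  # cache-line alignment, matching the figure quoted above


def align_up(n: int, align: int = ALIGN) -> int:
    """Round n up to the next multiple of align."""
    return (n + align - 1) // align * align


class Slab:
    """Hypothetical bump allocator over one SharedMemory block (one fd)."""

    def __init__(self, size: int) -> None:
        self.shm = shared_memory.SharedMemory(create=True, size=size)
        self.size = size  # requested size; the OS may round the mapping up
        self.cursor = 0   # first byte that has not been handed out yet

    def alloc(self, nbytes: int) -> tuple[str, int, int]:
        """Reserve nbytes and return (segment name, offset, nbytes)."""
        offset = align_up(self.cursor)
        if offset + nbytes > self.size:
            raise MemoryError("slab full; a real pool would open a new slab")
        self.cursor = offset + nbytes
        return self.shm.name, offset, nbytes

    def view(self, offset: int, nbytes: int) -> memoryview:
        """Slice the shared buffer at the given offset (the decode side)."""
        return self.shm.buf[offset:offset + nbytes]

    def close(self) -> None:
        self.shm.close()
        self.shm.unlink()


slab = Slab(4096)
refs = [slab.alloc(n) for n in (10, 100, 7)]  # three arrays, one segment
offsets = [off for _, off, _ in refs]

_, off, n = refs[0]
view = slab.view(off, n)
view[:] = b"0123456789"
payload = bytes(view)
view.release()  # exported views must be released before closing the segment
slab.close()
```

All three sub-allocations share a single named segment, which is why the number of file descriptors no longer grows with the number of arrays.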

Testing¶

Added a stress test with 2 500 columns × 4 iterations of fit/summary/predict to verify fd stability under extreme pressure.

0.3.1 - Standardized and modular InferenceData, improved memory management¶

25.12.23

This release standardizes the InferenceData structure across all prediction methods, ensuring consistent dimensions (chain, draw, obs_id) and variable naming conventions. It also improves shared-memory transport for Pandas DataFrames, enabling high-fidelity roundtripping of Categoricals and mixed types between R and Python.

Standardised idata¶

All idata returned from brmspy functions is now standardised: the results can be joined with one another, DataFrame indexes are preserved correctly in obs_id, and the structure is uniform for univariate and multivariate models.

brm(): Added an optional return_idata: bool argument. For large models, passing return_idata=False and running only the methods you need (e.g. brms.posterior_pred(fit)) can reduce memory use. When return_idata=True, the result now also includes constant_data. (Issue #51)
posterior(): Returns draws in posterior and constant_data as idata. (Issue #51)
observed_data(): Returns observed_data and constant_data as idata. (Issue #51)
posterior_epred(): Returns predictions and predictions_constant_data when newdata is given, and posterior and constant_data otherwise. Target variables are now suffixed with _mean. (Issue #51)
posterior_predict(): Returns predictions and predictions_constant_data when newdata is given, and posterior_predictive and constant_data otherwise. (Issue #51)
posterior_linpred(): Returns predictions and predictions_constant_data when newdata is given, and posterior and constant_data otherwise. Target variables are now suffixed with _linpred. (Issue #51)
log_lik(): Always returns log_likelihood, plus constant_data when newdata=None or predictions_constant_data otherwise. (Issue #51)
Added overloads keyed on the newdata kwarg so that static type checkers automatically recognise the correct returned idata groups.

This change allows composable architectures, where the user picks only the parts of idata they need for their analysis.
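As a rough illustration of how overloads keyed on newdata can tell a type checker which groups to expect, the sketch below uses typing.overload with two TypedDicts. The function log_lik_groups and both TypedDict names are hypothetical and the payloads are placeholders; only the group names come from the list above.

```python
from typing import Any, Optional, TypedDict, overload


class LogLikGroups(TypedDict):
    """Groups returned when no newdata is supplied."""
    log_likelihood: Any
    constant_data: Any


class LogLikNewdataGroups(TypedDict):
    """Groups returned when newdata is supplied."""
    log_likelihood: Any
    predictions_constant_data: Any


@overload
def log_lik_groups(newdata: None = None) -> LogLikGroups: ...
@overload
def log_lik_groups(newdata: dict) -> LogLikNewdataGroups: ...


def log_lik_groups(newdata: Optional[dict] = None):
    """Toy stand-in: maps group names to placeholder payloads."""
    if newdata is None:
        return {"log_likelihood": [], "constant_data": {}}
    return {"log_likelihood": [], "predictions_constant_data": newdata}


in_sample = log_lik_groups()
out_of_sample = log_lik_groups({"x": [1, 2, 3]})
```

A type checker resolves in_sample to LogLikGroups and out_of_sample to LogLikNewdataGroups, so accessing a group that does not exist for that call is flagged before runtime.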

Pandas & R Type Conversion¶

Columnar SHM Transport: Improved ShmDataFrameColumns to transport DataFrames with mixed types via shared memory. Numeric and categorical columns now move between processes without copying, while complex object columns fall back to being pickled individually.
Categorical Fidelity: R factors now correctly roundtrip to pandas.CategoricalDtype, preserving categories, integer codes, and ordered status across the main-worker boundary. (Issue #52)
Broad Dtype Support: Enhanced converters to robustly handle pandas nullable integers (Int64), nullable floats, and strings during R conversion.

Bug fixes and enhancements¶

Worker crash recovery (Issue #50): Added automatic recovery for R worker crashes (segfaults, BrokenPipeError, ConnectionResetError). The worker is restarted transparently and the call raises RWorkerCrashedError. The exception includes a recovered: bool flag indicating whether a clean worker session was successfully started, allowing pipelines to distinguish retryable crashes (recovered=True) from hard failures (recovered=False).
Numpy Encoding: Standardised encoding for object arrays. String arrays are now optimized as ShmArray; mixed object arrays gracefully fall back to pickling.
Improved SHM memory management: Introduced explicit temporary buffers that are cleaned up immediately after use. Non-temporary buffers are now tracked by ShmPool only until the next main <-> worker exchange. Their lifetime is then transferred to CodecRegistry, which ties shared-memory mappings to the reconstructed objects via weakrefs. This minimizes the number of active mappings and releases resources promptly once those objects are garbage-collected.
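The recovered flag from the worker crash recovery entry lends itself to a simple retry policy. The sketch below is an assumption about how a pipeline might use it, not library code: RWorkerCrashedError is redefined locally as a stand-in, and call_with_retry and flaky_fit are hypothetical helpers.

```python
class RWorkerCrashedError(RuntimeError):
    """Local stand-in for the exception named above; not the library class."""

    def __init__(self, message: str, recovered: bool) -> None:
        super().__init__(message)
        self.recovered = recovered


def call_with_retry(fn, max_attempts: int = 3):
    """Call fn, retrying only while crashes report recovered=True."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except RWorkerCrashedError as err:
            # A hard failure, or the last allowed attempt, is re-raised.
            if not err.recovered or attempt == max_attempts:
                raise


calls = {"n": 0}


def flaky_fit():
    """Simulated call that crashes twice and then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise RWorkerCrashedError("worker segfaulted", recovered=True)
    return "fit-ok"


result = call_with_retry(flaky_fit)
```

Crashes with recovered=False propagate on the first attempt, since no clean worker session is available to retry against.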

0.3.0 - Process-Isolated R & Hot-Swappable Runtimes¶

25.12.14

This release introduces a redesigned main–worker–R architecture to address stability issues caused by embedding R directly in the Python process. In real-world use, unpredictable failures, ranging from R segfaults to rpy2 crashes, could take down the entire Python runtime, invalidate IDE sessions, and make test behavior OS-dependent. The old single-process model also made R state effectively immutable after import, limited runtime switching, and required brittle workarounds for package management and CI isolation. These issues were not fixable at the level of defensive coding alone.

The new architecture jails R inside a dedicated worker process with shared-memory transport, zero-copy codecs, and a proxy module session that preserves the public API. All R data structures (matrices, data frames, ArviZ objects) transfer via shared memory without heap duplication, keeping memory use equivalent to the original design even for multi-gigabyte posteriors. R runtimes and environments are now fully isolated, hot-swappable, and safely mutable through a context manager. Worker-side crashes no longer affect the main interpreter, and previously fragile operations (e.g., loo functions) run without instability. The result is a predictable, reproducible, OS-agnostic execution model with significantly reduced failure modes.

Breaking Changes¶

Removal of top-level functions: brmspy.fit, brmspy.install_brms, and other direct exports from the root package have been removed. Users should import the brms module (e.g., from brmspy import brms or from brmspy.brms import bf).
install_brms API change: Global installation functions (install_brms, install_runtime, install_rpackage) are removed from the public namespace. Installation and environment modification must now be performed inside a brms.manage() context to ensure safe worker restarts.
Opaque R handles: The .r attribute on result objects (e.g., FitResult.r) is no longer a live rpy2 object but a SexpWrapper handle. These handles cannot be manipulated directly in the main process but can be passed back into brms functions for processing in the worker. They retain the R repr for debugging purposes.
Runtime module internalization: brmspy.runtime has been moved to brmspy._runtime and is considered internal. Public runtime interactions should occur via brms.manage().
Formula logic: FormulaResult has been replaced by FormulaConstruct. Formulas are now built as pure Python DSL trees and only converted to R objects during execution in the worker.
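To make the idea of formulas as pure Python trees concrete, here is a minimal sketch. It is not the FormulaConstruct implementation: the Node class, its render method, and the toy bf and set_rescor helpers are hypothetical and exist only to show composition happening without any R session.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical tree node: a deferred call recorded as plain data."""
    fun: str
    args: tuple = ()
    children: tuple = ()

    def __add__(self, other: Node) -> Node:
        # Composition only builds a new node; nothing is sent to R here.
        return Node("+", children=(self, other))

    def render(self) -> str:
        """Flatten the tree to text, standing in for worker-side conversion."""
        if self.fun == "+":
            return " + ".join(child.render() for child in self.children)
        return f"{self.fun}({', '.join(map(str, self.args))})"


def bf(formula: str) -> Node:
    """Toy counterpart of bf(): records the formula string."""
    return Node("bf", (formula,))


def set_rescor(flag: bool) -> Node:
    """Toy counterpart of set_rescor(): records the flag as R text."""
    return Node("set_rescor", ("TRUE" if flag else "FALSE",))


tree = bf("y ~ x") + set_rescor(True)
rendered = tree.render()
```

Because the tree is immutable plain data, it can be built, inspected, and pickled in the main process, with conversion deferred until the worker executes it.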

New Features¶

Context-managed environments: Added brms.manage(), a context manager that spins up a dedicated worker for administrative tasks. Exposes methods ctx.install_brms(), ctx.install_runtime(), and ctx.install_rpackage() among others which persist changes to the active environment.
Multi-environment support: Users can create and switch between named environments (e.g., with brms.manage(environment_name="dev"): ...) which maintain separate user libraries (Rlib) layered on top of the base runtime.
Environment persistence: Active environment configurations are saved to ~/.brmspy/environment_state.json and ~/.brmspy/environment/<name>/config.json.
Status API: Added brms.environment_exists() and brms.environment_activate() helpers for managing the lifecycle of R environments programmatically.

Environments & Runtime¶

Process Isolation: R now runs in a dedicated spawned worker process. Calls from the main process are serialized and sent via IPC.
Shared Memory Transport: Implemented a custom SharedMemoryManager based transport layer. Large numeric payloads (NumPy arrays, pandas DataFrames, ArviZ InferenceData) are written to shared memory buffers, avoiding serialization overhead.
Hot-swappable sessions: The R worker can be restarted with a different configuration (R_HOME, library paths) on the fly without restarting the Python interpreter.
Zero-copy codecs: Added internal codecs (NumpyArrayCodec, PandasDFCodec, InferenceDataCodec) that handle SHM allocation and view reconstruction transparently.
Sexp Cache: Implemented a worker-side cache for R objects (Sexp). The main process holds lightweight SexpWrapper references (by ID) which are rehydrated into real R objects when passed back to the worker.
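The handle-by-ID pattern behind the Sexp cache can be sketched in a few lines. The class and method names below (Handle, WorkerCache, put, rehydrate) are hypothetical, and a plain Python object stands in for an R object; the real cache and SexpWrapper may differ.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class Handle:
    """Lightweight reference that can cross the process boundary."""
    rid: int
    repr_text: str  # kept for debugging, like the retained R repr


class WorkerCache:
    """Hypothetical worker-side store mapping IDs to live objects."""

    def __init__(self) -> None:
        self._objects = {}
        self._ids = itertools.count(1)

    def put(self, obj) -> Handle:
        """Store obj in the worker and return a handle for the main process."""
        rid = next(self._ids)
        self._objects[rid] = obj
        return Handle(rid, repr(obj))

    def rehydrate(self, handle: Handle):
        """Look up the live object when a handle is passed back."""
        return self._objects[handle.rid]


cache = WorkerCache()
live_object = {"class": "brmsfit"}  # stand-in for an R object
handle = cache.put(live_object)
same_object = cache.rehydrate(handle)
```

Only the small Handle travels between processes; the object itself never leaves the worker.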

API & Behaviour¶

Pure Python Formulas: Formula helpers (bf, lf, nlf, etc.) now return FormulaConstruct dataclasses. This allows formula composition (+ operator) to happen in Python without requiring a running R session until fit time.
Worker Proxy: The brmspy.brms module is now a dynamic proxy (RModuleSession). Accessing attributes triggers remote lookups, and calling functions triggers IPC requests.
Logging Bridge: Worker-side logs (including R console output) are captured and forwarded to the main process's logging handlers via a QueueListener.
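The logging bridge follows the standard QueueHandler/QueueListener pattern from the Python standard library. The sketch below runs in a single process with queue.Queue so that it stays self-contained; a cross-process bridge would presumably use a multiprocessing queue instead. The logger name and ListHandler are hypothetical.

```python
import logging
import logging.handlers
import queue


class ListHandler(logging.Handler):
    """Collects messages in a list; stands in for the main-process handlers."""

    def __init__(self) -> None:
        super().__init__()
        self.messages = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


log_queue = queue.Queue()

# "Worker" side: the logger only enqueues records.
worker_logger = logging.getLogger("worker_sketch")
worker_logger.setLevel(logging.INFO)
worker_logger.propagate = False
worker_logger.addHandler(logging.handlers.QueueHandler(log_queue))

# "Main" side: the listener drains the queue into the real handlers.
collector = ListHandler()
listener = logging.handlers.QueueListener(log_queue, collector)
listener.start()

worker_logger.info("Chain 1: Iteration 100 / 200")

listener.stop()  # processes everything already queued before returning
```

Since the worker only enqueues records, formatting and output decisions stay with the handlers configured in the main process.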

Documentation & Infrastructure¶

Versioned Documentation: Added mike support for deploying versioned docs (e.g., /0.3/, /stable/) to GitHub Pages via the docs-versioned workflow.
Architecture enforcement: Added import-linter with strict contracts to prevent leakage of internal layers (e.g., ensuring rpy2.robjects is never imported in the main process).
Internal Docs Generation: Added scripts to auto-generate API reference stubs for internal modules (_runtime, _session, etc.) to aid development.

Testing & CI¶

Hot-swap stress tests: Added tests that repeatedly restart the entire R runtime and SharedMemoryManager in a loop, then immediately access old SHM-backed arrays and InferenceData. These scenarios would crash instantly if any lifetime or reference handling were incorrect, making them an effective torture test of the new architecture.
Worker Test Marker: Introduced @pytest.mark.worker and a worker_runner fixture to execute specific tests inside the isolated worker process.
Coverage Aggregation: Updated CI to merge coverage reports from the main process and the spawned worker process.
R Dependency Tests: Switched r-dependencies-tests workflow to use the new isolated test runner script.

0.2.1 - Stability hotfix¶

25.12.07

On import, brmspy now tries to force rpy2 into ABI mode via RPY2_CFFI_MODE and emits a warning if that is not possible. API/BOTH modes can cause instability on Linux and macOS. (Issue #45)
Added R_HOME and LD_LIBRARY_PATH to the GitHub workflows (required in most environments in ABI mode).
The environment now makes a best-effort attempt to detect invalid R setups and logs them.

0.2.0 - Runtime Refactor & Formula DSL¶

25.12.07

Breaking Changes¶

Removed Diagnostics: Removed loo, loo_compare, and add_criterion due to frequent segfaults in embedded R mode. Users should rely on arviz.loo and arviz.compare using the idata property of the fit result.
Installation API: Renamed use_prebuilt_binaries argument to use_prebuilt in install_brms().
Installation API now consists of: install_brms, install_runtime, deactivate_runtime, activate_runtime, find_local_runtime, get_active_runtime, get_brms_version
Deprecations: Renamed fit to brm and formula to bf. Previous names are still exported as aliases, but might be removed in a future version.

New Features¶

Formula DSL: Implemented bf, lf, nlf, acformula, set_rescor, set_mecor, and set_nl. These objects support additive syntax (e.g., bf(...) + set_rescor(True) + gaussian()) mirroring native brms behavior.
Generic Data Loader: Added get_data() to load datasets from any installed R package, complementing get_brms_data().
Runtime Status: Added brmspy.runtime.status() to programmatically inspect the current R environment, toolchain compatibility, and active runtime configuration.
Families now in package root: Families can now be imported from the package root, e.g. from brmspy import gaussian.

Runtime & Installation¶

Core Refactor: Completely re-architected brmspy.runtime into strict layers (_config, _r_env, _platform, _install, etc) to eliminate side effects during import and prevent circular dependencies.
Atomic Activation: activate_runtime() now validates manifest integrity and system fingerprints before mutating the R environment, ensuring atomic success or rollback.
Auto-Persistence: The last successfully activated runtime is automatically restored on module import via runtime._autoload, creating persistent sessions across restarts.
Windows Toolchain: Modularized RTools detection logic to accurately map R versions to RTools versions (4.0–4.5) and handle path updates safely.

Documentation & Infrastructure¶

MkDocs Migration: Ported all documentation to MkDocs with the Material theme for better navigability and API references.
Rendered notebooks: Added more notebook examples that are now rendered fully with links to running each in Google Colab.
ArviZ diagnostics examples: Now available under the API reference.
Test coverage: Coverage is now 88% for brms functions and 68% for R environment and package management.

0.1.13 - Enhanced Diagnostics & Type-Safe Summaries¶

25.12.04

Diagnostics¶

summary() Rewrite: Returns SummaryResult dataclass with structured access to fixed, spec_pars, random, prior, and model metadata. Includes pretty-print support.
fixef(): Extract population-level effects as DataFrame. Supports summary, robust, probs, and pars arguments.
ranef(): Extract group-level effects as xarray DataArrays. Returns dict mapping grouping factors to arrays with configurable summary/raw modes.
posterior_summary(): Extract all model parameters (fixed, random, auxiliary) as DataFrame. Supports variable selection and regex patterns.
prior_summary(): Return DataFrame of prior specifications. Option to show all priors or only user-specified.
loo(): Compute LOO-CV using PSIS. Returns LooResult with elpd_loo, p_loo, looic, and Pareto k diagnostics.
loo_compare(): Compare multiple models via LOO-CV. Returns LooCompareResult ranked by performance with elpd_diff and standard errors.
validate_newdata(): Validate prediction data against fitted model requirements. Checks variables, factor levels, and grouping structure.

Type System¶

DataFrame Detection: r_to_py() now correctly preserves row indexes, column names, and proper type conversion from R DataFrames.
LooResult/LooCompareResult: Added __repr__() for formatted notebook output.

Generic Function Access¶

call(): Universal wrapper for calling any brms or R function by name with automatic type conversion.
sanitised_name(): Helper to convert Python-style names to valid R identifiers.

Testing¶

Added 14 tests covering all new diagnostics functions.
Optimized test iterations (iter=100, warmup=50) for faster CI.

0.1.12 - RDS I/O, Families Module, Default Priors¶

25.12.03

New Features¶

save_rds(): Save brmsfit or generic R objects to RDS files.
load_rds_fit(): Load saved brmsfit objects, returning FitResult with attached InferenceData.
load_rds_raw(): Load arbitrary R objects from RDS files.
brm Alias: Added brm as alias for fit.

Families¶

Added brmspy.families module with brmsfamily() and family() wrappers.
Implemented keyword-argument wrappers for 40+ families: student, bernoulli, beta_binomial, negbinomial, geometric, lognormal, shifted_lognormal, skew_normal, exponential, weibull, frechet, gen_extreme_value, exgaussian, wiener, Beta, dirichlet, logistic_normal, von_mises, asym_laplace, cox, hurdle_*, zero_inflated_*, categorical, multinomial, cumulative, sratio, cratio, acat.

Priors¶

default_prior(): Retrieve default priors for a model formula and dataset.
get_prior(): Inspect prior structure before fitting.

Internal¶

Reorganized brms wrappers into modular files under brmspy/brms_functions/.
Added RListVectorExtension protocol for automatic R list extraction in type conversion.

0.1.11 - Persistent Runtimes & Logging¶

25.12.01

New Features¶

Persistent Runtimes: Activated runtime path saved to ~/.brmspy/config.json and auto-loaded on import.
Configurable Logging: Replaced print statements with centralized logger.
Optimized Activation: Made aggressive unloading conditional for faster runtime activation.

0.1.10 - Windows Stability & CI Improvements¶

25.12.01

Windows Support¶

Implemented aggressive R package unloading (detach, unloadNamespace, DLL unload) to prevent file locking errors.
Refined RTools detection; relaxed g++ version requirements when valid RTools is detected.
Changed install_rtools default to False in install_brms() to prevent unintended PATH modifications.
Fixed PowerShell command syntax generation.
Windows prebuilt binaries currently require R 4.5.

Build & CI¶

Expanded CI matrix: Windows, macOS, Ubuntu on Python 3.12.
Optimized GitHub Actions caching for R libraries and CmdStan.
Fixed artifact pruning logic in runtime builder workflows.

Bug Fixes¶

Ensured jsonlite dependency is explicitly resolved during manifest generation.
Fixed workflow path referencing and quoting issues.

0.1.9 - Prebuilt Runtimes & Windows Toolchain¶

25.11.30

New Features¶

Prebuilt Runtimes: Added brmspy.binaries subpackage for precompiled R environments with brms and cmdstanr (up to 50x faster install).
Fast Installation: Added use_prebuilt_binaries=True argument to install_brms().
Windows Toolchain: Automatic Rtools (MinGW-w64) detection and installation in install_brms().

Enhancements¶

Linux Binaries: Prioritize Posit Package Manager (P3M) binary repositories based on OS codename.
Documentation: Added docstrings to all public and internal functions.

Infrastructure¶

Added .runtime_builder Dockerfiles for reproducible Linux runtime environments.

0.1.8 - RStan Support & Version Pinning¶

25.11.29

New Features¶

RStan Backend: Added rstan as alternative backend. install_brms() accepts install_rstan param; fit() accepts backend="rstan".
Version Pinning: install_brms() supports pinning specific R package versions (e.g., version="2.21.0") via remotes.

Platform Support¶

Windows Toolchain: Automatic Rtools detection and setup in install_brms().
macOS/Windows Binaries: Fixed installation failures by defaulting to type="both" instead of forcing source compilation.

Infrastructure¶

Added cross-platform CI workflow (Windows, macOS, Ubuntu).

0.1.7 - Import Fixes¶

25.11.29

Fixed library refusing import when R dependencies are missing.
R libraries now automatically imported after installation.

0.1.6 - Segfault Fix & Stability¶

25.11.29

Core Stability¶

Fixed segfault occurring when fit() was called inside tqdm loops or repeated call contexts.
All R imports (brms, cmdstanr, posterior) now performed once at module import, never inside functions.

Performance¶

Repeated model fits now faster due to eliminated R namespace reloads.
Reduced memory churn by removing redundant converter/namespace setup.

Testing¶

Added test_fit_tqdm_segfault() regression test.

0.1.5 - Priors, Formula Helper, Typed ArviZ¶

25.11.28

API & Types¶

formula(): Added helper for building reusable model formulas with kwargs support.
Typed ArviZ Aliases: Added IDFit, IDPredict, IDLinpred, IDLogLik, IDEpred for different InferenceData shapes.
Exported Types: FitResult, PosteriorEpredResult, PosteriorPredictResult, PosteriorLinpredResult, LogLikResult, GenericResult now in public API.

Priors¶

prior() Helper: Now recommended way to specify priors instead of raw tuples.
Improved internal prior-building logic for better mapping to brms::set_prior().
Supports class_, coef, group, dpar combinations.

Internal¶

Improved fit() kwargs parsing for more robust forwarding to brms/cmdstanr.
Expanded test coverage for priors, get_stancode, summary, and fit-without-sampling paths.