cgmm: Conditional Gaussian Mixture Models

cgmm is a Python library for Conditional Gaussian Mixture Models that seamlessly integrates with scikit-learn. It enables you to fit a joint Gaussian mixture on your data and then condition on a subset of variables to obtain the posterior distribution of the remaining ones.

Install

pip install cgmm

Requirements

  • Python 3.9–3.12

  • NumPy, SciPy, scikit-learn (installed automatically)

  • For the example plots: Matplotlib

Quick Start

import numpy as np
from cgmm import ConditionalGMMRegressor

# Generate sample data
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
y = 2 * X[:, 0] + X[:, 1] + 0.1 * rng.normal(size=200)

# Fit conditional GMM regressor
model = ConditionalGMMRegressor(n_components=3, random_state=42)
model.fit(X, y)

# Make predictions with uncertainty
X_new = np.array([[1.0, 0.5]])
y_pred = model.predict(X_new)  # Mean prediction
y_cov = model.predict_cov(X_new)  # Covariance matrix

print(f"Prediction: {y_pred[0]:.3f}")
print(f"Uncertainty: {np.sqrt(y_cov[0, 0]):.3f}")

Key Features

  • 🔗 Scikit-learn Compatible: Estimators follow the familiar fit/predict interface for regression tasks

  • 📊 Multimodal Predictions: Capture complex, multi-peaked distributions

  • ⚡ Multiple Algorithms: Conditional GMM, Mixture of Experts, and Discriminative approaches

  • 🎯 Uncertainty Quantification: Full covariance matrices, not just point estimates

  • 🔧 Production Ready: Well-tested, documented, and actively maintained

Use Cases

  • Multimodal Regression: Predict complex, multi-peaked target distributions

  • Scenario Simulation: Generate realistic synthetic data for forecasting

  • Missing Data Imputation: Fill gaps using learned conditional distributions

  • Inverse Problems: Solve kinematics, finance, and volatility modeling tasks

  • Uncertainty Quantification: Provide confidence intervals and risk measures

  • Weather Modeling: Analyze meteorological data with seasonal and daily patterns
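Scenario simulation and imputation both come down to drawing samples from a conditional mixture. The sketch below is a minimal illustration in plain NumPy, not the cgmm API: `sample_mixture` and the example parameters are hypothetical, and the weights, means, and covariances are assumed to already describe the conditional distribution of the missing or future variables.

```python
import numpy as np


def sample_mixture(weights, means, covs, n_samples, rng):
    """Draw samples from a Gaussian mixture (hypothetical helper, not part of cgmm)."""
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)   # shape (K, d)
    covs = np.asarray(covs, dtype=float)     # shape (K, d, d)

    # Pick a component for every sample according to the mixture weights.
    comp = rng.choice(len(weights), size=n_samples, p=weights)

    # Transform standard normal noise with each component's Cholesky factor.
    chol = np.linalg.cholesky(covs)          # batched over the K components
    noise = rng.standard_normal((n_samples, means.shape[1]))
    samples = means[comp] + np.einsum("nij,nj->ni", chol[comp], noise)
    return samples, comp


# Assumed conditional distribution: two well-separated modes in one dimension.
rng = np.random.default_rng(0)
weights = [0.3, 0.7]
means = [[-4.0], [4.0]]
covs = [[[0.75]], [[0.75]]]

scenarios, comp = sample_mixture(weights, means, covs, n_samples=10_000, rng=rng)

# For imputation, use a single draw or the sample mean as a point estimate.
point_estimate = scenarios.mean(axis=0)
lower_mode_share = np.mean(scenarios[:, 0] < 0.0)
```

With two separated modes, individual draws stay close to one mode or the other, while the sample mean falls between them. This is why a draw is often a more realistic imputed value than the mean.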

Models Available

  • ConditionalGMMRegressor: Joint GMM with analytical conditioning

  • MixtureOfExpertsRegressor: Softmax-gated experts with linear mean functions

  • DiscriminativeConditionalGMMRegressor: Direct conditional likelihood optimization

  • GMMConditioner: Low-level API for custom conditioning workflows
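To make "analytical conditioning" concrete, here is a minimal NumPy sketch of the standard closed-form result for a joint Gaussian mixture over [x, y]. It is an illustration, not the library's implementation, and `condition_gmm` is a hypothetical helper. For each component k, the conditional mean is mu_y + S_yx S_xx^{-1} (x - mu_x), the conditional covariance is S_yy - S_yx S_xx^{-1} S_xy, and the new weight is proportional to w_k N(x | mu_x, S_xx). The parameters are written out by hand here; they could equally be taken from the `weights_`, `means_`, and `covariances_` attributes of a fitted scikit-learn `GaussianMixture` with `covariance_type="full"`.

```python
import numpy as np


def condition_gmm(weights, means, covs, x, dx):
    """Condition a joint GMM over [x, y] on x (the first dx dimensions).

    Hypothetical helper for illustration; returns the weights, means and
    covariances of the mixture describing y given x.
    """
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    covs = np.asarray(covs, dtype=float)
    x = np.asarray(x, dtype=float)

    log_w, cond_means, cond_covs = [], [], []
    for w_k, mu, S in zip(weights, means, covs):
        mu_x, mu_y = mu[:dx], mu[dx:]
        S_xx, S_xy = S[:dx, :dx], S[:dx, dx:]
        S_yx, S_yy = S[dx:, :dx], S[dx:, dx:]

        diff = x - mu_x
        sol = np.linalg.solve(S_xx, diff)            # S_xx^{-1} (x - mu_x)

        # Log of w_k * N(x | mu_x, S_xx), the unnormalized new weight.
        _, logdet = np.linalg.slogdet(S_xx)
        log_pdf = -0.5 * (dx * np.log(2.0 * np.pi) + logdet + diff @ sol)
        log_w.append(np.log(w_k) + log_pdf)

        cond_means.append(mu_y + S_yx @ sol)
        cond_covs.append(S_yy - S_yx @ np.linalg.solve(S_xx, S_xy))

    # Normalize in log space for numerical stability.
    log_w = np.array(log_w)
    w = np.exp(log_w - log_w.max())
    return w / w.sum(), np.array(cond_means), np.array(cond_covs)


# Assumed joint mixture over (x, y) with two components.
weights = [0.5, 0.5]
means = [[-2.0, 3.0], [2.0, -3.0]]
covs = [[[1.0, 0.5], [0.5, 1.0]], [[1.0, 0.5], [0.5, 1.0]]]

w, cond_means, cond_covs = condition_gmm(weights, means, covs, x=[0.0], dx=1)

# Overall mean and covariance of y | x via the law of total variance.
mix_mean = w @ cond_means
dev = cond_means - mix_mean
mix_cov = np.einsum("k,kij->ij", w, cond_covs + dev[:, :, None] * dev[:, None, :])
```

In this example the conditional distribution at x = 0 has two equally weighted modes at 4 and -4. The overall mean is 0 and the overall variance is 16.75, which shows how a single mean and covariance can hide a bimodal prediction.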

Why cgmm?

Many traditional regression methods model only the conditional mean and implicitly assume unimodal, normally distributed residuals. Real-world data often exhibits:

  • Multiple modes in the target distribution

  • Complex dependencies between features and targets

  • Heteroscedastic noise that varies with input

  • Non-linear relationships that linear models miss

cgmm addresses these challenges by modeling the full conditional distribution rather than just the mean, enabling:

  • More accurate predictions in complex scenarios

  • Proper uncertainty quantification

  • Generation of realistic synthetic data

  • Better handling of multimodal target distributions
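A small synthetic experiment illustrates why modeling only the mean can mislead. The data here is an assumption made for illustration: the target follows one of two branches, y = +x or y = -x, chosen at random. An ordinary least-squares fit (plain NumPy, not cgmm) predicts roughly the average of the two branches, a value the data almost never takes.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Bimodal target: each sample lies on the branch y = +x or y = -x.
x = rng.uniform(1.0, 3.0, size=n)
sign = rng.choice([-1.0, 1.0], size=n)
y = sign * x + 0.1 * rng.normal(size=n)

# Ordinary least squares with an intercept: y ~ a + b * x.
A = np.column_stack([np.ones(n), x])
(a, b), *_ = np.linalg.lstsq(A, y, rcond=None)

# Compare the point prediction with the two values the data actually takes.
x_new = 2.0
ols_pred = a + b * x_new
branches = np.array([-x_new, x_new])
gap = np.min(np.abs(ols_pred - branches))

print(f"OLS prediction at x = 2: {ols_pred:.2f}")
print(f"Distance to the nearest branch: {gap:.2f}")
```

A conditional mixture model would instead describe this target as two modes near -2 and +2, each with its own weight and spread.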

Performance

cgmm is optimized for both accuracy and speed:

  • Fast training with efficient EM algorithms

  • Scalable to thousands of samples and moderate dimensions

  • Memory efficient with sparse covariance representations

  • Parallel execution where possible

Citation

If you use cgmm in your research, please cite:

@software{cgmm2024,
  title={cgmm: Conditional Gaussian Mixture Models for Python},
  author={van den Berg, Thijs},
  year={2024},
  url={https://github.com/sitmo/cgmm},
  version={0.3.2}
}

Examples

Check out our comprehensive examples:

  • Weather Analysis: 10-year KNMI meteorological data with seasonal patterns and 2D PDF visualization

  • 2D Conditional GMM: Basic conditional modeling with uncertainty quantification

  • Digits Classification: Handwritten digit recognition with multimodal distributions

  • Iris Dataset: Classic classification with conditional GMM

  • Regression Models: Comparison of different cgmm algorithms

  • VIX Predictor: Financial volatility modeling and forecasting

Contributing

We welcome contributions!

License

BSD 3-Clause License - see LICENSE for details.