Digits Conditional Modeling Study

This notebook studies how different conditional regression models learn to generate digit images conditioned on their labels.

Problem Setup:

Targets: 8 PCA components of 64-pixel digit images
Conditioning: 10-dimensional one-hot encoded digit labels (0-9)
Goal: Learn p(pixels | digit_label) - generate realistic digit images given the label

Models Compared:

ConditionalGMMRegressor: Joint GMM over [X, y] with analytical conditioning
MixtureOfExpertsRegressor: Linear-softmax gating with Gaussian experts
DiscriminativeConditionalGMMRegressor: Discriminative EM for conditional likelihood
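The "analytical conditioning" behind a joint GMM can be illustrated on a single Gaussian component. The sketch below is not the library's implementation; `condition_gaussian` and its arguments are hypothetical names used only for illustration. It applies the standard formulas for a joint Gaussian over [x, y]: the conditional mean is mu_y + S_yx S_xx^{-1} (x - mu_x) and the conditional covariance is S_yy - S_yx S_xx^{-1} S_xy. In a mixture, each component would be conditioned this way and the component weights rescaled by how well each component explains x.

```python
import numpy as np

def condition_gaussian(mean, cov, x, n_x):
    """Return mean and covariance of y | x for a joint Gaussian over [x, y].

    The first n_x coordinates of `mean` and `cov` belong to x, the rest to y.
    """
    mu_x, mu_y = mean[:n_x], mean[n_x:]
    S_xx = cov[:n_x, :n_x]
    S_xy = cov[:n_x, n_x:]
    S_yy = cov[n_x:, n_x:]

    # Solve S_xx @ A = S_xy instead of forming an explicit inverse
    A = np.linalg.solve(S_xx, S_xy)          # shape (n_x, n_y)
    cond_mean = mu_y + A.T @ (x - mu_x)      # mu_y + S_yx S_xx^{-1} (x - mu_x)
    cond_cov = S_yy - S_xy.T @ A             # S_yy - S_yx S_xx^{-1} S_xy
    return cond_mean, cond_cov

# Two-dimensional toy example: one x coordinate, one y coordinate
mean = np.array([0.0, 0.0])
cov = np.array([[1.0, 0.5],
                [0.5, 2.0]])
cond_mean, cond_cov = condition_gaussian(mean, cov, x=np.array([2.0]), n_x=1)
print(cond_mean, cond_cov)
```

With one-hot conditioning vectors the x-block of the covariance is rank-deficient (the entries always sum to 1), so a practical implementation would need some form of covariance regularization before solving.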

Visualization: For each model, we'll show a 10×5 grid where:

Rows represent digits 0-9
Columns show 5 random samples generated for each digit
This allows us to assess the quality and diversity of generated digits

Data Preparation

Load the digits dataset and prepare it for conditional modeling:

Dataset shape: (1797, 64)
Labels: [0 1 2 3 4 5 6 7 8 9]
Label distribution: [178 182 177 183 181 182 181 179 174 180]

PCA Results:
  Explained variance ratio: 0.793
  PCA components shape: (1797, 20)
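The preparation cell itself is not shown, so the following is only a minimal sketch of how such a setup could be built with scikit-learn. The variable names `X_conditioning` and `y_target` match those used later in the notebook; the number of PCA components (8, taken from the Problem Setup list), the absence of pixel scaling, and the `random_state` are assumptions and may differ from the notebook's actual settings.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

digits = load_digits()
pixels, labels = digits.data, digits.target      # (1797, 64), (1797,)

# Conditioning input: one-hot encoding of the digit label
X_conditioning = np.eye(10)[labels]              # (1797, 10)

# Regression target: leading PCA components of the pixel vectors.
# n_components=8 follows the Problem Setup list; it is an assumption here.
pca = PCA(n_components=8, random_state=42)
y_target = pca.fit_transform(pixels)             # (1797, 8)

print(X_conditioning.shape, y_target.shape)
print(f"Explained variance ratio: {pca.explained_variance_ratio_.sum():.3f}")
```

Keeping the fitted `pca` object around matters: it is needed later to map generated component vectors back to 8×8 pixel images.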

Hyperparameter Optimization

Perform an out-of-sample hyperparameter search by sweeping n_components over a reduced grid of values between 2 and 20 for all three models, scoring each fit on a held-out 30% test split:

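import numpy as np
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# The three regressor classes, X_conditioning and y_target are assumed to be
# defined in earlier cells of the notebook.

# Split data for hyperparameter optimization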
X_train, X_test, y_train, y_test = train_test_split(
    X_conditioning, y_target, test_size=0.3, random_state=42
)

# Define hyperparameter search range (reduced for faster doc builds)
n_components_range = [2, 3, 4, 5, 7, 10, 15, 20]

# Initialize results storage
hyperparameter_results = {
    'ConditionalGMMRegressor': {'n_components': [], 'log_likelihood': [], 'mse': [], 'r2': []},
    'MixtureOfExpertsRegressor': {'n_components': [], 'log_likelihood': [], 'mse': [], 'r2': []},
    'DiscriminativeConditionalGMMRegressor': {'n_components': [], 'log_likelihood': [], 'mse': [], 'r2': []}
}

# Cache best models during search to avoid retraining
best_models = {
    'ConditionalGMMRegressor': {'model': None, 'score': -np.inf},
    'MixtureOfExpertsRegressor': {'model': None, 'score': -np.inf},
    'DiscriminativeConditionalGMMRegressor': {'model': None, 'score': -np.inf}
}

# One factory per model so every candidate is fitted and scored the same way
model_factories = {
    'ConditionalGMMRegressor': lambda n: ConditionalGMMRegressor(
        n_components=n, random_state=42
    ),
    'MixtureOfExpertsRegressor': lambda n: MixtureOfExpertsRegressor(
        n_components=n, random_state=42
    ),
    'DiscriminativeConditionalGMMRegressor': lambda n: DiscriminativeConditionalGMMRegressor(
        n_components=n,
        covariance_type='full',
        reg_covar=1e-2,
        random_state=42
    ),
}

for n_comp in n_components_range:
    for name, make_model in model_factories.items():
        model = make_model(n_comp)
        model.fit(X_train, y_train)

        # Out-of-sample metrics on the held-out split
        y_pred = model.predict(X_test)
        mse = mean_squared_error(y_test, y_pred)
        r2 = r2_score(y_test, y_pred)
        log_likelihood = model.score(X_test, y_test)

        # Cache best model if this is the best score so far
        if log_likelihood > best_models[name]['score']:
            best_models[name]['model'] = model
            best_models[name]['score'] = log_likelihood

        # Record the metrics for this (model, n_components) pair
        results = hyperparameter_results[name]
        results['n_components'].append(n_comp)
        results['log_likelihood'].append(log_likelihood)
        results['mse'].append(mse)
        results['r2'].append(r2)


Optimal Hyperparameters (Out-of-Sample Performance):
======================================================================
ConditionalGMMRegressor:
  Optimal n_components: 7
  Best log-likelihood: -21.097
  Best MSE: 1.302
  Best R²: 0.247

MixtureOfExpertsRegressor:
  Optimal n_components: 7
  Best log-likelihood: -18.816
  Best MSE: 1.833
  Best R²: -0.053

DiscriminativeConditionalGMMRegressor:
  Optimal n_components: 15
  Best log-likelihood: -18.123
  Best MSE: 1.316
  Best R²: 0.227

======================================================================
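The code that prints this summary is not shown. A minimal sketch of how an optimum could be read off one model's entry in `hyperparameter_results` follows, assuming "optimal" means the highest held-out log-likelihood (the same criterion the sweep uses when caching `best_models`) and that the other metrics are reported at that same setting. The helper name `best_by_log_likelihood` is hypothetical, and the numbers are toy values, not results from this notebook.

```python
import numpy as np

def best_by_log_likelihood(results):
    """Return the sweep entry with the highest held-out log-likelihood."""
    idx = int(np.argmax(results['log_likelihood']))
    return {key: values[idx] for key, values in results.items()}

# Toy values for illustration only; they are not results from this notebook
toy_results = {
    'n_components': [2, 3, 4],
    'log_likelihood': [-25.0, -20.0, -22.0],
    'mse': [1.5, 1.4, 1.3],
    'r2': [0.10, 0.15, 0.20],
}
best = best_by_log_likelihood(toy_results)
print(best)
```

Selecting by log-likelihood rather than MSE can pick a model whose point predictions are worse but whose predictive distribution fits the held-out data better, which is one way the rankings in the summary above can disagree across metrics.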

../_images/12fb55f76a8324d231be64f8ba00f6dae0d1bf585d68f01655d63cbc5ebf7a25.png

Digit Generation and Visualization

Generate random samples for each digit (0-9) using each model and visualize them in a 10×5 grid:

Generating 8 samples for each digit (0-9) using best models...
  ConditionalGMMRegressor: n_components = 7
  MixtureOfExpertsRegressor: n_components = 7
  DiscriminativeConditionalGMMRegressor: n_components = 15

../_images/bd440841891253c6d40cbcde522562fc7eb9eaafe7765972880df083a3215a84.png
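Because the models work in PCA space, each generated sample has to be projected back to pixels before it can be drawn. The sampling call of the regressors is not shown here, so the sketch below covers only the back-projection step, assuming the PCA was fitted without whitening (in which case scikit-learn's `inverse_transform` reduces to `z @ components_ + mean_`). The helper `components_to_images` and the stand-in inputs are hypothetical.

```python
import numpy as np

def components_to_images(z, components, mean, max_value=16.0):
    """Map PCA-space samples back to 8x8 images clipped to the pixel range.

    z:          (n_samples, n_components) samples in PCA space
    components: (n_components, 64) principal axes as rows, like PCA.components_
    mean:       (64,) per-pixel mean, like PCA.mean_
    """
    pixels = z @ components + mean            # linear back-projection
    pixels = np.clip(pixels, 0.0, max_value)  # digits pixels lie in [0, 16]
    return pixels.reshape(-1, 8, 8)

# Stand-in inputs so the sketch runs on its own (hypothetical, not fitted values)
rng = np.random.default_rng(0)
components = np.linalg.qr(rng.normal(size=(64, 8)))[0].T   # 8 orthonormal rows
mean = np.full(64, 8.0)
z = rng.normal(scale=3.0, size=(5, 8))                      # 5 fake samples

images = components_to_images(z, components, mean)
print(images.shape)
```

Clipping is a design choice: Gaussian samples can fall outside the valid pixel range, and clipping keeps the display scale consistent with `vmin=0, vmax=16` in the plots.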

Gallery Image

fig, ax = plt.subplots(1, 1, figsize=(12, 6))

model_name = 'DiscriminativeConditionalGMMRegressor'

# Create an 8x10 grid of 8x8 images with 1 pixel of spacing between them
# (transposed: samples run down the rows, digits across the columns).
# Grid size: 8 rows * (8 pixels + 1 spacing) - 1, 10 cols * (8 pixels + 1 spacing) - 1
# Start from a white background (16 is the max pixel value, so 16 = white)
grid = np.full((n_samples_per_digit * 9 - 1, 10 * 9 - 1), 16.0)  # 71x89 when n_samples_per_digit = 8

for digit in range(10):
    if digit in generated_samples[model_name]:
        samples = generated_samples[model_name][digit]
        for sample_idx in range(min(n_samples_per_digit, len(samples))):
            # Reshape to 8x8 image
            image = samples[sample_idx].reshape(8, 8)
            
            # Place in grid with 1 pixel spacing (transposed)
            row_start = sample_idx * 9  # 8 pixels + 1 spacing
            col_start = digit * 9  # 8 pixels + 1 spacing
            grid[row_start:row_start+8, col_start:col_start+8] = image

# Display the grid
im = ax.imshow(grid, cmap='gray', vmin=0, vmax=16)
ax.set_title('Random Handwritten Digits', fontsize=14)
ax.set_xlabel('Digit')
ax.set_ylabel('Sample Index')

# Set ticks (adjusted for 1 pixel spacing)
ax.set_xticks(range(4, 10*9, 9))  # Center of each 8x8 image with spacing
ax.set_xticklabels(range(10))
ax.set_yticks(range(4, n_samples_per_digit*9, 9))  # Center of each 8x8 image with spacing
ax.set_yticklabels(range(1, n_samples_per_digit + 1))

# No grid lines needed - white padding already provides separation

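plt.tight_layout()

# Make sure the output directory exists before saving
import os
os.makedirs('gallery_images', exist_ok=True)
plt.savefig('gallery_images/digits.png', dpi=150, bbox_inches='tight')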

plt.show()

../_images/776c558e1165ec24a80d94aa00a52d69b96fa4aeb1d3fe5ebc232782146c1717.png