Evaluation Metrics
This guide covers the comprehensive evaluation metrics available in QGANS Pro for assessing the quality and performance of quantum-enhanced generative models.
Overview
Evaluating quantum GANs requires both classical GAN metrics and quantum-specific measures. QGANS Pro provides:
- Classical Metrics: FID, IS, LPIPS for general quality assessment
- Quantum Metrics: Fidelity, entanglement measures, circuit complexity
- Fairness Metrics: Bias detection and mitigation assessment
- Performance Metrics: Training convergence and computational efficiency
Classical Metrics
Fréchet Inception Distance (FID)
Measures the distance between real and generated data distributions:
from qgans_pro.utils import FIDScore
import torch
# Initialize FID metric
fid_metric = FIDScore(device='cuda')
# Calculate FID score
real_samples = torch.randn(1000, 3, 64, 64)  # Placeholder tensor; load your real images here
generated_samples = qgan.generate_samples(1000)  # qgan: a trained QGANS Pro model
fid_score = fid_metric(real_samples, generated_samples)
print(f"FID Score: {fid_score:.2f}")
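Interpretation (rough guidelines; FID depends on the dataset, image resolution, and number of samples, so compare scores only under identical settings):
- Lower values indicate better quality
- Typical range: 1-300
- FID < 10: Excellent quality
- FID < 50: Good quality
- FID > 100: Poor quality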
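For intuition about what the score measures: FID is the Fréchet distance between two Gaussians fitted to Inception features of the real and generated images. The sketch below computes that distance with NumPy on arbitrary feature arrays. The function name `frechet_distance` and the random stand-in features are illustrative assumptions; this is not the `FIDScore` implementation, which also has to extract the features.

```python
import numpy as np

def frechet_distance(feats_real, feats_gen):
    """Fréchet distance between Gaussians fitted to two feature sets (rows = samples)."""
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_g = np.cov(feats_gen, rowvar=False)
    # Tr((cov_r cov_g)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_r @ cov_g; clip tiny negative values caused by round-off.
    eigvals = np.linalg.eigvals(cov_r @ cov_g)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    mean_term = ((mu_r - mu_g) ** 2).sum()
    return float(mean_term + np.trace(cov_r) + np.trace(cov_g) - 2.0 * trace_sqrt)

# Stand-in features (assumption): in practice these come from an Inception network.
rng = np.random.default_rng(0)
feats_a = rng.normal(size=(500, 4))
feats_b = rng.normal(loc=1.0, size=(500, 4))

same = frechet_distance(feats_a, feats_a)      # identical sets: distance near zero
shifted = frechet_distance(feats_a, feats_b)   # shifted mean: clearly positive
print(f"same: {same:.6f}, shifted: {shifted:.2f}")
```

Because the means and covariances are estimated from samples, small sample sets give noisy and biased scores.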
Inception Score (IS)
Evaluates both quality and diversity of generated samples:
from qgans_pro.utils import InceptionScore
# Initialize IS metric
is_metric = InceptionScore(device='cuda')
# Calculate Inception Score
is_mean, is_std = is_metric(generated_samples)
print(f"Inception Score: {is_mean:.2f} ± {is_std:.2f}")
Interpretation:
- Higher values indicate better quality and diversity
- Range: 1 to the number of classes the classifier predicts (1000 for the standard ImageNet Inception network)
- IS > 8: Excellent (ImageNet level)
- IS > 5: Good
- IS < 3: Poor
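To see how quality and diversity enter the score, recall that IS is the exponential of the average KL divergence between each sample's class distribution p(y|x) and the marginal p(y). The sketch below computes it from a list of class-probability rows using only the standard library. The function name `inception_score` and the toy probabilities are assumptions for illustration; `InceptionScore` obtains the probabilities from an Inception classifier.

```python
import math

def inception_score(probs, eps=1e-12):
    """exp(mean KL(p(y|x) || p(y))) for a list of class-probability rows."""
    n, k = len(probs), len(probs[0])
    # Marginal class distribution p(y), averaged over all samples
    marginal = [sum(row[j] for row in probs) / n for j in range(k)]
    kl_total = 0.0
    for row in probs:
        kl_total += sum(
            p * (math.log(p + eps) - math.log(marginal[j] + eps))
            for j, p in enumerate(row)
        )
    return math.exp(kl_total / n)

# Confident predictions spread evenly over 4 classes: the score approaches 4
confident_diverse = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
# Every sample gets the same uniform prediction: the score is 1
uninformative = [[0.25] * 4 for _ in range(4)]

score_high = inception_score(confident_diverse)
score_low = inception_score(uninformative)
print(f"{score_high:.4f} {score_low:.4f}")
```

Confident per-sample predictions (quality) and an even spread over classes (diversity) both raise the score; losing either pulls it toward 1.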
Learned Perceptual Image Patch Similarity (LPIPS)
Measures the perceptual distance between images; lower values mean the images look more alike:
from qgans_pro.utils import LPIPSScore
# Initialize LPIPS metric
lpips_metric = LPIPSScore(network='alex', device='cuda')
# Calculate LPIPS score
lpips_score = lpips_metric(real_samples, generated_samples)
print(f"LPIPS Score: {lpips_score:.4f}")
Quantum-Specific Metrics
Quantum Fidelity
Measures how well quantum states are preserved during generation:
from qgans_pro.utils import QuantumFidelity
# Initialize quantum fidelity metric
qf_metric = QuantumFidelity(device='cuda')
# Calculate quantum metrics
quantum_metrics = qf_metric(real_samples, generated_samples)
print(f"Quantum Fidelity: {quantum_metrics['fidelity']:.4f}")
print(f"Entanglement Measure: {quantum_metrics['entanglement']:.4f}")
print(f"Circuit Depth: {quantum_metrics['circuit_depth']}")
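How `QuantumFidelity` computes its value internally is not shown here. As a point of reference, the textbook fidelity between two pure states is the squared magnitude of their overlap, |⟨ψ|φ⟩|². The sketch below evaluates it for small state vectors; the function name `pure_state_fidelity` and the example states are assumptions for illustration.

```python
import math

def pure_state_fidelity(psi, phi):
    """|<psi|phi>|^2 for two normalized state vectors given as sequences of amplitudes."""
    overlap = sum(complex(a).conjugate() * complex(b) for a, b in zip(psi, phi))
    return abs(overlap) ** 2

zero = [1.0, 0.0]                              # |0>
plus = [1 / math.sqrt(2), 1 / math.sqrt(2)]    # |+> = (|0> + |1>) / sqrt(2)

f_same = pure_state_fidelity(zero, zero)   # identical states
f_half = pure_state_fidelity(zero, plus)   # partially overlapping states
print(f"{f_same:.3f} {f_half:.3f}")
```

A fidelity of 1 means the states are identical up to a global phase, and 0 means they are orthogonal.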
Quantum Advantage Score
Compares quantum vs classical performance:
from qgans_pro.utils import QuantumAdvantageScore
# Compare quantum and classical models
qa_metric = QuantumAdvantageScore()
advantage_score = qa_metric(
    quantum_model=quantum_gan,
    classical_model=classical_gan,
    test_data=test_loader
)
print(f"Quantum Advantage: {advantage_score:.2f}")
Expressibility and Entangling Capability
Evaluate quantum circuit properties:
from qgans_pro.utils import CircuitMetrics
# Analyze quantum circuit
circuit_metric = CircuitMetrics(backend='qiskit')
circuit_analysis = circuit_metric.analyze_circuit(quantum_generator.circuit)
print(f"Expressibility: {circuit_analysis['expressibility']:.4f}")
print(f"Entangling Capability: {circuit_analysis['entangling_capability']:.4f}")
print(f"Effective Dimension: {circuit_analysis['effective_dimension']}")
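One common way to quantify entangling capability is the Meyer-Wallach measure, usually averaged over states produced by randomly sampled circuit parameters. Whether `CircuitMetrics` uses this definition is not stated here, so treat the following as a sketch of the measure for a single state vector. The function name `meyer_wallach` is an assumption.

```python
import numpy as np

def meyer_wallach(state, n_qubits):
    """Meyer-Wallach measure Q = 2 * (1 - mean single-qubit purity); 0 = product state, 1 = maximal."""
    psi = np.asarray(state, dtype=complex).reshape([2] * n_qubits)
    purities = []
    for q in range(n_qubits):
        # Put qubit q first and flatten the rest, then trace out the rest
        m = np.moveaxis(psi, q, 0).reshape(2, -1)
        rho = m @ m.conj().T                      # reduced density matrix of qubit q
        purities.append(np.trace(rho @ rho).real)
    return float(2.0 * (1.0 - np.mean(purities)))

bell = np.array([1, 0, 0, 1]) / np.sqrt(2)    # (|00> + |11>) / sqrt(2)
product = np.array([1, 0, 0, 0])              # |00>

q_bell = meyer_wallach(bell, 2)
q_product = meyer_wallach(product, 2)
print(f"{q_bell:.3f} {q_product:.3f}")
```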
Fairness Metrics
Demographic Parity
Measures whether groups are equally represented in the generated samples:
from qgans_pro.utils import FairnessMetrics
# Initialize fairness evaluator
fairness_metric = FairnessMetrics()
# Evaluate demographic parity
dp_score = fairness_metric.demographic_parity(
    generated_samples=generated_samples,
    sensitive_attributes=sensitive_attrs
)
print(f"Demographic Parity: {dp_score:.4f}")
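As a rough illustration of the idea, the sketch below measures representation by comparing each group's share of the generated samples. The helper `representation_gap` and the toy attribute lists are assumptions; the exact score returned by `demographic_parity` may be defined differently.

```python
from collections import Counter

def representation_gap(sensitive_attributes):
    """Return (largest share - smallest share, per-group shares); a gap of 0 means equal representation."""
    counts = Counter(sensitive_attributes)
    total = sum(counts.values())
    shares = {group: count / total for group, count in counts.items()}
    return max(shares.values()) - min(shares.values()), shares

# Toy attribute labels attached to generated samples (made up for illustration)
balanced_gap, _ = representation_gap(["a"] * 50 + ["b"] * 50)
skewed_gap, skewed_shares = representation_gap(["a"] * 75 + ["b"] * 25)
print(balanced_gap, skewed_gap, skewed_shares)
```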
Equalized Odds
Checks whether true positive and false positive rates are equal across groups:
# Evaluate equalized odds
eo_score = fairness_metric.equalized_odds(
    generated_samples=generated_samples,
    true_labels=true_labels,
    sensitive_attributes=sensitive_attrs
)
print(f"Equalized Odds: {eo_score:.4f}")
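Equalized odds is usually stated in terms of a classifier's predictions. The sketch below assumes binary labels and predictions (for example, from a downstream classifier applied to generated samples) and reports the largest gap in true positive or false positive rate between groups. The helper names and the toy data are assumptions, and every group is assumed to contain both positive and negative examples.

```python
def group_rates(y_true, y_pred):
    """Return (true positive rate, false positive rate) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    positives = sum(1 for t in y_true if t == 1)
    negatives = len(y_true) - positives
    return tp / positives, fp / negatives

def equalized_odds_gap(y_true, y_pred, groups):
    """Largest between-group difference in TPR or FPR; 0 means equalized odds holds."""
    rates = {}
    for g in set(groups):
        idx = [i for i, value in enumerate(groups) if value == g]
        rates[g] = group_rates([y_true[i] for i in idx], [y_pred[i] for i in idx])
    tprs = [r[0] for r in rates.values()]
    fprs = [r[1] for r in rates.values()]
    return max(max(tprs) - min(tprs), max(fprs) - min(fprs))

# Toy data: group "a" is classified perfectly, group "b" is not
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
groups = ["a"] * 4 + ["b"] * 4

gap = equalized_odds_gap(y_true, y_pred, groups)
print(gap)
```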
Individual Fairness
Measures whether similar individuals receive similar treatment:
# Measure individual fairness
if_score = fairness_metric.individual_fairness(
    generated_samples=generated_samples,
    distance_metric='euclidean',
    threshold=0.1
)
print(f"Individual Fairness: {if_score:.4f}")
Performance Metrics
Training Convergence
Monitor training stability and convergence:
from qgans_pro.utils import ConvergenceMetrics
# Track convergence during training
convergence_metric = ConvergenceMetrics()
# Add to training loop
for epoch in range(epochs):
    # ... training code ...
    # Log convergence metrics
    convergence_metric.update(
        generator_loss=g_loss,
        discriminator_loss=d_loss,
        gradient_penalty=gp_loss
    )
# Get convergence analysis
convergence_report = convergence_metric.get_report()
print(f"Training Stability: {convergence_report['stability']:.4f}")
print(f"Convergence Rate: {convergence_report['convergence_rate']:.4f}")
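How the `stability` value in the report is defined is not documented here. One simple proxy, sketched below, is the standard deviation of the most recent losses: a small value suggests training has settled, a large one suggests oscillation. The function `loss_stability`, the window size, and the loss values are assumptions for illustration.

```python
import statistics

def loss_stability(losses, window=5):
    """Population standard deviation of the last `window` losses; smaller means steadier training."""
    recent = losses[-window:]
    return statistics.pstdev(recent)

# Made-up generator loss curves
oscillating = [1.0, 0.2, 1.1, 0.1, 1.2, 0.15]
settling = [1.0, 0.8, 0.7, 0.69, 0.7, 0.71, 0.7]

spread_oscillating = loss_stability(oscillating)
spread_settling = loss_stability(settling)
print(f"{spread_oscillating:.3f} {spread_settling:.3f}")
```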
Computational Efficiency
Measure quantum circuit efficiency:
from qgans_pro.utils import EfficiencyMetrics
# Analyze computational efficiency
efficiency_metric = EfficiencyMetrics()
efficiency_report = efficiency_metric.analyze(
    model=quantum_gan,
    n_samples=1000,
    device='cuda'
)
print(f"Samples/second: {efficiency_report['throughput']:.2f}")
print(f"Memory usage: {efficiency_report['memory_gb']:.2f} GB")
print(f"Circuit execution time: {efficiency_report['circuit_time']:.4f}s")
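If you want to cross-check the reported throughput, a manual measurement can be sketched with a wall-clock timer. The helper `measure_throughput` and the stand-in `dummy_generate` function are assumptions; substitute a call to your model's sampling method.

```python
import time

def measure_throughput(generate_fn, n_samples, batch_size=100):
    """Return (samples per second, elapsed seconds) for producing n_samples in batches."""
    start = time.perf_counter()
    produced = 0
    while produced < n_samples:
        batch = generate_fn(min(batch_size, n_samples - produced))
        produced += len(batch)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero reading
    return produced / elapsed, elapsed

def dummy_generate(n):
    # Stand-in for a real sampler (assumption): returns n placeholder values
    return [i * 0.5 for i in range(n)]

throughput, elapsed = measure_throughput(dummy_generate, 1000)
print(f"{throughput:.0f} samples/s in {elapsed:.6f}s")
```

When timing GPU work with PyTorch, call `torch.cuda.synchronize()` before reading the timer, since CUDA kernels run asynchronously.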
Comprehensive Evaluation
Full Evaluation Suite
Run all metrics at once:
from qgans_pro.utils import ComprehensiveEvaluator
# Initialize comprehensive evaluator
evaluator = ComprehensiveEvaluator(
    classical_metrics=['fid', 'is', 'lpips'],
    quantum_metrics=['fidelity', 'advantage'],
    fairness_metrics=['demographic_parity', 'equalized_odds'],
    device='cuda'
)
# Run full evaluation
results = evaluator.evaluate(
    model=quantum_gan,
    real_data=real_data,
    generated_data=generated_data,
    sensitive_attributes=sensitive_attrs
)
# Print comprehensive report
evaluator.print_report(results)
Custom Metrics
Define your own evaluation metrics:
from qgans_pro.utils import BaseMetric
class CustomQuantumMetric(BaseMetric):
    def __init__(self, threshold=0.5):
        super().__init__()
        self.threshold = threshold

    def calculate(self, real_data, generated_data):
        # Custom quantum evaluation logic
        quantum_score = self._calculate_quantum_score(generated_data)
        return {
            'custom_score': quantum_score,
            'passed_threshold': quantum_score > self.threshold
        }

    def _calculate_quantum_score(self, data):
        # Replace with your own metric; it must return a number so the
        # threshold comparison in calculate() works
        raise NotImplementedError("Implement your custom quantum metric")
# Use custom metric
custom_metric = CustomQuantumMetric(threshold=0.7)
custom_results = custom_metric(real_samples, generated_samples)
Visualization and Reporting
Metric Visualization
Create comprehensive evaluation plots:
from qgans_pro.utils import MetricVisualizer
# Initialize visualizer
visualizer = MetricVisualizer()
# Plot metric evolution during training
visualizer.plot_training_metrics(
    metrics_log=training_metrics,
    save_path='metrics_evolution.png'
)
# Create comparison radar chart
visualizer.plot_metric_comparison(
    models=['Quantum GAN', 'Classical GAN', 'Hybrid GAN'],
    metrics=evaluation_results,
    save_path='model_comparison.png'
)
# Generate correlation heatmap
visualizer.plot_metric_correlations(
    metrics_dict=all_metrics,
    save_path='metric_correlations.png'
)
Automated Reports
Generate detailed evaluation reports:
from qgans_pro.utils import ReportGenerator
# Create automated report
report_gen = ReportGenerator()
report = report_gen.generate_report(
    model_name='Quantum Fashion-MNIST GAN',
    evaluation_results=results,
    training_config=config,
    template='comprehensive'
)
# Save report
report_gen.save_report(report, 'evaluation_report.html')
Best Practices
1. Metric Selection
Choose appropriate metrics for your use case:
# For image generation
classical_metrics = ['fid', 'is', 'lpips']
quantum_metrics = ['fidelity', 'expressibility']
# For tabular data
classical_metrics = ['wasserstein_distance', 'correlation_distance']
quantum_metrics = ['quantum_advantage', 'entanglement']
# For fairness-critical applications
fairness_metrics = ['demographic_parity', 'equalized_odds', 'individual_fairness']
2. Evaluation Frequency
Balance thoroughness with computational cost:
# Quick evaluation (every epoch)
quick_metrics = ['generator_loss', 'discriminator_loss']
# Medium evaluation (every 10 epochs)
medium_metrics = ['fid', 'quantum_fidelity']
# Comprehensive evaluation (every 50 epochs)
comprehensive_metrics = ['all_classical', 'all_quantum', 'all_fairness']
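One way to wire these tiers into a training loop is a small lookup that returns the metrics due at a given epoch. The helper `metrics_for_epoch` and the `schedule` mapping below are assumptions for illustration, not part of QGANS Pro.

```python
def metrics_for_epoch(epoch, schedule):
    """Return the metric names due at `epoch` (1-indexed), cheapest tier first."""
    due = []
    for interval, names in schedule.items():
        if epoch % interval == 0:
            due.extend(names)
    return due

# Evaluation interval (in epochs) -> metrics to run at that interval
schedule = {
    1: ['generator_loss', 'discriminator_loss'],
    10: ['fid', 'quantum_fidelity'],
    50: ['all_classical', 'all_quantum', 'all_fairness'],
}

at_7 = metrics_for_epoch(7, schedule)
at_10 = metrics_for_epoch(10, schedule)
at_50 = metrics_for_epoch(50, schedule)
print(at_7, at_10, at_50, sep="\n")
```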
3. Statistical Significance
Ensure robust evaluation:
from qgans_pro.utils import StatisticalTesting
# Test statistical significance
stat_test = StatisticalTesting()
significance_results = stat_test.compare_models(
    model_a_results=quantum_results,
    model_b_results=classical_results,
    test='wilcoxon',
    alpha=0.05
)
print(f"P-value: {significance_results['p_value']:.4f}")
print(f"Significant difference: {significance_results['significant']}")
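To make the idea of a paired comparison concrete without extra dependencies, the sketch below uses an exact sign-flip permutation test. This is a different test from the Wilcoxon signed-rank test selected above, though it answers the same question: are the paired differences centered on zero? The function name and the per-seed FID values are made up for illustration.

```python
from itertools import product

def paired_sign_flip_test(scores_a, scores_b):
    """Two-sided exact p-value: fraction of sign assignments whose |sum of differences| reaches the observed one."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs))
    extreme = 0
    total = 0
    # Under the null hypothesis each paired difference is equally likely to be + or -
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        if abs(sum(s * d for s, d in zip(signs, diffs))) >= observed - 1e-12:
            extreme += 1
    return extreme / total

# Hypothetical FID scores from 8 matched training seeds (lower is better)
quantum_fid = [30.1, 28.4, 31.0, 29.5, 30.7, 28.9, 29.8, 30.3]
classical_fid = [32.0, 30.9, 33.2, 31.1, 33.0, 30.2, 32.5, 32.4]

p_value = paired_sign_flip_test(quantum_fid, classical_fid)
print(f"p = {p_value:.4f}")
```

The enumeration grows as 2^n, so this exact form only suits a small number of paired runs.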
Common Pitfalls
Avoid These Mistakes
- Single Metric Reliance: Don't rely on just FID or IS
- Insufficient Samples: Use at least 10k samples for reliable metrics
- Batch Size Effects: Use consistent batch sizes for comparison
- Hardware Inconsistency: Use the same device for all evaluations
- Temporal Bias: Evaluate at multiple training checkpoints
Troubleshooting
Common issues and solutions:
# Issue: OOM during FID calculation
# Solution: Process in smaller batches
fid_metric = FIDScore(batch_size=32, device='cuda')
# Issue: Inconsistent IS scores
# Solution: Use more samples and multiple runs
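import numpy as np
is_means = []
for run in range(5):
    fresh_samples = qgan.generate_samples(10000)  # draw new samples for each run
    is_mean, _ = is_metric(fresh_samples)  # is_metric returns (mean, std)
    is_means.append(is_mean)
final_is = np.mean(is_means)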
# Issue: Quantum metrics too slow
# Solution: Use sampling or circuit approximation
qf_metric = QuantumFidelity(sampling_shots=1000, approximate=True)
Pro Tip
Always evaluate models on held-out test data that wasn't used during training. This ensures unbiased assessment of generalization performance.
Computational Cost
Quantum metrics can be computationally expensive. Consider using approximation methods or sampling techniques for large-scale evaluations.