Skip to content

Diffusion Models

This section covers thermodynamic diffusion models, which leverage principles of thermal diffusion and stochastic processes to generate samples and solve inverse problems.

Overview

Thermodynamic diffusion models extend traditional diffusion models by incorporating explicit thermodynamic state variables and physical constraints. These models can generate samples that evolve according to realistic physical processes while maintaining thermodynamic consistency.

Theoretical Foundation

Forward Diffusion Process

The forward process gradually adds noise according to a diffusion schedule: \(\(q(\mathbf{x}_t | \mathbf{x}_{t-1}) = \mathcal{N}(\mathbf{x}_t; \sqrt{1-\beta_t}\mathbf{x}_{t-1}, \beta_t \mathbf{I})\)\)

Where \(\beta_t\) is the noise schedule.

Thermodynamic Interpretation

In thermodynamic terms:

  • Energy: \(U_t = \|\mathbf{x}_t\|^2 / 2\)
  • Temperature: \(T_t = \beta_t / 2\)
  • Entropy: \(S_t = \frac{d}{2}\log(2\pi e T_t)\)
  • Free Energy: \(F_t = U_t - T_t S_t\)

Score Function

The score function represents the gradient of log-density: \(\(s_\theta(\mathbf{x}_t, t) = \nabla_{\mathbf{x}_t} \log p_t(\mathbf{x}_t)\)\)

Thermodynamic Score Models

Energy-Based Score

Define score in terms of energy: \(\(s_\theta(\mathbf{x}, t) = -\frac{1}{T_t}\nabla_{\mathbf{x}} U_\theta(\mathbf{x}, t)\)\)

Where \(U_\theta\) is a learned energy function.

Temperature-Dependent Score

Score function with explicit temperature dependence: \(\(s_\theta(\mathbf{x}, t) = -\frac{1}{T_t}\nabla_{\mathbf{x}} U_\theta(\mathbf{x}, t) + \sqrt{\frac{2}{T_t}}\boldsymbol{\xi}\)\)

Where \(\boldsymbol{\xi}\) represents thermal fluctuations.

Implementation

class ThermodynamicScoreModel(nn.Module):
    def __init__(self, dim, hidden_dim=256, n_layers=4):
        super().__init__()
        self.energy_net = EnergyNetwork(dim, hidden_dim, n_layers)
        self.temperature_schedule = self.get_temperature_schedule()

    def energy(self, x, t):
        """Compute energy U(x,t)"""
        return self.energy_net(x, t)

    def score(self, x, t):
        """Compute thermodynamic score"""
        x.requires_grad_(True)
        energy = self.energy(x, t)
        score = -torch.autograd.grad(
            energy.sum(), x, create_graph=True
        )[0]

        temperature = self.get_temperature(t)
        return score / temperature

    def get_temperature(self, t):
        """Get temperature at time t"""
        return self.temperature_schedule(t)

Reverse Diffusion Process

Thermodynamic Reverse SDE

The reverse-time SDE with thermodynamic interpretation: \(\(d\mathbf{x} = \left[\mathbf{f}(\mathbf{x}, t) - g(t)^2 s_\theta(\mathbf{x}, t)\right]dt + g(t)d\bar{\mathbf{w}}\)\)

Where:

  • \(\mathbf{f}(\mathbf{x}, t)\) is drift coefficient
  • \(g(t)\) is diffusion coefficient
  • \(s_\theta(\mathbf{x}, t)\) is learned score function
  • \(d\bar{\mathbf{w}}\) is reverse Wiener process

Heat Equation Connection

The reverse process satisfies a modified heat equation: \(\(\frac{\partial p}{\partial t} = \nabla \cdot \left(D(t) \nabla p + D(t) p \nabla \log p_t\right)\)\)

Where \(D(t) = g(t)^2/2\) is diffusion coefficient.

Langevin Dynamics

Discrete sampling via Langevin MCMC: \(\(\mathbf{x}_{i+1} = \mathbf{x}_i + \epsilon s_\theta(\mathbf{x}_i, t) + \sqrt{2\epsilon T_t}\boldsymbol{\xi}\)\)

Training Objectives

Score Matching

Minimize score matching loss: \(\(\mathcal{L}_{\text{SM}} = \mathbb{E}_{t,\mathbf{x}_0,\boldsymbol{\epsilon}}\left[\left\|s_\theta(\mathbf{x}_t, t) - s_t(\mathbf{x}_t)\right\|^2\right]\)\)

Where \(s_t(\mathbf{x}_t)\) is the true score.

Denoising Score Matching

Simplified objective using noise prediction: \(\(\mathcal{L}_{\text{DSM}} = \mathbb{E}_{t,\mathbf{x}_0,\boldsymbol{\epsilon}}\left[\left\|\epsilon_\theta(\mathbf{x}_t, t) - \boldsymbol{\epsilon}\right\|^2\right]\)\)

Thermodynamic Consistency Loss

Additional term enforcing thermodynamic relations: \(\(\mathcal{L}_{\text{thermo}} = \mathbb{E}\left[\left|U + TS - F\right|^2 + \left|\frac{\partial F}{\partial T} + S\right|^2\right]\)\)

Energy Conservation

Penalize energy violations: \(\(\mathcal{L}_{\text{energy}} = \mathbb{E}\left[\left|\frac{dE}{dt} - P_{\text{input}} + P_{\text{dissipation}}\right|^2\right]\)\)

Specialized Architectures

Energy-Based Networks

Networks that explicitly output energy: \(\(U_\theta(\mathbf{x}, t) = \text{EnergyNet}(\mathbf{x}, t)\)\)

Common architectures:

  • ResNet-based energy networks
  • Transformer energy models
  • Graph neural networks for molecular systems

Temperature-Adaptive Networks

Networks with learnable temperature schedules: \(\(T_\theta(t) = \text{TempNet}(t)\)\)

Multi-Scale Models

Hierarchical models for different length scales: \(\(U_{\text{total}} = U_{\text{atomic}} + U_{\text{molecular}} + U_{\text{system}}\)\)

Sampling Methods

Euler-Maruyama Scheme

Basic numerical integration: \(\(\mathbf{x}_{i+1} = \mathbf{x}_i + h \mathbf{f}(\mathbf{x}_i, t_i) + \sqrt{h} g(t_i) \boldsymbol{\xi}_i\)\)

Heun's Method

Higher-order accuracy: \(\(\tilde{\mathbf{x}}_{i+1} = \mathbf{x}_i + h \mathbf{f}(\mathbf{x}_i, t_i) + \sqrt{h} g(t_i) \boldsymbol{\xi}_i\)\) \(\(\mathbf{x}_{i+1} = \mathbf{x}_i + \frac{h}{2}[\mathbf{f}(\mathbf{x}_i, t_i) + \mathbf{f}(\tilde{\mathbf{x}}_{i+1}, t_{i+1})] + \sqrt{h} g(t_i) \boldsymbol{\xi}_i\)\)

Predictor-Corrector

Combine prediction and correction steps:

  1. Predictor: Standard Euler step
  2. Corrector: Langevin MCMC refinement

Adaptive Sampling

Adjust step size based on local dynamics: \(\(h_{i+1} = h_i \cdot \text{adapt\_factor}(\|\mathbf{f}(\mathbf{x}_i, t_i)\|, \text{error\_estimate})\)\)

Temperature Schedules

Linear Schedule

\[T_t = T_{\text{start}} \frac{T_{\text{end}} - t}{T_{\text{end}} - T_{\text{start}}}\]

Exponential Schedule

\[T_t = T_{\text{start}} \exp\left(-\frac{t}{\tau}\right)\]

Cosine Schedule

\[T_t = T_{\text{end}} + \frac{T_{\text{start}} - T_{\text{end}}}{2}\left(1 + \cos\left(\frac{\pi t}{T}\right)\right)\]

Learned Schedule

\[T_t = \text{ScheduleNet}(t, \text{problem\_features})\]

Physical Constraints

Conservation Laws

Enforce conservation during generation:

Energy Conservation: \(\(\sum_i E_i = \text{constant}\)\)

Momentum Conservation: \(\(\sum_i m_i \mathbf{v}_i = \text{constant}\)\)

Mass Conservation: \(\(\sum_i m_i = \text{constant}\)\)

Symmetries

Respect physical symmetries:

Translation Invariance: \(\(U(\mathbf{x} + \mathbf{a}) = U(\mathbf{x})\)\)

Rotation Invariance: \(\(U(R\mathbf{x}) = U(\mathbf{x})\)\)

Permutation Invariance: \(\(U(P\mathbf{x}) = U(\mathbf{x})\)\)

Boundary Conditions

Handle different boundary conditions:

Periodic Boundaries: \(\(\mathbf{x}(L) = \mathbf{x}(0)\)\)

Reflecting Boundaries: \(\(\mathbf{v} \cdot \mathbf{n} = 0\)\) at boundaries

Absorbing Boundaries: \(\(p(\mathbf{x}) = 0\)\) at boundaries

Multi-Modal Generation

Mixture Models

Generate from multiple modes: \(\(p(\mathbf{x}) = \sum_k \pi_k p_k(\mathbf{x})\)\)

Each mode has its own energy function: \(\(U_k(\mathbf{x}) = U_{\text{base}}(\mathbf{x}) + V_k(\mathbf{x})\)\)

Mode Switching

Allow transitions between modes during generation: \(\(P(k \to j) = \exp\left(-\frac{U_j - U_k}{k_B T}\right)\)\)

Hierarchical Generation

Generate at multiple scales:

  1. Global structure
  2. Local details
  3. Fine-scale features

Conditional Generation

Conditional Score Models

Score function conditioned on context: \(\(s_\theta(\mathbf{x}, t | \mathbf{c}) = \nabla_{\mathbf{x}} \log p_t(\mathbf{x} | \mathbf{c})\)\)

Classifier Guidance

Use external classifier for guidance: \(\(\tilde{s}_\theta(\mathbf{x}, t) = s_\theta(\mathbf{x}, t) + w \nabla_{\mathbf{x}} \log p_{\phi}(y | \mathbf{x})\)\)

Classifier-Free Guidance

Self-contained conditional generation: \(\(\tilde{s}_\theta(\mathbf{x}, t) = s_\theta(\mathbf{x}, t | \mathbf{c}) + w(s_\theta(\mathbf{x}, t | \mathbf{c}) - s_\theta(\mathbf{x}, t))\)\)

Applications

Molecular Dynamics

Generate molecular configurations:

  • Protein folding trajectories
  • Chemical reaction pathways
  • Drug design and optimization

Material Design

Generate new materials:

  • Crystal structures
  • Polymer configurations
  • Nanoparticle assemblies

Climate Modeling

Generate weather patterns:

  • Temperature distributions
  • Precipitation patterns
  • Extreme event simulations

Fluid Dynamics

Generate flow fields:

  • Turbulent flows
  • Heat transfer patterns
  • Multiphase flows

Advanced Techniques

Neural ODEs for Diffusion

Use neural ODEs for continuous-time modeling: \(\(\frac{d\mathbf{x}}{dt} = f_\theta(\mathbf{x}, t)\)\)

Stochastic Interpolants

Learn paths between distributions: \(\(\mathbf{x}_t = (1-t)\mathbf{x}_0 + t\mathbf{x}_1 + \sigma_t \boldsymbol{\epsilon}\)\)

Flow Matching

Match vector fields instead of scores: \(\(\mathcal{L}_{\text{FM}} = \mathbb{E}_{t,\mathbf{x}_0,\mathbf{x}_1}\left[\left\|v_\theta(\mathbf{x}_t, t) - u_t(\mathbf{x}_t)\right\|^2\right]\)\)

Evaluation Metrics

Thermodynamic Consistency

Check thermodynamic relations:

  • \(dU = TdS - PdV\)
  • \(G = H - TS\)
  • Maxwell relations

Sample Quality

Standard generative model metrics:

  • Fréchet Inception Distance (FID)
  • Inception Score (IS)
  • Kernel Inception Distance (KID)

Physical Realism

Domain-specific validation:

  • Energy conservation
  • Force consistency
  • Stability analysis

Computational Considerations

Memory Optimization

Techniques for large-scale generation:

  • Gradient checkpointing
  • Mixed precision training
  • Model parallelism

Acceleration Methods

Speed up sampling:

  • Distillation models
  • Deterministic sampling
  • Few-step generation

Hardware Optimization

Efficient implementation:

  • GPU optimization
  • TPU acceleration
  • Distributed sampling

Future Directions

Quantum Diffusion Models

Extension to quantum systems: \(\(\frac{\partial \rho}{\partial t} = -\frac{i}{\hbar}[H, \rho] + \mathcal{L}[\rho]\)\)

Non-Equilibrium Diffusion

Models for driven systems: \(\(\frac{d\mathbf{x}}{dt} = -\nabla U(\mathbf{x}) + \mathbf{F}_{\text{drive}} + \boldsymbol{\xi}\)\)

Adaptive Neural Architectures

Networks that adapt during generation: \(\(\theta_{t+1} = \theta_t + \Delta\theta(\mathbf{x}_t, t)\)\)

Conclusion

Thermodynamic diffusion models provide a powerful framework for generating samples that respect physical principles and constraints. By incorporating explicit thermodynamic variables and conservation laws, these models can generate realistic and physically consistent samples across a wide range of applications, from molecular systems to climate modeling.