Translate to conditional probability

The multivariate normal distribution is given by the formula

$$
\mathcal{N}(\mathbf{x};\, \boldsymbol{\mu}, \Sigma) = \frac{1}{(2\pi)^{d/2}\, |\Sigma|^{1/2}} \exp\!\left(-\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^\top \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})\right),
$$

where:

  • $d$ is the dimension of the vector $\mathbf{x}$,
  • $\boldsymbol{\mu}$ is the mean vector,
  • $\Sigma$ is the covariance matrix, and
  • $|\Sigma|$ denotes the determinant of $\Sigma$.

In your case, the model is

$$
\mathbf{x}(t) = \mathbf{x}(0) + \sigma \mathbf{z}, \qquad \mathbf{z} \sim \mathcal{N}(\mathbf{0}, I).
$$

This means that, conditioned on $\mathbf{x}(0)$, the random variable $\mathbf{x}(t)$ is distributed as a multivariate normal with:

  • Mean: $\boldsymbol{\mu} = \mathbf{x}(0)$
  • Covariance: $\Sigma = \sigma^2 I$

Substitute these into the multivariate normal formula:

  1. Determinant and Inverse of the Covariance:
    Since $\Sigma = \sigma^2 I$ (a diagonal matrix with each diagonal entry equal to $\sigma^2$):
    • The determinant is $|\Sigma| = (\sigma^2)^d = \sigma^{2d}$.
    • The inverse is $\Sigma^{-1} = \frac{1}{\sigma^2} I$.
  2. Plug into the Formula:

    $$
    p(\mathbf{x}(t) \mid \mathbf{x}(0)) = \frac{1}{(2\pi)^{d/2}\, \sigma^{d}} \exp\!\left(-\frac{1}{2} (\mathbf{x}(t) - \mathbf{x}(0))^\top \frac{1}{\sigma^2} I\, (\mathbf{x}(t) - \mathbf{x}(0))\right).
    $$

  3. Simplify:

    1. The identity matrix doesn’t change the vector, so $I\,(\mathbf{x}(t) - \mathbf{x}(0)) = \mathbf{x}(t) - \mathbf{x}(0)$.
    2. Thus, the exponent simplifies to $-\frac{\|\mathbf{x}(t) - \mathbf{x}(0)\|^2}{2\sigma^2}$.
    3. Final expression:

    $$
    p(\mathbf{x}(t) \mid \mathbf{x}(0)) = \frac{1}{(2\pi\sigma^2)^{d/2}} \exp\!\left(-\frac{\|\mathbf{x}(t) - \mathbf{x}(0)\|^2}{2\sigma^2}\right).
    $$

In summary, starting from $\mathbf{x}(t) = \mathbf{x}(0) + \sigma \mathbf{z}$ with $\mathbf{z} \sim \mathcal{N}(\mathbf{0}, I)$, the conditional probability is

$$
p(\mathbf{x}(t) \mid \mathbf{x}(0)) = \mathcal{N}\!\left(\mathbf{x}(t);\, \mathbf{x}(0),\, \sigma^2 I\right).
$$
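
As a quick numerical check, the sketch below (assuming NumPy; the dimension and $\sigma$ are arbitrary choices) compares the general multivariate normal density with $\Sigma = \sigma^2 I$ against the simplified isotropic expression derived above:

```python
import numpy as np

def mvn_density(x, mu, cov):
    """General multivariate normal density using det and inverse of cov."""
    d = x.shape[0]
    diff = x - mu
    norm_const = 1.0 / np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return norm_const * np.exp(-0.5 * diff @ np.linalg.inv(cov) @ diff)

def isotropic_density(x, mu, sigma):
    """Simplified density for covariance sigma^2 * I."""
    d = x.shape[0]
    sq_dist = np.sum((x - mu) ** 2)
    return (2 * np.pi * sigma ** 2) ** (-d / 2) * np.exp(-sq_dist / (2 * sigma ** 2))

rng = np.random.default_rng(0)
d, sigma = 5, 0.7
x0 = rng.normal(size=d)               # plays the role of x(0)
x = x0 + sigma * rng.normal(size=d)   # x(t) = x(0) + sigma * z

# The two expressions agree, confirming the simplification.
assert np.isclose(mvn_density(x, x0, sigma ** 2 * np.eye(d)),
                  isotropic_density(x, x0, sigma))
```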

Why $\|\mathbf{x} - \mathbf{y}\|^2 = (\mathbf{x} - \mathbf{y})^\top (\mathbf{x} - \mathbf{y})$

Answer: The Euclidean distance between two vectors $\mathbf{x}$ and $\mathbf{y}$ is defined as the length of the vector difference $\mathbf{x} - \mathbf{y}$. Concretely, if

$$
\mathbf{x} = (x_1, \dots, x_d), \qquad \mathbf{y} = (y_1, \dots, y_d),
$$

then the Euclidean distance is given by

$$
\|\mathbf{x} - \mathbf{y}\| = \sqrt{\sum_{i=1}^{d} (x_i - y_i)^2}.
$$

When you square this distance, you get

$$
\|\mathbf{x} - \mathbf{y}\|^2 = \sum_{i=1}^{d} (x_i - y_i)^2.
$$

This is exactly the same as computing the dot product of the vector $\mathbf{x} - \mathbf{y}$ with itself:

$$
(\mathbf{x} - \mathbf{y})^\top (\mathbf{x} - \mathbf{y}) = \sum_{i=1}^{d} (x_i - y_i)^2.
$$

Thus, we have

$$
\|\mathbf{x} - \mathbf{y}\|^2 = (\mathbf{x} - \mathbf{y})^\top (\mathbf{x} - \mathbf{y}).
$$
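
A two-line numerical illustration of this identity (assuming NumPy; the vectors are arbitrary):

```python
import numpy as np

# Check that ||x - y||^2 equals the dot product of (x - y) with itself.
x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 6.0, 3.0])

diff = x - y
sq_dist = np.sum(diff ** 2)   # sum of squared coordinate differences
dot = diff @ diff             # dot product of (x - y) with itself

assert np.isclose(sq_dist, dot)
assert np.isclose(np.linalg.norm(diff) ** 2, dot)  # both equal 25.0 here
```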

From SDE to conditional probability

The key idea is to solve the stochastic differential equation (SDE) by integrating its diffusion term and then computing the variance of the resulting stochastic integral. Here’s a step-by-step explanation:

1. Writing the SDE and Its Integral Form

You start with the SDE:

$$
d\mathbf{x} = \sigma^t\, d\mathbf{w},
$$

where $\mathbf{w}$ is a standard Wiener process (Brownian motion). Integrating both sides from 0 to $t$ gives:

$$
\mathbf{x}(t) = \mathbf{x}(0) + \int_0^t \sigma^s\, d\mathbf{w}(s).
$$

2. Distribution of the Stochastic Integral

The integral

$$
\int_0^t \sigma^s\, d\mathbf{w}(s)
$$

is a stochastic integral with a deterministic integrand. By properties of such integrals, it is normally distributed with mean zero and covariance given by the Itô isometry:

$$
\operatorname{Cov} = \left(\int_0^t \sigma^{2s}\, ds\right) I,
$$

where $I$ is the identity matrix.

3. Evaluating the Variance Integral

We need to compute the integral

$$
\int_0^t \sigma^{2s}\, ds.
$$

This can be computed by rewriting the integrand in exponential form:

$$
\sigma^{2s} = e^{2s \ln \sigma}.
$$

Thus,

$$
\int_0^t \sigma^{2s}\, ds = \int_0^t e^{2s \ln \sigma}\, ds.
$$

This is an elementary integral:

$$
\int_0^t e^{2s \ln \sigma}\, ds = \frac{e^{2t \ln \sigma} - 1}{2 \ln \sigma} = \frac{\sigma^{2t} - 1}{2 \ln \sigma}.
$$

4. Putting It All Together

Since

$$
\mathbf{x}(t) = \mathbf{x}(0) + \int_0^t \sigma^s\, d\mathbf{w}(s)
$$

and the stochastic integral is Gaussian with mean $\mathbf{0}$ and covariance $\frac{\sigma^{2t} - 1}{2 \ln \sigma} I$, it follows that:

$$
\mathbf{x}(t) \mid \mathbf{x}(0) \sim \mathcal{N}\!\left(\mathbf{x}(0),\, \frac{\sigma^{2t} - 1}{2 \ln \sigma}\, I\right).
$$

Thus, the transition probability density is:

$$
p(\mathbf{x}(t) \mid \mathbf{x}(0)) = \mathcal{N}\!\left(\mathbf{x}(t);\, \mathbf{x}(0),\, \frac{\sigma^{2t} - 1}{2 \ln \sigma}\, I\right).
$$
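
As a sanity check on this derivation, the sketch below (assuming NumPy; $\sigma = 10$ and the step/path counts are arbitrary choices) simulates the SDE with an Euler–Maruyama scheme and compares the empirical variance of $\mathbf{x}(t) - \mathbf{x}(0)$ against the closed form:

```python
import numpy as np

# Monte Carlo check: simulate dx = sigma^t dw and compare the empirical
# variance of x(t) - x(0) to the closed form (sigma^(2t) - 1) / (2 ln sigma).
rng = np.random.default_rng(0)
sigma, t_end, n_steps, n_paths = 10.0, 1.0, 500, 10000
dt = t_end / n_steps

x = np.zeros(n_paths)  # start every path at x(0) = 0 (scalar case, d = 1)
for k in range(n_steps):
    t = k * dt
    # Euler-Maruyama increment: sigma^t * sqrt(dt) * standard normal
    x += sigma ** t * np.sqrt(dt) * rng.normal(size=n_paths)

empirical_var = np.var(x)
closed_form = (sigma ** (2 * t_end) - 1) / (2 * np.log(sigma))
# Agreement up to Monte Carlo / discretization error (a few percent).
assert abs(empirical_var / closed_form - 1) < 0.1
```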

Plug conditional probability into loss function

1. Gaussian Conditional Distribution and Its Score

The conditional distribution of $\mathbf{x}(t)$ given $\mathbf{x}(0)$ is a Gaussian:

$$
p(\mathbf{x}(t) \mid \mathbf{x}(0)) = \mathcal{N}\!\left(\mathbf{x}(t);\, \mathbf{x}(0),\, \sigma_t^2 I\right),
$$

where the variance is given by

$$
\sigma_t^2 = \frac{\sigma^{2t} - 1}{2 \ln \sigma}.
$$

For a Gaussian distribution with mean $\boldsymbol{\mu}$ and covariance $\sigma^2 I$, its probability density function is:

$$
p(\mathbf{x}) = \frac{1}{(2\pi\sigma^2)^{d/2}} \exp\!\left(-\frac{\|\mathbf{x} - \boldsymbol{\mu}\|^2}{2\sigma^2}\right).
$$

Taking the logarithm, we get:

$$
\log p(\mathbf{x}) = -\frac{\|\mathbf{x} - \boldsymbol{\mu}\|^2}{2\sigma^2} - \frac{d}{2} \log(2\pi\sigma^2).
$$

The score function is defined as the gradient of the log-density with respect to $\mathbf{x}$:

$$
\nabla_{\mathbf{x}} \log p(\mathbf{x}) = -\frac{\mathbf{x} - \boldsymbol{\mu}}{\sigma^2}.
$$

In our setting:

  • The “mean” is $\boldsymbol{\mu} = \mathbf{x}(0)$.
  • The variance is $\sigma_t^2 = \frac{\sigma^{2t} - 1}{2 \ln \sigma}$.

Thus, the score (the gradient of the log-density) becomes:

$$
\nabla_{\mathbf{x}(t)} \log p(\mathbf{x}(t) \mid \mathbf{x}(0)) = -\frac{\mathbf{x}(t) - \mathbf{x}(0)}{\sigma_t^2};
$$

in other words, writing $\mathbf{x}(t) = \mathbf{x}(0) + \sigma_t \mathbf{z}$ with $\mathbf{z} \sim \mathcal{N}(\mathbf{0}, I)$:

$$
\nabla_{\mathbf{x}(t)} \log p(\mathbf{x}(t) \mid \mathbf{x}(0)) = -\frac{\sigma_t \mathbf{z}}{\sigma_t^2} = -\frac{\mathbf{z}}{\sigma_t}.
$$
2. Plugging into the Loss Function

The loss function is defined as:

$$
\mathcal{L} = \mathbb{E}_{t}\, \mathbb{E}_{\mathbf{x}(0)}\, \mathbb{E}_{\mathbf{x}(t) \mid \mathbf{x}(0)} \left[\left\| s_\theta(\mathbf{x}(t), t) - \nabla_{\mathbf{x}(t)} \log p(\mathbf{x}(t) \mid \mathbf{x}(0)) \right\|^2\right].
$$

Substituting the score $\nabla_{\mathbf{x}(t)} \log p(\mathbf{x}(t) \mid \mathbf{x}(0)) = -\frac{\mathbf{x}(t) - \mathbf{x}(0)}{\sigma_t^2}$:

$$
\mathcal{L} = \mathbb{E}_{t}\, \mathbb{E}_{\mathbf{x}(0)}\, \mathbb{E}_{\mathbf{x}(t) \mid \mathbf{x}(0)} \left[\left\| s_\theta(\mathbf{x}(t), t) + \frac{\mathbf{x}(t) - \mathbf{x}(0)}{\sigma_t^2} \right\|^2\right].
$$
3. Complete loss function

Consider the loss

$$
\mathcal{L} = \mathbb{E}\left[\left\| s_\theta(\mathbf{x}(t), t) + \frac{\mathbf{x}(t) - \mathbf{x}(0)}{\sigma_t^2} \right\|^2\right];
$$

we know that the magnitude (scale) of the true score varies with time, since the variance $\sigma_t^2$ increases with time $t$.

This means the model must predict the correct scale for each time $t$ to balance the loss across all time steps. However, the model will likely struggle with this, so we need to assist it in finding the correct scale for each $t$. To do that:

  1. The “typical” difference $\|\mathbf{x}(t) - \mathbf{x}(0)\|$ is roughly $\sigma_t$, since

    $$
    \mathbf{x}(t) - \mathbf{x}(0) \sim \mathcal{N}(\mathbf{0},\, \sigma_t^2 I);
    $$

    that means its typical size (or scale) is $\sigma_t$. In other words, although individual samples vary, most values of $\|\mathbf{x}(t) - \mathbf{x}(0)\|$ are on the order of $\sigma_t$. “On the order of” is a shorthand for saying “approximately proportional to” or “roughly of the same scale as.”

  2. The “typical” magnitude of the true score: Since the score is

    $$
    -\frac{\mathbf{x}(t) - \mathbf{x}(0)}{\sigma_t^2},
    $$

    its magnitude is roughly

    $$
    \frac{\|\mathbf{x}(t) - \mathbf{x}(0)\|}{\sigma_t^2}.
    $$

    Given that $\|\mathbf{x}(t) - \mathbf{x}(0)\|$ is typically on the order of $\sigma_t$, the typical magnitude of the true score becomes approximately

    $$
    \frac{\sigma_t}{\sigma_t^2} = \frac{1}{\sigma_t};
    $$

    that means on average, at each time $t$, it will have magnitude (scale) equal to $1/\sigma_t$.

  3. We simply divide the output of the model by $\sigma_t$; then we will have a matched score scale.

    Lastly, we choose the weighting function

    $$
    \lambda(t) = \sigma_t^2
    $$

    to avoid division (reason here), so our loss now:

    $$
    \mathcal{L} = \mathbb{E}\left[\sigma_t^2 \left\| s_\theta(\mathbf{x}(t), t) + \frac{\mathbf{x}(t) - \mathbf{x}(0)}{\sigma_t^2} \right\|^2\right] = \mathbb{E}\left[\left\| \sigma_t\, s_\theta(\mathbf{x}(t), t) + \frac{\mathbf{x}(t) - \mathbf{x}(0)}{\sigma_t} \right\|^2\right];
    $$

    therefore

    $$
    \mathcal{L} = \mathbb{E}\left[\left\| \sigma_t\, s_\theta(\mathbf{x}(t), t) + \mathbf{z} \right\|^2\right], \qquad \mathbf{z} = \frac{\mathbf{x}(t) - \mathbf{x}(0)}{\sigma_t} \sim \mathcal{N}(\mathbf{0}, I).
    $$
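
The reweighting step above is a pure algebraic identity, which can be checked numerically (assuming NumPy; the vectors and $\sigma_t$ are arbitrary):

```python
import numpy as np

# Check: with lambda(t) = sigma_t^2, the weighted loss
# sigma_t^2 * || s + (x(t) - x(0)) / sigma_t^2 ||^2
# equals || sigma_t * s + z ||^2, where z = (x(t) - x(0)) / sigma_t.
rng = np.random.default_rng(0)
d = 3
sigma_t = 7.5
x0 = rng.normal(size=d)
z = rng.normal(size=d)
xt = x0 + sigma_t * z
s = rng.normal(size=d)  # arbitrary model output standing in for s_theta

weighted = sigma_t ** 2 * np.sum((s + (xt - x0) / sigma_t ** 2) ** 2)
rescaled = np.sum((sigma_t * s + z) ** 2)
assert np.isclose(weighted, rescaled)
```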

How to compute step size for PC

The step size $\epsilon$ is chosen adaptively to balance the contribution of the score (signal) and the injected noise in the Langevin MCMC update. Recall that the Langevin update is:

$$
\mathbf{x}_{i+1} = \mathbf{x}_i + \epsilon\, s_\theta(\mathbf{x}_i, t) + \sqrt{2\epsilon}\, \mathbf{z}_i,
$$

where:

  • $s_\theta(\mathbf{x}_i, t)$ is the score (i.e., an estimate of $\nabla_{\mathbf{x}} \log p(\mathbf{x})$),
  • $\epsilon$ is the step size, and
  • $\mathbf{z}_i \sim \mathcal{N}(\mathbf{0}, I)$ is standard Gaussian noise.

The goal is to set $\epsilon$ such that the update from the score is proportional to the noise level, scaled by a desired signal-to-noise ratio (SNR) $r$. That is, we want the magnitude of the score update to be $r$ times the magnitude of the noise update.

  1. Magnitude of the Score Update:
    The change due to the score is approximately

    $$
    \epsilon\, \|\mathbf{g}\|,
    $$

    where $\|\mathbf{g}\| = \|s_\theta(\mathbf{x}_i, t)\|$ is the norm of the score.

  2. Magnitude of the Noise Update:
    The noise term has a typical magnitude of

    $$
    \sqrt{2\epsilon}\, \|\mathbf{z}\|,
    $$

    where $\|\mathbf{z}\| \approx \sqrt{d}$ is an estimate of the norm of a standard Gaussian noise vector ($\sqrt{d}$, computed from the dimension $d$ of the space, is the typical norm of a standard Gaussian vector in that space).

  3. Balancing the Two Terms:
    To enforce a desired signal-to-noise ratio ($r$), we set:

    $$
    \epsilon\, \|\mathbf{g}\| = r\, \sqrt{2\epsilon}\, \|\mathbf{z}\|.
    $$

  4. Solving for $\epsilon$:
    Rearranging, we have:

    $$
    \sqrt{\epsilon} = \frac{\sqrt{2}\, r\, \|\mathbf{z}\|}{\|\mathbf{g}\|}.
    $$

    Squaring both sides gives:

    $$
    \epsilon = 2\left(\frac{r\, \|\mathbf{z}\|}{\|\mathbf{g}\|}\right)^2.
    $$

So the step size is adaptively set to ensure that the update from the score is $r$ times the size of the noise update, balancing the two contributions in the Langevin MCMC step. This adaptive choice helps maintain stability and improves the quality of the refined sample during the corrector step.
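
The corrector step above can be sketched as follows (assuming NumPy; `score_fn`, the SNR value $r$, and the toy target $\mathcal{N}(\mathbf{0}, I)$, whose score is $-\mathbf{x}$, are illustrative choices, not part of the original derivation):

```python
import numpy as np

def corrector_step(x, score_fn, r, rng):
    """One Langevin corrector step with eps = 2 * (r * ||z|| / ||g||)^2."""
    g = score_fn(x)               # score estimate at the current sample
    z = rng.normal(size=x.shape)  # fresh standard Gaussian noise
    eps = 2 * (r * np.linalg.norm(z) / np.linalg.norm(g)) ** 2
    return x + eps * g + np.sqrt(2 * eps) * z

# Example: Langevin dynamics targeting N(0, I), whose score is -x.
rng = np.random.default_rng(0)
x = rng.normal(size=16) * 10.0    # start far from the target distribution
for _ in range(2000):
    x = corrector_step(x, lambda v: -v, r=0.16, rng=rng)

# After many corrector steps the sample sits in a typical region of
# N(0, I): norm roughly sqrt(16) = 4 rather than the initial ~40.
assert np.linalg.norm(x) < 10.0
```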