Translate to conditional probability
The multivariate normal distribution is given by the formula
$$\mathcal{N}(x;\,\mu,\Sigma) = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}}\exp\!\left(-\tfrac{1}{2}(x-\mu)^\top \Sigma^{-1}(x-\mu)\right),$$
where:
- $d$ is the dimension of the vector $x$,
- $\mu$ is the mean vector,
- $\Sigma$ is the covariance matrix, and
- $|\Sigma|$ denotes the determinant of $\Sigma$.
In your case, the model is
$$x_t = x_0 + \sigma(t)\,z, \qquad z \sim \mathcal{N}(0, I).$$
This means that conditioned on $x_0$, the random variable $x_t$ is distributed as a multivariate normal with:
- Mean: $\mu = x_0$
- Covariance: $\Sigma = \sigma^2(t)\,I$
Substitute these into the multivariate normal formula:
- Determinant and inverse of the covariance:
Since $\Sigma = \sigma^2(t)\,I$ (a diagonal matrix with each diagonal entry equal to $\sigma^2(t)$):
- The determinant is $|\Sigma| = \left(\sigma^2(t)\right)^d$.
- The inverse is $\Sigma^{-1} = \frac{1}{\sigma^2(t)}\,I$.
- Plug into the formula:
$$p(x_t \mid x_0) = \frac{1}{(2\pi)^{d/2}\left(\sigma^2(t)\right)^{d/2}}\exp\!\left(-\frac{1}{2}(x_t-x_0)^\top \frac{1}{\sigma^2(t)} I\,(x_t-x_0)\right)$$
Simplify:
- The identity matrix doesn’t change the vector, so
$$(x_t-x_0)^\top \frac{1}{\sigma^2(t)} I\,(x_t-x_0) = \frac{(x_t-x_0)^\top(x_t-x_0)}{\sigma^2(t)} = \frac{\|x_t-x_0\|^2}{\sigma^2(t)}.$$
- Thus, the exponent simplifies to $-\frac{\|x_t-x_0\|^2}{2\sigma^2(t)}$.
- Final expression:
$$p(x_t \mid x_0) = \frac{1}{\left(2\pi\sigma^2(t)\right)^{d/2}}\exp\!\left(-\frac{\|x_t-x_0\|^2}{2\sigma^2(t)}\right)$$
In summary, starting from $x_t = x_0 + \sigma(t)\,z$ with $z \sim \mathcal{N}(0, I)$, the conditional probability is
$$p(x_t \mid x_0) = \mathcal{N}\!\left(x_t;\; x_0,\; \sigma^2(t)\,I\right).$$
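As a numerical sanity check, the isotropic simplification can be compared with the general multivariate normal formula. This is a minimal NumPy sketch; the particular values of $x_0$, $x_t$, and $\sigma(t)$ are arbitrary illustrations.

```python
import numpy as np

d = 3
x0 = np.array([0.5, -1.0, 2.0])   # conditioning point x_0 (illustrative)
xt = np.array([0.7, -0.9, 1.5])   # point at which to evaluate the density
sigma_t = 0.8                     # sigma(t) at some fixed t (illustrative)

diff = xt - x0
Sigma = sigma_t**2 * np.eye(d)

# general multivariate normal density with Sigma = sigma^2(t) I
p_general = np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff) / np.sqrt(
    (2 * np.pi) ** d * np.linalg.det(Sigma)
)

# simplified isotropic form derived above
p_simple = (2 * np.pi * sigma_t**2) ** (-d / 2) * np.exp(
    -np.sum(diff**2) / (2 * sigma_t**2)
)

assert np.isclose(p_general, p_simple)
```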
Why $(x_t - x_0)^\top(x_t - x_0) = \|x_t - x_0\|^2$
Answer: The Euclidean distance between two vectors $u$ and $v$ is defined as the length of the vector difference $w = u - v$. Concretely, if
$$w = u - v = (w_1, w_2, \ldots, w_d),$$
then the Euclidean distance is given by
$$\|u - v\| = \sqrt{w_1^2 + w_2^2 + \cdots + w_d^2}.$$
When you square this distance, you get
$$\|u - v\|^2 = w_1^2 + w_2^2 + \cdots + w_d^2.$$
This is exactly the same as computing the dot product of the vector $w$ with itself:
$$w^\top w = w_1^2 + w_2^2 + \cdots + w_d^2.$$
Thus, we have
$$\|u - v\|^2 = (u - v)^\top (u - v).$$
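A two-line numerical check of this identity (NumPy sketch with arbitrary example vectors):

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 6.0, 8.0])
w = u - v

dist_sq = np.linalg.norm(w) ** 2   # squared Euclidean distance
dot = w @ w                        # dot product of w with itself
assert np.isclose(dist_sq, dot)
```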
From SDE to conditional probability
The key idea is to solve the stochastic differential equation (SDE) by integrating its diffusion term and then computing the variance of the resulting stochastic integral. Here’s a step-by-step explanation:
1. Writing the SDE and Its Integral Form
You start with the SDE:
$$dx = \sigma^t\,dw,$$
where $w$ is a standard Wiener process (Brownian motion). Integrating both sides from $0$ to $t$ gives:
$$x(t) = x(0) + \int_0^t \sigma^s\,dw(s).$$
2. Distribution of the Stochastic Integral
The integral
$$\int_0^t \sigma^s\,dw(s)$$
is a stochastic integral with a deterministic integrand. By properties of such integrals, it is normally distributed with mean zero and covariance given by the Itô isometry:
$$\mathrm{Cov}\!\left[\int_0^t \sigma^s\,dw(s)\right] = \left(\int_0^t \sigma^{2s}\,ds\right) I,$$
where $I$ is the identity matrix.
3. Evaluating the Variance Integral
We need to compute the integral
$$\int_0^t \sigma^{2s}\,ds.$$
This can be computed by rewriting the integrand in exponential form:
$$\sigma^{2s} = e^{2s\ln\sigma}.$$
Thus,
$$\int_0^t \sigma^{2s}\,ds = \int_0^t e^{2s\ln\sigma}\,ds.$$
This is an elementary integral:
$$\int_0^t e^{2s\ln\sigma}\,ds = \frac{e^{2t\ln\sigma} - 1}{2\ln\sigma} = \frac{\sigma^{2t} - 1}{2\ln\sigma}.$$
4. Putting It All Together
Since
$$x(t) = x(0) + \int_0^t \sigma^s\,dw(s),$$
and the stochastic integral is Gaussian with mean $0$ and covariance $\frac{\sigma^{2t}-1}{2\ln\sigma}\,I$, it follows that:
$$x(t) \mid x(0) \sim \mathcal{N}\!\left(x(0),\; \frac{\sigma^{2t}-1}{2\ln\sigma}\,I\right).$$
Thus, the transition probability density is:
$$p(x(t) \mid x(0)) = \mathcal{N}\!\left(x(t);\; x(0),\; \frac{\sigma^{2t}-1}{2\ln\sigma}\,I\right).$$
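The closed-form variance can be checked against a direct simulation of the SDE. This is a sketch using a one-dimensional Euler–Maruyama discretization; the value of $\sigma$ is an arbitrary illustration, and the agreement is up to Monte Carlo and discretization error.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 25.0                      # hypothetical noise scale; any sigma > 1 works
T, n_steps, n_paths = 1.0, 500, 50_000
dt = T / n_steps

# Euler-Maruyama simulation of dx = sigma^t dw, one coordinate, x(0) = 0
x = np.zeros(n_paths)
t = 0.0
for _ in range(n_steps):
    x += sigma**t * np.sqrt(dt) * rng.standard_normal(n_paths)
    t += dt

var_empirical = x.var()
var_closed = (sigma ** (2 * T) - 1) / (2 * np.log(sigma))
# the two should agree up to Monte Carlo and discretization error
```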
Plug conditional probability into loss function
1. Gaussian Conditional Distribution and Its Score
The conditional distribution of $x_t$ given $x_0$ is a Gaussian:
$$p(x_t \mid x_0) = \mathcal{N}\!\left(x_t;\; x_0,\; \sigma^2(t)\,I\right),$$
where the variance is given by
$$\sigma^2(t) = \frac{\sigma^{2t} - 1}{2\ln\sigma}.$$
For a Gaussian distribution with mean $\mu$ and covariance $\sigma^2 I$, its probability density function is:
$$p(x) = \frac{1}{(2\pi\sigma^2)^{d/2}}\exp\!\left(-\frac{\|x-\mu\|^2}{2\sigma^2}\right).$$
Taking the logarithm, we get:
$$\log p(x) = -\frac{d}{2}\log(2\pi\sigma^2) - \frac{\|x-\mu\|^2}{2\sigma^2}.$$
The score function is defined as the gradient of the log-density with respect to $x$:
$$\nabla_x \log p(x) = -\frac{x - \mu}{\sigma^2}.$$
In our setting:
- The “mean” is $\mu = x_0$
- The variance is $\sigma^2 = \sigma^2(t)$
Thus, the score (the gradient of the log-density) becomes:
$$\nabla_{x_t} \log p(x_t \mid x_0) = -\frac{x_t - x_0}{\sigma^2(t)},$$
in other words:
$$\nabla_{x_t} \log p(x_t \mid x_0) = \frac{x_0 - x_t}{\sigma^2(t)}.$$
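The analytic score can be verified against a numerical gradient of the log-density. This is a NumPy sketch using central finite differences; the values of $x_0$, $x_t$, and $\sigma^2(t)$ are arbitrary illustrations.

```python
import numpy as np

d = 3
x0 = np.array([0.5, -1.0, 2.0])   # conditioning point (illustrative)
xt = np.array([0.7, -0.9, 1.5])   # point at which to evaluate the score
sigma2 = 0.64                     # sigma^2(t) at some fixed t (illustrative)

def log_p(x):
    # log N(x; x0, sigma2 * I), including the normalizing constant
    return -d / 2 * np.log(2 * np.pi * sigma2) - np.sum((x - x0) ** 2) / (2 * sigma2)

# analytic score from the derivation
score = -(xt - x0) / sigma2

# numerical gradient via central differences
eps = 1e-6
num_grad = np.array([
    (log_p(xt + eps * e) - log_p(xt - eps * e)) / (2 * eps)
    for e in np.eye(d)
])
assert np.allclose(score, num_grad, atol=1e-5)
```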
2. Plugging into the Loss Function
The loss function is defined as:
$$\mathcal{L} = \mathbb{E}_{t}\,\mathbb{E}_{x_0}\,\mathbb{E}_{x_t \mid x_0}\!\left[\lambda(t)\,\big\|s_\theta(x_t, t) - \nabla_{x_t}\log p(x_t \mid x_0)\big\|^2\right].$$
Substituting the score:
$$\mathcal{L} = \mathbb{E}_{t}\,\mathbb{E}_{x_0}\,\mathbb{E}_{x_t \mid x_0}\!\left[\lambda(t)\,\Big\|s_\theta(x_t, t) + \frac{x_t - x_0}{\sigma^2(t)}\Big\|^2\right].$$
3. Complete loss function
Consider the loss
$$\mathcal{L} = \mathbb{E}\!\left[\lambda(t)\,\Big\|s_\theta(x_t, t) + \frac{x_t - x_0}{\sigma^2(t)}\Big\|^2\right].$$
We know that the magnitude (scale) of the true score varies with time, since the variance $\sigma^2(t)$ increases with time $t$.
This means the model must predict the correct scale for each time $t$ to balance the loss across all time steps. However, the model will likely struggle with this, so we need to assist it in finding the correct scale for each $t$. To do that:
- The “typical” difference $x_t - x_0$ is roughly $\sigma(t)$, since
$$x_t - x_0 = \sigma(t)\,z, \qquad z \sim \mathcal{N}(0, I);$$
that means its typical size (or scale) is $\sigma(t)$. In other words, although individual samples vary, most values of $x_t - x_0$ are on the order of $\sigma(t)$. “On the order of” is a shorthand for saying “approximately proportional to” or “roughly of the same scale as.”
- The “typical” magnitude of the true score: since the score is
$$\nabla_{x_t}\log p(x_t \mid x_0) = -\frac{x_t - x_0}{\sigma^2(t)},$$
its magnitude is roughly
$$\frac{\|x_t - x_0\|}{\sigma^2(t)}.$$
Given that $x_t - x_0$ is typically on the order of $\sigma(t)$, the typical magnitude of the true score becomes approximately
$$\frac{\sigma(t)}{\sigma^2(t)} = \frac{1}{\sigma(t)};$$
that means, on average at each time $t$, it will have magnitude (scale) equal to $\frac{1}{\sigma(t)}$.
- We simply divide the output of the model by $\sigma(t)$; then its scale will match the scale of the true score.
Lastly, we choose the weighting function
$$\lambda(t) = \sigma^2(t)$$
to avoid division (reason here), so our loss now:
$$\mathcal{L} = \mathbb{E}\!\left[\sigma^2(t)\,\Big\|s_\theta(x_t, t) + \frac{x_t - x_0}{\sigma^2(t)}\Big\|^2\right] = \mathbb{E}\!\left[\Big\|\sigma(t)\,s_\theta(x_t, t) + \frac{x_t - x_0}{\sigma(t)}\Big\|^2\right],$$
therefore, with $x_t - x_0 = \sigma(t)\,z$,
$$\mathcal{L} = \mathbb{E}\!\left[\big\|\sigma(t)\,s_\theta(x_t, t) + z\big\|^2\right].$$
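Putting the pieces together, a toy version of the weighted training objective might look like the following sketch. Here `model` is only a placeholder for a learned network, and `sigma_t` is the VE-SDE marginal standard deviation derived above; the batch values are random illustrations.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 25.0        # hypothetical noise scale of the VE SDE
d, batch = 3, 8

def sigma_t(t):
    # marginal std from the VE SDE: sigma^2(t) = (sigma^(2t) - 1) / (2 ln sigma)
    return np.sqrt((sigma ** (2 * t) - 1) / (2 * np.log(sigma)))

def model(x, t):
    # placeholder "network": raw output rescaled by 1/sigma(t) to match score scale
    raw = -x
    return raw / sigma_t(t)[:, None]

x0 = rng.standard_normal((batch, d))          # stand-in data batch
t = rng.uniform(1e-3, 1.0, size=batch)        # random time steps
z = rng.standard_normal((batch, d))
xt = x0 + sigma_t(t)[:, None] * z             # perturbed samples

# weighted loss with lambda(t) = sigma^2(t) pulled inside the norm:
# || sigma(t) * s_theta(x_t, t) + z ||^2
residual = sigma_t(t)[:, None] * model(xt, t) + z
loss = np.mean(np.sum(residual**2, axis=1))
```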
How to compute step size for PC
The step size $\epsilon$ is chosen adaptively to balance the contribution of the score (signal) and the injected noise in the Langevin MCMC update. Recall that the Langevin update is:
$$x_{i+1} = x_i + \epsilon\, s_\theta(x_i, t) + \sqrt{2\epsilon}\; z_i,$$
where:
- $s_\theta(x_i, t)$ is the score (i.e., an estimate of $\nabla_x \log p_t(x)$),
- $\epsilon$ is the step size, and
- $z_i \sim \mathcal{N}(0, I)$ is standard Gaussian noise.
The goal is to set $\epsilon$ such that the update from the score is proportional to the noise level, scaled by a desired signal-to-noise ratio (SNR) $r$. That is, we want the magnitude of the score update to be $r$ times the magnitude of the noise update.
- Magnitude of the Score Update:
The change due to the score is approximately
$$\epsilon\,\|g\|,$$
where $\|g\| = \|s_\theta(x_i, t)\|$ is the norm of the score.
- Magnitude of the Noise Update:
The noise term has a typical magnitude of
$$\sqrt{2\epsilon}\,\|z\|,$$
where $\|z\|$ is an estimate of the norm of a standard Gaussian noise vector (in $d$ dimensions $\mathbb{E}\|z\|^2 = d$, so $\|z\| \approx \sqrt{d}$ is the typical norm of a standard Gaussian vector in that space).
- Balancing the Two Terms:
To enforce a desired signal-to-noise ratio ($r$), we set:
$$\epsilon\,\|g\| = r\,\sqrt{2\epsilon}\,\|z\|.$$
- Solving for $\epsilon$:
Rearranging, we have:
$$\sqrt{\epsilon} = \frac{\sqrt{2}\,r\,\|z\|}{\|g\|}.$$
Squaring both sides gives:
$$\epsilon = 2\left(\frac{r\,\|z\|}{\|g\|}\right)^2.$$
So the step size is adaptively set to ensure that the update from the score is $r$ times the size of the noise update, balancing the two contributions in the Langevin MCMC step. This adaptive choice helps maintain stability and improves the quality of the refined sample during the corrector step.
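The adaptive step size and corrector update can be sketched as follows. This assumes a stand-in score function (the exact score of a standard normal, $-x$); in practice `score_fn` would be the trained network $s_\theta$, and the SNR value $r$ is an arbitrary illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3
r = 0.16                          # target signal-to-noise ratio (illustrative)
x = rng.standard_normal(d)        # current sample

def score_fn(x):
    # stand-in score: the exact score of N(0, I) is -x
    return -x

for _ in range(10):
    g = score_fn(x)
    z = rng.standard_normal(d)
    # adaptive step size: eps = 2 * (r * ||z|| / ||g||)^2
    eps = 2 * (r * np.linalg.norm(z) / np.linalg.norm(g)) ** 2
    # check the balance: score update is exactly r times the noise update
    assert np.isclose(eps * np.linalg.norm(g),
                      r * np.sqrt(2 * eps) * np.linalg.norm(z))
    # Langevin corrector step
    x = x + eps * g + np.sqrt(2 * eps) * z
```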