Publ. Mat. 53 (2009), 3–45

STATISTICAL INFERENCE FOR STOCHASTIC PARABOLIC EQUATIONS: A SPECTRAL APPROACH

A parameter estimation problem is considered for a stochastic parabolic equation driven by additive Gaussian noise that is white in time and space. The estimator is of spectral type and utilizes a finite number of the spatial Fourier coefficients of the solution. The asymptotic properties of the estimator are studied as the number of the Fourier coefficients increases, while the observation time and the noise intensity are fixed. A necessary and sufficient condition for consistency and asymptotic normality of the estimator is derived in terms of the eigenvalues of the operators in the equation, and a detailed proof is provided. Other estimation problems are briefly surveyed.

1. Introduction

1.1. Motivation: the one-dimensional stochastic heat equation. Consider the following stochastic equation

(1.1) $du(t, x) = \theta u_{xx}(t, x)\,dt + dW(t, x)$, $0 < t \le T$, $x \in (0, \pi)$,

with zero initial and boundary conditions, where θ > 0 is an unknown real number and dW(t, x) is the noise term. With precise definitions to come later, at this point we interpret dW as a formal sum

$dW(t, x) = \sum_{k \ge 1} h_k(x)\,dw_k(t)$,

where $h_k(x) = \sqrt{2/\pi}\,\sin(kx)$, $k \ge 1$, and $w_k$ are independent standard Brownian motions. Let us look for the solution of (1.1) as a Fourier series $u(t, x) = \sum_{k \ge 1} u_k(t) h_k(x)$.
Substitution of this series in (1.1) suggests that each $u_k$ should satisfy

(1.2) $du_k(t) = -\theta k^2 u_k(t)\,dt + dw_k(t)$

with initial condition $u_k(0) = 0$. If the trajectory of $u_k(t)$ is observed for one fixed k and all 0 < t < T, then the maximum likelihood estimator of θ based on this observation is

(1.3) $\hat\theta_k = -\dfrac{\int_0^T u_k(t)\,du_k(t)}{k^2 \int_0^T u_k^2(t)\,dt}$;

see, for example, Liptser and Shiryaev [47, Formulas 17.25 and 17.45].
It is known [47, Theorem 17.4] that this estimator is consistent in the limit T → +∞: $\lim_{T \to +\infty} \hat\theta_k = \theta$ with probability one.
Let us now assume that the trajectories of $u_k(t)$ are observed for all 0 < t < T and all k = 1, ..., N, and let us combine the estimators (1.3) for different k as follows:

(1.4) $\hat\theta_N = -\dfrac{\sum_{k=1}^N k^2 \int_0^T u_k(t)\,du_k(t)}{\sum_{k=1}^N k^4 \int_0^T u_k^2(t)\,dt}$.

First suggested by Huebner, Khas'minskiȋ, and Rozovskiȋ in [26], (1.4) is, in fact, the maximum likelihood estimator of θ based on the observations $u_k(t)$, k = 1, ..., N, 0 < t < T. It follows from (1.2) and (1.4) that

(1.5) $\hat\theta_N - \theta = -\dfrac{\sum_{k=1}^N k^2 \int_0^T u_k(t)\,dw_k(t)}{\sum_{k=1}^N k^4 \int_0^T u_k^2(t)\,dt}$.

Note that both the top and the bottom of the fraction on the right-hand side of (1.5) are sums of independent random variables, and the analysis of the properties of the estimator $\hat\theta_N$ is thus reduced to the study of these sums. By direct computation,

$E \int_0^T u_k^2(t)\,dt = \dfrac{T}{2\theta k^2} - \dfrac{1 - e^{-2\theta k^2 T}}{4\theta^2 k^4}$.

Consequently, as N → ∞,

$\sum_{k=1}^N k^4\, E \int_0^T u_k^2(t)\,dt \sim \dfrac{T N^3}{6\theta}$,

where notation $a_N \sim b_N$ means $\lim_{N\to\infty}(a_N/b_N) = 1$. Since $E \int_0^T k^2 u_k(t)\,dw_k(t) = 0$, it is reasonable to conjecture that,

• by the law of large numbers, $\lim_{N\to\infty}(\hat\theta_N - \theta) = 0$ with probability one;
• by the central limit theorem, the sequence of random variables $\{N^{3/2}(\hat\theta_N - \theta),\ N \ge 1\}$ converges in distribution to a zero-mean Gaussian random variable.
It is also clear that the proof of these conjectures will require a closer look at the Ornstein-Uhlenbeck processes (1.2) (see Section 2.1).
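As a quick numerical sanity check of these conjectures (a sketch, not part of the paper: the values of θ, T, N and the time step are arbitrary choices, and the integrals in (1.4) are replaced by their natural discretizations over the simulation grid):

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 2.0                    # "true" parameter (arbitrary choice)
T, N, M = 1.0, 50, 20000       # horizon, number of Fourier modes, time steps
dt = T / M
k2 = np.arange(1, N + 1, dtype=float) ** 2   # k^2 for k = 1, ..., N

# Euler-Maruyama simulation of the Fourier modes (1.2):
# du_k = -theta k^2 u_k dt + dw_k, u_k(0) = 0
u = np.zeros((N, M + 1))
dw = rng.normal(0.0, np.sqrt(dt), size=(N, M))
for m in range(M):
    u[:, m + 1] = u[:, m] - theta * k2 * u[:, m] * dt + dw[:, m]

# Discretized version of the estimator (1.4)
du = np.diff(u, axis=1)
num = (k2[:, None] * u[:, :-1] * du).sum()
den = (k2[:, None] ** 2 * u[:, :-1] ** 2 * dt).sum()
theta_hat = -num / den
print(theta_hat)
```

For a run like this the estimate is typically within a few percent of θ, and the error decreases as N grows, in line with the conjectured $N^{-3/2}$ rate.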
In the rest of the introduction, we discuss how (1.1) fits in the general framework of statistical estimation.
1.2. Statistical estimation. In many models, the general form of the equation is given by the basic laws governing the underlying process, while the particular features of the equation, such as coefficients, initial or boundary conditions, etc., must be determined from the observations of the process. This model validation is often accomplished with the help of statistical estimation.
Stochastic parabolic equations are used in various economic and physical models, such as the term structure of interest rates for bonds with different maturities (Aihara and Bagchi [8], [9], Cont [16]), the temperature of the top layer of the ocean (Frankignoul [20], Piterbarg and Rozovskiȋ [57]), evolution of the population in time and space (Dawson [17], De [18]), spread of pollutants (Serrano and Adomian [70], Serrano and Unny [71]), etc. Equations of the type (1.1) provide a useful toy model for understanding the possible effect of the infinite number of dimensions and for deriving benchmark results about the corresponding estimators. Diagonalizable stochastic parabolic equations of the type discussed below can also model statistical problems in which information is coming from many independent, but not identical, channels (Korostelev and Yin [38]).
In the classical statistical estimation problem, the starting point is a family $P_\theta$ of probability measures depending on the parameter θ ∈ Θ ⊂ R. Each $P_\theta$ is the distribution of a random element. It is assumed that a realization of one random element corresponding to one particular value θ of the parameter is observed, and the objective is to estimate the value of this parameter from the observations. One approach is to select the value of θ corresponding to the random element that is most likely to produce the observations. More precisely, we say that the statistical model (or estimation problem) $P_\theta$, θ ∈ Θ, is absolutely continuous if there exists a probability measure Q such that all measures $P_\theta$ are absolutely continuous with respect to Q. Then the maximum likelihood estimator $\hat\theta$ of the unknown parameter is constructed by maximizing with respect to θ the density $dP_\theta/dQ$. As a rule, $\hat\theta \ne \theta$, but one can hope that $\hat\theta$ approaches θ as more and more information becomes available. The amount of information can be increased in one of two ways: (a) increasing the sample size, for example, the observation time interval (large sample asymptotic); (b) reducing the amplitude of noise (small noise asymptotic).
If the measures P θ are mutually singular for different values of θ, then the model is called singular, and the value of the parameter can often be determined "exactly".In reality, a singular model is usually approximated by a sequence of absolutely continuous models, and the parameter is then computed as the limit of the corresponding maximum likelihood estimators.For parabolic equations driven by additive space-time white noise, this approach was first suggested by Huebner, Khas'minskiȋ, and Rozovskiȋ [26], and was further investigated by Huebner and Rozovskiȋ [30], where a necessary and sufficient condition for the convergence of the estimators was stated in terms of the orders of the operators in the equation.
When the observations are finite-dimensional diffusions, the necessary and sufficient conditions for absolute continuity of the corresponding measures are well known (see, for example, Liptser and Shiryaev [46, Chapter 7]). Many of the results have been extended to infinite dimensions by Kozlov [40], [41], Loges [39], [48], Mikulevicius and Rozovskiȋ [54], [55] and others. For linear equations, such as (1.1), whose solutions are Gaussian processes, there is another useful result, originally discovered independently by Feldman [19] and Hájek [21], [22]: two Gaussian measures are either mutually absolutely continuous or mutually singular. In particular, we will see later (Theorem 4.8) that the measures generated by the solutions of (1.1) in a suitable Hilbert space are mutually singular for different values of θ, and this singularity allows us to get the exact value of the parameter θ, corresponding to the observations, as the ratio of two infinite sums: the limits, as N → ∞, of the numerator and the denominator in (1.4). Since both limits are infinite, the expression must be approximated by $\hat\theta_N$ from (1.4). The situation is somewhat similar to the problem of estimating the diffusion coefficient in a finite-dimensional diffusion, where the exact value is known from the quadratic variation but is computed approximately using time discretization (see (2.27) and (2.28) below).
Here is the main result of the paper. Let $\{h_k, k \ge 1\}$ be an orthonormal basis in a Hilbert space H and let $W(t) = \sum_{k\ge1} w_k(t) h_k$ be a cylindrical Brownian motion on H. Consider the linear stochastic parabolic equation

(1.6) $du(t) + (A_0 + \theta A_1)u(t)\,dt = dW(t)$, 0 < t ≤ T, u(0) = 0.

Assume that the operators $A_0$ and $A_1$ have a common system of eigenfunctions: $A_0 h_k = \rho_k h_k$, $A_1 h_k = \nu_k h_k$, k ≥ 1. Then divergence of the series $\sum_k \nu_k^2/(\rho_k + \theta\nu_k)$ is necessary and sufficient to have consistency and asymptotic normality of $\hat\theta_N$. In particular, if the series $\sum_k \nu_k^2/(\rho_k + \theta\nu_k)$ diverges, then $\lim_{N\to\infty}\hat\theta_N = \theta$ with probability one, and the measures generated by the solutions of equation (1.6) are mutually singular for different values of θ.
For the convenience of the reader, the following section summarizes the main notions and technical tools necessary to study the estimation problem for stochastic parabolic equations and to prove the above theorem: the Ornstein-Uhlenbeck process and its properties, the Law of Large Numbers and the Central Limit Theorem for independent but not identically distributed random variables, and the cylindrical Brownian motion. Section 2.4 summarizes the main facts and presents some examples related to absolutely continuous and singular statistical models; the book by Ibragimov and Khas'minskiȋ [31] provides more information on the subject. Section 3 illustrates the main steps of the proof of Theorem 1.1 in the particular case of the stochastic heat equation (1.1); Theorem 1.1 itself is proved in Section 4. Finally, Section 5 discusses other statistical estimation problems for stochastic parabolic equations.

Notations
Throughout the presentation below, we fix a stochastic basis F = (Ω, F, {F_t}_{t≥0}, P) with the usual assumptions (completeness of F_0 and right-continuity of F_t). We also assume that F is large enough to support countably many independent standard Brownian motions. For a random variable ξ, Eξ and Var ξ denote the expectation and the variance, respectively. $R^n$ is an n-dimensional Euclidean space, $N(m, \sigma^2)$ is a Gaussian random variable with mean m and variance $\sigma^2$, and $B^\top$ is the adjoint of the operator B. Notation $a_n \sim b_n$ means $\lim_{n\to\infty}(a_n/b_n) = 1$. For example, $n^2 - 2n \sim n^2$ and $\sum_{k=1}^n k^2 \sim n^3/3$.
2. Some background from probability and statistics

2.1. The Ornstein-Uhlenbeck process. For a real number a, let X = X(t; a) be the Ornstein-Uhlenbeck process, that is, the solution of $dX(t; a) = -aX(t; a)\,dt + dw(t)$ with X(0; a) = 0, so that

$X(t; a) = \int_0^t e^{-a(t-s)}\,dw(s)$.

Theorem 2.1.
(1) Define the random variable ξ(a). Then (2) Denote by $P^a_T$ the measure generated by the process X(t; a), 0 ≤ t ≤ T, in the space of continuous functions on [0, T]. Then the measures $P^a_T$ are equivalent (mutually absolutely continuous) for all a. In particular, $P^0_T$ is the Wiener measure (the measure generated by the standard Brownian motion). Proof: (1) Everything is proved by direct computation. For (2.3), the computations are easy: (2.8) holds, and the result follows.
For (2.4), we use the fact that X(t; a) is a zero-mean Gaussian random variable, which implies (2.10); the last inequality then follows from (2.10) and (2.8).
For (2.5), it is necessary to find $E(\xi(a))^2$, and the computations are more complicated. Here are two possible ways to approach them. 1. One way uses the explicit formula for the quadratic functional of X, in which $\varrho(y; a) = (a^2 + 2y)^{1/2}$ appears; see Liptser and Shiryaev [47, Lemma 17.3]. Then both differentiation and evaluation of the limit can be carried out with the help of a computer algebra system. The details are left to the reader (see also Cialenco et al. [15]).
2. Alternatively, it follows from the definition of ξ(a) that, for each t > s, the random variables X(t; a), X(s; a) are jointly Gaussian with zero mean. Note that if α, β are jointly Gaussian with zero mean, unit variance, and correlation ρ, then $E(\alpha^2\beta^2) = 1 + 2\rho^2$. As a result, by (2.13), $E(\xi(a))^2$ can be evaluated up to a term $o(a^{-3})$, where $\lim_{a\to+\infty} a^3\, o(a^{-3}) = 0$. This implies (2.5).
It is important to keep in mind that while the density $dP^b_T/dP^a_T$ is a functional defined on all continuous functions, it has nice closed-form expressions only when evaluated on X(·; b) or X(·; a); each of these expressions defines a random variable on the original probability space Ω. Note also that (2.6) and (2.1) imply (2.7). Finally, let us point out that (2.7) is consistent with the Girsanov Theorem. Indeed, if $\tilde P$ is the probability measure on (Ω, F_T) such that $d\tilde P = Z(a)\,dP$, then, by the Girsanov Theorem [46, Theorem 6.3], X(t; a), 0 ≤ t ≤ T, is a standard Brownian motion under $\tilde P$. This completes the proof of Theorem 2.1.
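As a small numerical check of the Gaussian structure of the stable Ornstein-Uhlenbeck process (a sketch, not from the paper; the parameter values are arbitrary), one can sample X(T; a) exactly through its Gaussian transition density and compare the empirical variance with the closed-form value $(1 - e^{-2aT})/(2a)$:

```python
import numpy as np

rng = np.random.default_rng(1)
a, T = 1.5, 2.0                # arbitrary parameter and horizon
M, paths = 200, 100_000
dt = T / M

# Exact Gaussian transition of dX = -a X dt + dw, X(0) = 0:
# X(t + dt) = e^{-a dt} X(t) + N(0, (1 - e^{-2 a dt}) / (2a))
phi = np.exp(-a * dt)
sd = np.sqrt((1.0 - np.exp(-2.0 * a * dt)) / (2.0 * a))
x = np.zeros(paths)
for _ in range(M):
    x = phi * x + sd * rng.normal(size=paths)

var_mc = x.var()                                      # Monte Carlo variance of X(T; a)
var_exact = (1.0 - np.exp(-2.0 * a * T)) / (2.0 * a)  # closed-form variance
print(var_mc, var_exact)
```

The two printed numbers agree to Monte Carlo accuracy, since the recursion samples the process law exactly.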

LLN and CLT.
The proof of consistency and asymptotic normality of (1.4) and similar estimators relies on the Law of Large Numbers (LLN) and the Central Limit Theorem (CLT) for random variables that are independent but not identically distributed.

Corollary 2.3. Let $\xi_n$, n ≥ 1, be independent random variables such that $\xi_n \ge 0$, $\sum_{k\ge1} E\xi_k = +\infty$, and $\sum_{k\ge1} \operatorname{Var}\xi_k \big/ \big(\sum_{j=1}^k E\xi_j\big)^2 < \infty$. Then

$\lim_{n\to\infty} \dfrac{\sum_{k=1}^n \xi_k}{\sum_{k=1}^n E\xi_k} = 1$

with probability one.
Proof: Take $b_n = \sum_{k=1}^n E\xi_k$ and apply Theorem 2.2.

Theorem 2.4 (Classical Central Limit Theorem). Assume that $\xi_n$, n ≥ 1, are independent random variables with zero mean and variance $\sigma_n^2 > 0$, and assume that condition (2.14) holds. Then the sequence of normalized partial sums $\sum_{k=1}^n \xi_k \big/ \big(\sum_{k=1}^n \sigma_k^2\big)^{1/2}$ converges in distribution to the Gaussian random variable with zero mean and unit variance: N(0, 1).

Proof: To simplify the notations, define $B_n^2 = \sum_{k=1}^n \sigma_k^2$. We have to verify the classical condition of Lindeberg [72, Theorem III.4.1]:

(2.16) $\lim_{n\to\infty} \dfrac{1}{B_n^2} \sum_{k=1}^n E\big(\xi_k^2\, I(|\xi_k| > \varepsilon B_n)\big) = 0$

for every ε > 0, where I(A) is the indicator function of the set A. We have a chain of bounds in which the first inequality follows from the Cauchy-Schwarz inequality and the second, from the Chebychev inequality. By (2.14) and (2.15), the convergence in (2.16) will follow from (2.17). Another useful version of the CLT comes from the theory of martingales.
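A quick simulation illustrating Corollary 2.3 with independent but not identically distributed summands (a sketch; the exponential distributions are an arbitrary choice satisfying the hypotheses, since then $E\xi_k = k^2$ and $\operatorname{Var}\xi_k = k^4$):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20_000
k = np.arange(1, n + 1, dtype=float)

# xi_k ~ Exponential with mean k^2, so E xi_k = k^2, Var xi_k = k^4, and
# sum_k Var xi_k / (sum_{j<=k} E xi_j)^2 < infinity: Corollary 2.3 applies
xi = rng.exponential(scale=k ** 2)
ratio = xi.sum() / (k ** 2).sum()
print(ratio)
```

The printed ratio of the random sum to the sum of the means is close to 1, as the corollary predicts.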
Proof: This follows from a limit theorem for martingales: if $X_n$, X are continuous square-integrable martingales such that X is a Gaussian process and $\lim_{n\to\infty}\langle X_n\rangle(T) = \langle X\rangle(T)$ in probability, then $\lim_{n\to\infty} X_n(T) = X(T)$ in distribution; see, for example, Jacod and Shiryaev [36, Theorem VIII.4.17] or Liptser and Shiryaev [45, Theorem 5.5.4(II)]. It now remains to take the appropriate stochastic integrals, where w is a standard Brownian motion.
Corollary 2.6. Let $w_k = w_k(t)$ be independent standard Brownian motions and let $f_k = f_k(t)$ be adapted, continuous, square-integrable processes such that the corresponding quadratic variations converge in probability. Then the corresponding stochastic integrals converge in distribution. Proof: This follows from Theorem 2.5.

Proposition 2.8. Let W be a cylindrical Brownian motion on a Hilbert space H.
Proof: (a) By direct computation, for a, b ∈ R and f, g ∈ H. (b) By definition of W, the pair $(w_1, w_2)$ is a zero-mean Gaussian process, and $Ew_i(t)w_j(s) = \min(t, s)(h_i, h_j)_H$, which completes the proof.
If f is an adapted process with $\int_0^T \|f(t)\|_H^2\,dt < \infty$, then we define the stochastic integral accordingly. Direct computations show that W is a continuous X-valued square-integrable martingale. There are many other spaces in which W becomes a continuous square-integrable martingale: it is enough to replace $k^{-2}$ in (2.21) with $k^{-\beta}$, β > 1.
If W is a cylindrical Brownian motion on $L_2((0, +\infty))$, and $\chi_x$ is the indicator function of the interval [0, x], then, by direct computation, $W(t, x) = W_{\chi_x}(t)$ is a Brownian sheet, for every $f \in L_2((0, +\infty))$. If Q has a complete orthonormal system of eigenfunctions $\{h_k, k \ge 1\}$ and $Qh_k = q_k h_k$, $q_k > 0$, then (2.20) becomes $W_Q(t) = \sum_{k\ge1}\sqrt{q_k}\,w_k(t)h_k$. Recall that (a) an operator B from a separable Hilbert space H to a separable Hilbert space X is called a Hilbert-Schmidt operator if $\sum_{k\ge1}\|Bh_k\|_X^2 < \infty$ for some (equivalently, every) orthonormal basis $\{h_k, k \ge 1\}$ of H; (b) a bounded, self-adjoint, non-negative operator Q on X is called trace class if $\sum_{k\ge1}(Qe_k, e_k)_X < \infty$ for an orthonormal basis $\{e_k, k \ge 1\}$ of X.

Proposition 2.9. (a) Let W be a cylindrical Brownian motion on a separable Hilbert space H and X, a Hilbert space such that H is a dense subset of X. Then W is a continuous X-valued square-integrable martingale if and only if the embedding $j: H \to X$ is a Hilbert-Schmidt operator; in this case, W naturally extends to a Q-cylindrical Brownian motion on X with $Q = jj^\top$, and Q is trace class.
(b) Let W Q be a Q-cylindrical Brownian motion on a separable Hilbert space H. Then W Q is a continuous H-valued square-integrable martingale if and only if the operator Q is trace class.
Proof: Below is an outline of the proof; the details are left to the reader.
(a) We have $E\|W(t)\|_X^2 = t\sum_{k\ge1}\|jh_k\|_X^2$, and the series converges if and only if j is Hilbert-Schmidt. The continuity of W then follows by the Kolmogorov criterion (Kunita [42, Theorem 1.4.1]). Finally, for f ∈ X, we set $W_f = W_{j^\top f}$; the adjoint operator $j^\top$ is defined on all of X (see, for example, Yosida [75, Theorem VII.2]). Note that $(Qf, f)_X = \|j^\top f\|_H^2$.

Statistical models.
A statistical model (or experiment) generated by random elements X(θ) is a collection P = {X, X, P_θ, θ ∈ Θ}, where each $P_\theta$ is a probability measure on a measurable space (X, X) such that $P_\theta(A) = P(X(\theta) \in A)$, A ∈ X. In parametric models, Θ is a subset of a finite-dimensional Euclidean space. An estimator of θ is a random variable Ψ(X(θ)), where Ψ is a measurable mapping from X to Θ. The corresponding estimate of θ is the number $\Psi(X^\circ(\theta^*))$, where $X^\circ(\theta)$ is the observed realization of the random element X(θ) and $\theta^*$ is the value of the parameter corresponding to the observations.
In general, an estimate of θ, being a realization of a random variable, is not equal to θ. Accordingly, a family {P N , N > 0} of statistical models is introduced, with N characterizing the amount of information about θ (the larger N , the more information is available to the observer).For example, P N can be a product of N independent copies of P, which corresponds to observing N independent realizations of X.
Given P N , the corresponding family of estimators is then constructed and studied in the limit N → ∞.One of the objectives is to establish consistency of the estimators (convergence to the true value of the parameter) as N → ∞.
In absolutely continuous statistical models, maximum likelihood estimators are often used.

Definition 2.10. A statistical model P is called absolutely continuous if there exists a probability measure Q on (X, X) such that every $P_\theta$ is absolutely continuous with respect to Q. The statistical model P is called singular if the measures $P_{\theta_1}$ and $P_{\theta_2}$ are mutually singular for $\theta_1 \ne \theta_2$.
Let P be an absolutely continuous model and consider the density $dP_\theta/dQ$. The maximum likelihood estimator $\hat\theta$ of θ is defined by $\hat\theta = \arg\max_{\theta\in\bar\Theta} (dP_\theta/dQ)$, where $\bar\Theta$ is the closure of Θ. Similarly, a collection $\{P^N, N > 0\}$ of absolutely continuous statistical models leads to a collection $\hat\theta_N$ of maximum likelihood estimators. The parameter N does not have to be discrete. For example, consider a family of Ornstein-Uhlenbeck processes X = X(t; θ) defined by

(2.24) dX(t; θ) = θX(t; θ) dt + σ dw(t), 0 < t < T,

with known σ > 0. For every fixed σ and T, we get an absolutely continuous statistical model in which X is the set of continuous real-valued functions on [0, T], X is the Borel sigma-algebra on X, and Q is the Wiener measure (the measure on (X, X) generated by the Brownian motion w); see Theorem 2.1(2). Then the maximum likelihood estimator is

(2.25) $\hat\theta_T = \dfrac{\int_0^T X(t;\theta)\,dX(t;\theta)}{\int_0^T X^2(t;\theta)\,dt}$;

see Liptser and Shiryaev [47, Formula 17.45]. There are at least three ways to achieve consistency: (1) Keeping T and σ fixed, consider N independent copies $X_k$ of X. (2) Keeping σ fixed, let T → ∞ in (2.25), so that N = T (large time asymptotic). (3) Keeping T fixed and assuming X(0; θ) = 0, let σ → 0, so that N = 1/σ (small noise asymptotic).
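Here is a numerical sketch of route (2), the large time asymptotic (not from the paper; the drift value, noise level, and discretization are arbitrary choices, with θ taken negative so that the process is stable and the time average of X² settles down):

```python
import numpy as np

rng = np.random.default_rng(3)
theta, sigma = -1.0, 0.5       # arbitrary stable drift and known noise level
T, M = 200.0, 200_000          # long horizon for the large time asymptotic
dt = T / M

# Euler-Maruyama path of (2.24): dX = theta X dt + sigma dw, X(0) = 0
x = np.empty(M + 1)
x[0] = 0.0
dw = rng.normal(0.0, np.sqrt(dt), M)
for m in range(M):
    x[m + 1] = x[m] + theta * x[m] * dt + sigma * dw[m]

# Discretized maximum likelihood estimator (2.25): (int X dX) / (int X^2 dt)
dx = np.diff(x)
theta_hat = np.sum(x[:-1] * dx) / np.sum(x[:-1] ** 2 * dt)
print(theta_hat)
```

Rerunning with larger T shows the estimate concentrating around θ, while for fixed T the estimate keeps a non-vanishing random error.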
It is also clear that the same three methods can be used to achieve consistency in any absolutely continuous model generated by a stochastic evolution equation. For a detailed analysis of the models generated by stochastic ordinary differential equations, see the books [43], [44]. In general, given a collection $\{P^N, N > 0\}$ of absolutely continuous statistical models, the asymptotic properties of the maximum likelihood estimator $\hat\theta_N$, such as consistency and asymptotic normality, can be described in terms of the properties of the corresponding densities; see, for example, Ibragimov and Khas'minskiȋ [31, Theorem III.1.1].
If the statistical model P is singular, then it is often possible (at least in theory) to get the true value of the parameter without introducing the family $P^N$. A well-known example is estimation of the diffusion coefficient θ from the observations of

(2.26) dX(t) = θ dw(t), X(0) = 0,

where w = w(t) is a standard Brownian motion: since X is a square-integrable martingale with quadratic variation $\langle X\rangle(t) = \theta^2 t$, it follows that

(2.27) $\theta^2 = \langle X\rangle(T)/T$

for every T > 0. Note that, since the quadratic variation is not available directly, a computable form of (2.27) is

(2.28) $\theta^2 = \lim_{n\to\infty} \dfrac{1}{T}\sum_{i=1}^n \big(X(iT/n) - X((i-1)T/n)\big)^2$.
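The time discretization (2.28) is easy to try numerically (a sketch; the value of θ and the grid size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(4)
theta, T, n = 0.7, 1.0, 100_000    # arbitrary diffusion coefficient and grid

# Increments of X from (2.26): X(iT/n) - X((i-1)T/n) ~ N(0, theta^2 T/n)
dx = rng.normal(0.0, theta * np.sqrt(T / n), n)

# (2.28): realized quadratic variation over the grid, divided by T
theta_sq_hat = np.sum(dx ** 2) / T
print(np.sqrt(theta_sq_hat))
```

The accuracy improves like $n^{-1/2}$ as the grid is refined, while T plays no role: this is the "exact" recovery typical of singular models.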
Similar ideas can be used to estimate the diffusion coefficient in more general Itô equations dX = b(t, X(t)) dt + θσ(t, X(t)) dw(t).For more details, see the survey by Aït-Sahalia [11] and references therein.
Another example of a singular model is in the paper by Khas'minskiȋ et al. [37]: if the observations are a two-dimensional diffusion process (X, Y) driven by a standard Brownian motion w = w(t), with initial conditions X(0) = 0, Y(0) = 1, then, by direct computations, the parameter can be recovered exactly; note that the special choice of the initial conditions is essential.
We will see in Section 4 that stochastic parabolic equations give rise to a large class of singular models.
A systematic way to study a singular model is to approximate it with a family of absolutely continuous models.For (2.26), this family comes from the time discretizations (2.28), and for many stochastic parabolic equations, from the space discretization.We illustrate this idea in the next section using the stochastic heat equation on the interval.

Analysis of the stochastic heat equation on the interval
Let W = W(t) be a cylindrical Brownian motion on $L_2((0, \pi))$ and θ > 0. Consider the following stochastic heat equation:

(3.1) $du(t, x) = \theta u_{xx}(t, x)\,dt + dW(t, x)$, 0 < t ≤ T, 0 < x < π,

with zero initial and boundary conditions.
To simplify the notations, we do not indicate explicitly the dependence of u on θ.
Uniqueness of solution of (3.1) follows from the uniqueness of solution of (3.4) for every k. Proposition 3.2 is proved.
Note that since the Brownian motions w k , k ≥ 1, are independent, the Ornstein-Uhlenbeck processes u k , k ≥ 1, are also independent.
Let us now consider the problem of estimating the number θ from the observations of the solution of (3.2). One can show that the solution generates a Gaussian measure in the space of continuous processes with values in a suitable Hilbert space, and the measures are singular for different values of θ (see Theorem 4.8 below). Using the terminology of Section 2.4, we have a singular statistical model, and we will approximate it with a sequence of absolutely continuous models by discretizing the space.
Assume that the observations of $u_k(t)$ are available for t ∈ [0, T] and k = 1, ..., N. For each θ and each k, the Ornstein-Uhlenbeck process $u_k$ generates the measure $P^{\theta,k}_T$ in the space of continuous real-valued functions on [0, T], and, by Theorem 2.1(2), the measures are equivalent for different values of θ. Similarly, the vector $u^{(N,\theta)} = (u_k, k = 1, \ldots, N)$ generates a probability measure $P^\theta_{N,T}$ on the space of continuous $R^N$-valued functions on [0, T]. Since the random processes $u_k$ are independent for different k, $P^\theta_{N,T}$ is a product measure: $P^\theta_{N,T} = \prod_{k=1}^N P^{\theta,k}_T$, and thus the measures $P^\theta_{N,T}$ are equivalent for different values of θ. In particular, (2.7) gives the likelihood ratio (3.5). Maximizing the right-hand side of (3.5) with respect to θ, we get the following expression for the maximum likelihood estimator $\hat\theta_N$ of θ, based on the observations $u_k(t)$, k = 1, ..., N, t ∈ [0, T]:

(3.6) $\hat\theta_N = -\dfrac{\sum_{k=1}^N k^2 \int_0^T u_k(t)\,du_k(t)}{\sum_{k=1}^N k^4 \int_0^T u_k^2(t)\,dt}$.

Theorem 3.3. Estimator (3.6) is strongly consistent and asymptotically normal in the limit N → ∞:

$\lim_{N\to\infty}\hat\theta_N = \theta$ with probability one, and $N^{3/2}(\hat\theta_N - \theta) \to \zeta$ in distribution,

where ζ is a Gaussian random variable with zero mean and variance 6θ/T.
To prove consistency, we use Corollary 2.3. Note that each $u_k$ is a stable Ornstein-Uhlenbeck process with parameter $a = k^2\theta$. By Theorem 2.1(1),

(3.10) $E\int_0^T u_k^2(t)\,dt \sim \dfrac{T}{2\theta k^2}$, $k \to \infty$,

and so $\sum_{k=1}^N k^4\, E\int_0^T u_k^2(t)\,dt \sim \dfrac{TN^3}{6\theta}$. By Corollary 2.3,

$\lim_{N\to\infty} \dfrac{\sum_{k=1}^N k^4 \int_0^T u_k^2(t)\,dt}{\sum_{k=1}^N k^4\, E\int_0^T u_k^2(t)\,dt} = 1$ with probability one.
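Both limits in Theorem 3.3 can be probed by Monte Carlo simulation (a sketch with arbitrary parameter values; the time discretization and the finite number of modes bias the empirical variance of $N^{3/2}(\hat\theta_N - \theta)$ somewhat below its limiting value 6θ/T):

```python
import numpy as np

rng = np.random.default_rng(5)
theta, T = 1.0, 1.0
N, M, R = 20, 2000, 400        # modes, time steps, independent replications
dt = T / M
k2 = np.arange(1, N + 1, dtype=float) ** 2

# R replications of the N Fourier modes, du_k = -theta k^2 u_k dt + dw_k,
# accumulating the discretized integrals entering the estimator
u = np.zeros((R, N))
num = np.zeros(R)
den = np.zeros(R)
for _ in range(M):
    dw = rng.normal(0.0, np.sqrt(dt), size=(R, N))
    inc = -theta * k2 * u * dt + dw             # Euler increment of each mode
    num += (k2 * u * inc).sum(axis=1)           # sum_k k^2 int u_k du_k
    den += (k2 ** 2 * u ** 2 * dt).sum(axis=1)  # sum_k k^4 int u_k^2 dt
    u += inc

theta_hats = -num / den
z = N ** 1.5 * (theta_hats - theta)
print(theta_hats.mean(), z.var())
```

The sample mean of the estimates sits on top of θ, and the sample variance of z is of the same order as 6θ/T.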

Diagonalizable stochastic parabolic equations
The ideas used to study the stochastic heat equation on the interval (3.1) extend with little or no modification to higher-dimensional equations involving the Laplace operator ∆, and to an abstract parabolic equation (4.1). In general, if W is a cylindrical Brownian motion on a Hilbert space H, the solution of (4.1) is not an element of H for t > 0. There are two main approaches to circumvent this difficulty: (1) to introduce spatial covariance in the noise and consider $W_Q$ instead of W; (2) to consider the equation in a bigger Hilbert space. By Proposition 2.9, the two approaches are essentially equivalent, and we will use the second one. Later on, we will see that many equations driven by $W_Q$ can be reduced to equations driven by W. For θ ∈ Θ, consider the following equation:

(4.2) $du(t) + (A_0 + \theta A_1)u(t)\,dt = dW(t)$, 0 < t ≤ T,

with zero initial condition u(0) = 0 and fixed non-random T > 0.
Definition 4.1.Equation (4.2) is called diagonalizable if the operators A 0 and A 1 have point spectrum and a common system of eigenfunctions {h k , k ≥ 1}.
To prove (4.7), we argue by contradiction. Assume that (4.7) fails; then there is a sequence $\{(k_j, \theta_j),\ j \ge 1\}$ witnessing the failure. With no loss of generality, assume that $\nu_{k_j} > 0$, and, since Θ is compact, we also assume that $\lim_{j\to\infty}\theta_j = \theta^\bullet \in \Theta$ (if not, extract a further subsequence). Then (4.8) implies (4.9). As a result, if (4.7) fails, then so does (4.4).

Example 4.4. Let G be a smooth bounded domain in $R^d$ or a smooth compact d-dimensional manifold with a smooth measure, $H = L_2(G)$, and let ∆ be the Laplace operator on G (with zero boundary conditions if G is a domain). It is known (see, for example, Safarov and Vassiliev [69] or Shubin [73]) that (1) ∆ has a complete orthonormal system of eigenfunctions in H; (2) the corresponding eigenvalues $\lambda_k$ are negative, can be arranged in decreasing order, and there is a positive number $c_\bullet$ such that (4.10) holds. The reader can verify that each of the following equations is diagonalizable and parabolic. From now on, we assume that equation (4.2) is diagonalizable and parabolic, and the eigenvalues of the operators $A_0$, $A_1$ are enumerated so that $\{\mu_k(\theta), k \ge 1\}$ is a non-decreasing sequence and (4.6) holds.
Let X be the closure of H in the norm (4.11). Then every element f of X is represented by a Fourier series $f = \sum_{k\ge1} f_k h_k$. Recall that the cylindrical Brownian motion W = W(t) is a continuous square-integrable martingale with values in X (see Proposition 2.9).
Definition 4.5. A solution of equation (4.2) is a continuous X-valued random process u = u(t) such that (4.12) holds.

Theorem 4.6. Assume that equation (4.2) is diagonalizable and parabolic. Then there exists a unique solution u = u(t) of (4.2).
Proof: Uniqueness of solution follows from the uniqueness of solution of (4.13) for every k. It remains to show that the process u defined by (4.12) is a continuous X-valued process. For 0 ≤ t ≤ T, we have (4.14). By (4.5), there exists a C > 1 such that $-2\mu_k(\theta) \le C$ for all k ≥ 1 and θ ∈ Θ. Then (4.14) implies $u(t) \in L_2(\Omega; X)$ for all 0 ≤ t ≤ T. Next, by the Cauchy-Schwarz inequality, we obtain (4.15). Since each $u_k$ is a zero-mean Gaussian process, (4.15) implies the moment bound needed for the Kolmogorov criterion, and the continuity of u follows (see, for example, Kunita [42, Theorem 1.4.1]).
Remark 4.7. Since the solution is defined by its Fourier coefficients, the space X is not an essential part of the definition and is only necessary to represent u as a process. The reader can check that Theorem 4.6 holds for any Hilbert space X such that H is a dense subset of X and the inclusion $j: H \to X$ is a Hilbert-Schmidt operator, so that $\sum_{k\ge1}\|h_k\|_X^2 < \infty$.

4.2. Parameter estimation. Consider the diagonalizable parabolic equation (4.16) driven by a cylindrical Brownian motion on a Hilbert space H. Let X be a Hilbert space such that H is a dense subset of X and W = W(t) is an X-valued continuous square-integrable martingale (for example, we can define X by (4.11)). According to Theorem 4.6, the solution u = u(t) of this equation is a continuous X-valued process. Assume that the observations of $u_k(t)$ are available for t ∈ [0, T] and k = 1, ..., N. For each θ and each k, the Ornstein-Uhlenbeck process $u_k$ generates the measure $P^{\theta,k}_T$ in the space of continuous real-valued functions on [0, T], and, by Theorem 2.1(2), the measures are equivalent for different values of θ. Similarly, the vector $u^{(N,\theta)} = (u_k, k = 1, \ldots, N)$ generates a probability measure $P^\theta_{N,T}$ on the space of continuous $R^N$-valued functions on [0, T]. Since the random processes $u_k$ are independent for different k, $P^\theta_{N,T}$ is a product measure: $P^\theta_{N,T} = \prod_{k=1}^N P^{\theta,k}_T$, and thus the measures $P^\theta_{N,T}$ are equivalent for different values of θ. In particular, (2.7) gives the likelihood ratio (4.18). Maximizing the expression on the right-hand side of (4.18) with respect to θ, we get the expression (4.19) for the maximum likelihood estimator $\hat\theta_N$ of θ based on the observations $u_k(t)$, k = 1, ..., N, t ∈ [0, T]. Define $J = \min\{k : \mu_n(\theta) > 0$ for all $n \ge k$ and $\theta \in \Theta\}$; see (4.6).
Theorem 4.8. (a) The following conditions are equivalent: (1) condition (4.20) holds; (2) $\lim_{N\to\infty}\hat\theta_N = \theta$ with probability one for all θ ∈ Θ; (3) the measures $\{P^\theta_T, \theta \in \Theta\}$ generated by the solutions of (4.16) in the space of continuous X-valued processes are mutually singular for different θ (as in Theorem 4.6, X is a Hilbert space such that the embedding H → X is Hilbert-Schmidt).
(b) If (4.20) holds, then the corresponding asymptotic normality holds as well.

Proof: (a) First, we show that (4.20) is equivalent to (4.21). By (4.17), both the top and the bottom on the right-hand side of (4.23) are sums of independent random variables. Setting $a_n = \nu_n^2\mu_n^{-1}(\theta)$ and $A_n = \sum_{k=J}^n a_k$, the strong law of large numbers (Theorem 2.2), together with the equality $\lim A_n = +\infty$, yields convergence with probability one, and (4.21) follows. This completes the proof that (4.20) is equivalent to (4.21).
Next, we show that (4.20) is equivalent to singularity of the measures $P^\theta_T$. Since u is a Gaussian process, the measures are either mutually absolutely continuous or mutually singular (Feldman [19] or Hájek [21], [22]), and the dichotomy is resolved by a result of Koski and Loges [39]. This completes the proof of Theorem 4.8.

Discussion and examples.
First of all, let us formulate condition (4.20) in terms of the orders of the operators in the equation. Let $A_0$, $A_1$ be elliptic differential or pseudo-differential operators, either on a smooth bounded domain in $R^d$ or on a smooth compact d-dimensional manifold, and let $m_0$, $m_1$ be the orders of $A_0$, $A_1$, respectively, so that $2m = \max(m_0, m_1)$. Then, under rather general conditions, we have

(4.30) $\nu_k \sim c_1 k^{m_1/d}$, $\mu_k(\theta) \sim c(\theta) k^{2m/d}$,

for some positive numbers $c_1$, $c(\theta)$; see, for example, Safarov and Vassiliev [69].
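Under power-law asymptotics of the type (4.30), a short calculation (a sketch, assuming the eigenvalues behave exactly like powers of k) shows how the divergence condition (4.20) turns into an inequality between the orders:

```latex
\frac{\nu_k^2}{\mu_k(\theta)} \sim \frac{c_1^2}{c(\theta)}\, k^{(2m_1-2m)/d},
\qquad\text{so}\qquad
\sum_{k\ge 1}\frac{\nu_k^2}{\mu_k(\theta)} = \infty
\;\Longleftrightarrow\; \frac{2m_1-2m}{d} \ge -1
\;\Longleftrightarrow\; 2m_1 \ge 2m-d.
```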
If (4.30) holds, then condition (4.20) becomes $2m_1 \ge 2m - d$, which was established by Huebner and Rozovskiȋ [30]. On the other hand, Theorem 4.8 covers operators with more exotic eigenvalues, such as $\nu_k = k\ln k$ or $\nu_k = e^k$; such eigenvalues can appear in the problems of statistical inference based on information from many independent but not identical channels [38]. The reader can verify that the additional assumption $\nu_k^2/\mu_k(\theta) \sim k^\beta$ for some β ≥ −1 simplifies the proof of Theorem 4.8 in at least two ways: (1) Relation (4.25) can be replaced with a less delicate bound using (2.4).
(2) The classical Central Limit Theorem (Theorem 2.4) can be applied instead of a more sophisticated martingale version.

Next, we consider the effects of a non-zero initial condition. Even though it was assumed everywhere that u(0) = 0, Theorem 4.8 extends to a non-zero initial condition u(0) = φ as long as φ is non-random and belongs to H. Indeed, the Fourier coefficients of the solution satisfy the analogue of (4.17), and the corresponding bounds hold for k ≥ J. As a result, if (4.32) holds, then (4.24) and (4.25) hold. The computations also show that (1) it is important to have φ non-random: otherwise, the processes $u_k$ will no longer be independent, and the analysis will become much more complicated; (2) condition (4.32) can be further relaxed, although the specifics will depend on the rate of growth of $\nu_k$ and $\mu_k(\theta)$; (3) if the series $\sum_{k\ge1}\varphi_k^2$ diverges fast enough, then a consistent estimator is possible even if (4.20) does not hold.
The details are left to an interested reader (see also Huebner [23]).
Next, we discuss how the presence of the spatial covariance in the noise term affects the model.
Let us consider the equation driven by $W_Q$, where Q is a positive linear self-adjoint operator. Then we can write $Q = BB^\top$ for some operator B, and the equation becomes one driven by B dW. If $B^{-1}$ exists, then we get back to the original model (4.16) by considering the equation satisfied by $B^{-1}u$, provided this equation is diagonalizable and parabolic in the sense of Definitions 4.1 and 4.2. If $B^{-1}$ does not exist, there are two possibilities: (1) $(u_0, h_i)_0 = 0$ for every i such that $Bh_i = 0$. In this case, $u_i(t) = 0$ for all t > 0, so that we can factor out the kernel of B and reduce the problem to invertible B. (2) $(u_0, h_i)_0 \ne 0$ for some i such that $Bh_i = 0$. In this case, $u_i(t) = u_i(0)e^{-\rho_i t - \nu_i\theta t}$, and θ is determined exactly from the observations of $u_i(t)$:

$\theta = \dfrac{1}{\nu_i t}\,\ln\dfrac{u_i(0)}{u_i(t)} - \dfrac{\rho_i}{\nu_i}$, t > 0.

Let us now look at some concrete examples of (4.16). (1) In the first example, the limit is $N(a, \sigma^2)$, where $N(a, \sigma^2)$ is a normal random variable with mean a and variance $\sigma^2$, and the convergence is in distribution. More generally, if the equation $du + \theta A_1 u\,dt = dW$, θ > 0, is diagonalizable and parabolic and $\nu_k > 0$, then $\nu_k^2/\mu_k(\theta) = \nu_k/\theta$ and condition (4.20) is satisfied, so that $\hat\theta_N$ is consistent and asymptotically normal. (2) Consider the equation with zero initial and boundary conditions and d ≥ 2. Denote by $\lambda_k$, k ≥ 1, the eigenvalues of the Laplace operator ∆; recall that $\lambda_k < 0$. Clearly, the corresponding series in (4.20) can be analyzed using the eigenvalue asymptotics, where $c_\bullet$ is from (4.10).
(3) Consider the equation with zero initial and boundary conditions in which, as before, λ_k denote the eigenvalues of the Laplacian ∆.
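These examples can be probed numerically. The following sketch (ours, not from the paper) simulates the first N Fourier coefficients of the heat equation (1.1) as Ornstein-Uhlenbeck processes, using their exact transition density, and evaluates a discrete analogue of the spectral estimator (1.4); the function name and all parameter values are illustrative.

```python
import math
import random

# Simulate u_k, k = 1..N, for du_k = -theta*k^2*u_k dt + dw_k, u_k(0) = 0,
# and evaluate the discrete analogue of the spectral estimator (1.4):
#   theta_hat = -sum_k int k^2 u_k du_k / sum_k int k^4 u_k^2 dt.

def estimate_theta(theta, N, T, dt, rng):
    steps = int(T / dt)
    num = 0.0   # accumulates -sum_k sum_i k^2 * u_k[i] * (u_k[i+1] - u_k[i])
    den = 0.0   # accumulates  sum_k sum_i k^4 * u_k[i]^2 * dt
    for k in range(1, N + 1):
        mu = theta * k * k                          # mean-reversion speed
        a = math.exp(-mu * dt)                      # exact OU transition mean factor
        s = math.sqrt((1.0 - a * a) / (2.0 * mu))   # exact transition std
        u = 0.0
        for _ in range(steps):
            u_next = a * u + s * rng.gauss(0.0, 1.0)
            num -= k * k * u * (u_next - u)
            den += k ** 4 * u * u * dt
            u = u_next
    return num / den

rng = random.Random(1)
theta_hat = estimate_theta(theta=1.0, N=20, T=1.0, dt=1e-4, rng=rng)
print(theta_hat)   # close to the true value theta = 1.0
```

The time discretization introduces a small bias for the high modes, so the tolerance in any check should be much larger than the asymptotic standard deviation (6θ/T)^{1/2} N^{−3/2}.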

5. Further directions
The proof of Theorem 4.8 is the main objective of the current paper. There are certainly many other statistical problems that have been studied for stochastic parabolic equations, and below is a (partial) list of these problems. The surveys by Prakasa Rao [63], [65] provide more details on some of the topics.

5.1. Diagonalizable equations.
The bottom line is that any problem of statistical inference for a stable Ornstein-Uhlenbeck process can potentially be extended to diagonalizable stochastic parabolic equations. On the other hand, the numerous interesting problems for the unstable Ornstein-Uhlenbeck process do not have similar extensions, because of the parabolicity condition: only finitely many Fourier coefficients of the solution can be unstable processes.
Here is a (partial) list of the corresponding results and references.
Somewhat different types of diagonalizable equations have also been considered.
Equations with multiplicative noise are diagonalizable only if the noise has no spatial structure. The simplest example is du = θu_xx dt + u dw(t), 0 < x < π, with zero boundary conditions, where w is a standard Brownian motion. Of course, it is no longer possible to assume that u(0, x) = 0. The Fourier coefficients u_k, k ≥ 1, are now geometric Brownian motions driven by the same Brownian motion w, making the problem extremely singular from the statistical point of view. In particular, it was shown in [14] that the parameter θ can be determined exactly and in closed form from just two Fourier coefficients. For example, if u_1(0) ≠ 0 and u_2(0) ≠ 0, then θ = (3T)^{−1} ln( u_1(T)u_2(0) / (u_1(0)u_2(T)) ) for every T > 0.
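The closed-form identification of θ from two Fourier coefficients can be verified numerically. The following is an illustrative sketch (ours, not from [14]); it uses the closed-form Itô solution u_k(t) = u_k(0) exp(−θk²t − t/2 + w(t)), so the helper names and parameter values are assumptions:

```python
import math
import random

# The Fourier coefficients of du = theta*u_xx dt + u dw(t) on (0, pi) are
# geometric Brownian motions driven by ONE Brownian motion w:
#   u_k(t) = u_k(0) * exp(-theta*k^2*t - t/2 + w(t))   (Ito solution),
# so theta is recovered exactly from two coefficients at a single time T.

def sample_coefficients(theta, u0, T, rng):
    """Sample u_k(T), k = 1..len(u0), from the closed-form solution,
    using a single shared Brownian increment w(T) ~ N(0, T)."""
    wT = rng.gauss(0.0, math.sqrt(T))
    return [u0k * math.exp(-theta * k * k * T - T / 2 + wT)
            for k, u0k in enumerate(u0, start=1)]

def exact_theta(u0, uT, T):
    """theta = ln( u_1(T) u_2(0) / (u_1(0) u_2(T)) ) / (3T): the shared
    noise cancels in the ratio, leaving only the deterministic decay."""
    return math.log(uT[0] * u0[1] / (u0[0] * uT[1])) / (3.0 * T)

rng = random.Random(42)
theta, T = 0.7, 1.5
u0 = [1.0, 2.0]                    # u_1(0) != 0, u_2(0) != 0
uT = sample_coefficients(theta, u0, T, rng)
print(abs(exact_theta(u0, uT, T) - theta))   # ~ 0 up to rounding
```

The identity is pathwise exact: the random factor exp(w(T) − T/2) is the same for both coefficients and cancels.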
To get a cylindrical fractional Brownian motion, the usual Brownian motions in (2.20) are replaced with independent fractional Brownian motions w^H having the same Hurst parameter H. When H > 1/2, many of the results about the resulting maximum likelihood estimator turn out to be the same. In particular, [67] studies the corresponding modification of equation (1.1), and [15] establishes an analogue of Theorem 4.8.

5.2. General equations.
If the equation is not diagonalizable, that is, if the operators A_0 and A_1 do not have a common system of eigenfunctions, then the analysis of the estimation problem becomes substantially more complicated. While the result about singularity of the measures is still valid under condition (4.31), there is no natural family of regular models to consider. In fact, there are at least two possibilities: (1) Galerkin approximations: Huebner [23], [24], Huebner et al. [29]; (2) finite-dimensional projections: Lototsky and Rozovskiȋ [51], [52], Lototsky [49].
Both [29] and [51] work with finite-dimensional approximations of the solution; note that u^N = Π_N u in the diagonalizable case if the basis {h_k, k ≥ 1} is the common system of eigenfunctions of A_0, A_1. The estimator suggested in [29] is

(5.3) θ̃_N = −∫_0^T (Π_N A_1 u^N(t), du^N(t) + Π_N A_0 u^N(t) dt)_H / ∫_0^T ‖Π_N A_1 u^N(t)‖²_H dt,

which is the maximum likelihood estimator for the absolutely continuous statistical model generated by (5.2). The multi-parameter case is considered in [24]. The estimator suggested in [51] is

(5.4) θ̂_N = −∫_0^T (Π_N A_1 u(t), dΠ_N u(t) + Π_N A_0 u(t) dt)_H / ∫_0^T ‖Π_N A_1 u(t)‖²_H dt,

which is not a maximum likelihood estimator. Under the order condition (4.31), the infinite-dimensional model (5.1) is singular and, as N → ∞, consistency (in probability) and asymptotic normality hold for both (5.3) and (5.4). The paper [52] is a shorter version of [51], and [49] studies estimators of the type (5.4) for the two-parameter estimation problem du + (θ_0 A_0 + θ_1 A_1)u dt = dW.
5.3. Non-spectral methods. Similarly to finite-dimensional models, parameter estimation in stochastic parabolic equations can be studied in the long-time or small-noise asymptotics. The interesting situation is when the infinite-dimensional problem is absolutely continuous, and in this situation (1) estimators in the large-time asymptotic were studied by Loges [39], [48]; (2) estimators in the small-noise asymptotic were studied by Huebner [25], Ibragimov and Khas'minskiȋ [32], [33], [34], [35], and Prakasa Rao [61]. A different class of problems is a combination of filtering and estimation, when the observations are y(t) = ∫_0^t (Bu)(s) ds + v(t), where B is an operator with a fixed finite-dimensional range R^n and v is an R^n-valued Brownian motion independent of W. From the statistical point of view, this problem is always absolutely continuous, as the measures generated in C((0,T); R^n) by y are absolutely continuous with respect to the Wiener measure (Liptser and Shiryaev [46, Theorem 7.4]).

Theorem 2.2 (The strong law of large numbers). Let ξ_n, n ≥ 1, be a sequence of independent random variables and b_n, n ≥ 1, a sequence of positive numbers such that b_{n+1} ≥ b_n, lim_{n→∞} b_n = +∞, and Σ_{n≥1} Var(ξ_n)/b_n² < ∞. Then

lim_{n→∞} (1/b_n) Σ_{k=1}^n (ξ_k − E ξ_k) = 0

with probability one. See, for example, Shiryaev [72, Theorem IV.3.2].
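As a purely illustrative numerical check (ours, not part of the original text), take independent ξ_n ∼ N(0, 1) and b_n = n, so that Σ_n Var(ξ_n)/b_n² = Σ_n n^{−2} < ∞ and the theorem applies:

```python
import random

# Kolmogorov's criterion: xi_n iid N(0,1), b_n = n, so
# sum_n Var(xi_n)/b_n^2 = sum_n 1/n^2 < infinity, and the theorem gives
# (1/b_n) * sum_{k<=n} (xi_k - E xi_k) -> 0 with probability one.

rng = random.Random(0)
n = 200_000
partial_sum = 0.0
for _ in range(n):
    partial_sum += rng.gauss(0.0, 1.0)
print(abs(partial_sum) / n)   # should be small (std dev = n^{-1/2}, about 0.0022)
```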

(4.1) du + (A_0 + θA_1)u dt = dW,

under suitable assumptions on the operators A_0, A_1. The key feature of equation (3.1) is the possibility of writing the solution using separation of variables; in what follows, we generalize this feature to (4.1) using the notion of a diagonalizable equation.
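In the diagonalizable case, the meaning of (4.1) can be made explicit; as a sketch (using the eigenvalue notation ρ_k, ν_k for A_0, A_1):

```latex
% If A_0 h_k = \rho_k h_k and A_1 h_k = \nu_k h_k for a common orthonormal
% system \{h_k\}, then the Fourier coefficients u_k = (u, h_k)_H of the
% solution of (4.1) decouple into independent Ornstein--Uhlenbeck processes:
du_k(t) = -\mu_k(\theta)\, u_k(t)\, dt + dw_k(t),
\qquad \mu_k(\theta) = \rho_k + \theta \nu_k .
```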

Σ_{k≥J} ν_k²/μ_k(θ) < ∞, θ ∈ Θ. If u is a solution of (5.1), the Galerkin approximation u^N of u is the solution of the equation

(5.2) du^N + (Π_N A_0 + θΠ_N A_1)u^N dt = dW^N,

while the projection Π_N u satisfies

dΠ_N u(t) + (Π_N A_0 + θΠ_N A_1)u(t) dt = dW^N,

which is not an equation for Π_N u, unless Π_N commutes with A_0 and A_1.
… then (2.17) is obvious, because the series in the numerator converges while the series in the denominator diverges; if α ≥ −1/2, then (2.17) follows from a similar comparison; see, for example, Walsh [74, p. 284].

Definition 2.7 can be generalized to allow spatial covariance: given a non-negative, bounded, self-adjoint operator Q on H, we define the Q-cylindrical Brownian motion W_Q on H by replacing (2.18) with (2.22).

Definition 2.11. The estimator Ψ_N of θ is said to converge to θ with the rate of convergence N^α, α > 0, if the sequence {N^α(Ψ_N − θ), N > 0} converges in distribution to a non-degenerate random variable ζ (that is, Var ζ > 0). If ζ is a Gaussian random variable, then Ψ_N is called asymptotically normal.
Proposition 1], the measures are mutually absolutely continuous if and only if the series Σ_k ν_k²/μ_k(θ) converges (see also Mikulevicius and Rozovskiȋ [54, Corollary 1] for a more general result about absolute continuity of measures). (b) To prove (4.22), use (4.29) and apply Corollary 2.6 with f_k(t) = ν_k u_k(t). An interested reader can also verify that, in general, condition (4.20) is not enough to apply the classical Central Limit Theorem (Theorem 2.4).
Consider the equation

(5.1) du + (A_0 + θA_1)u dt = dW,

fix an orthonormal basis {h_k, k ≥ 1} in H, and define Π_N, the orthogonal projection on the span of h_1, …, h_N, and W^N = (W_{h_1}, …, W_{h_N}). The equation is assumed parabolic in the usual sense of partial differential equations, which implies existence, uniqueness, and regularity of the solution similar to Theorem 4.6 (see, for example, [68, Chapter 3]; our Definition 4.2 of parabolicity is a particular case of the general definition applied to diagonalizable equations). According to Mikulevicius and Rozovskiȋ [54, Corollary 1], the measures generated by the solutions for different θ are mutually absolutely continuous if and only if ∫_0^T Σ_{k≥1} ν_k² u_k²(t) dt < ∞ with probability one, or (see (4.24)) Σ_{k≥J} ν_k²/μ_k(θ) < ∞.
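The equivalence of the two forms of the condition can be seen from the second moments of the Fourier coefficients; as a sketch (assuming zero initial condition and μ_k(θ) > 0 for k ≥ J):

```latex
\mathbb{E}\,u_k^2(t) = \frac{1 - e^{-2\mu_k(\theta)t}}{2\mu_k(\theta)}
\quad\Longrightarrow\quad
\mathbb{E}\int_0^T \nu_k^2\, u_k^2(t)\, dt
\sim \frac{T\,\nu_k^2}{2\mu_k(\theta)}, \qquad k \to \infty,
```

so the integral condition holds if and only if Σ_{k≥J} ν_k²/μ_k(θ) < ∞.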