Estimation of quantum finite mixtures

We consider the problem of determining the weights of a quantum ensemble. That is to say, given a quantum system that is in a set of possible known states according to an unknown probability law, we give strategies to estimate the individual probabilities, weights, or mixing proportions. Such strategies can be used to estimate the frequencies at which different independent signals are emitted by a source. They can also be used to estimate the weights of particular terms in a canonical decomposition of a quantum channel. The quality of these strategies is quantified by a covariance-type error matrix. In accordance with this cost function, we give optimal strategies in both the single-shot and multiple-copy scenarios. The latter is also analyzed in the asymptotic limit of a large number of copies. We give closed expressions of the error matrix for two-component quantum mixtures of qubit systems. The Fisher information plays an unusual role in the problem at hand, providing exact expressions of the minimum covariance matrix for any number of copies.


I. INTRODUCTION
Suppose we are given a quantum system which is known to be in one of several states with some unknown probability, such as a photon that travels through a communication channel and encodes some message. These states can be nonorthogonal due to, e.g., errors occurring during the transmission, but they can also be made to overlap intentionally, e.g., to avoid possible eavesdropper attacks in quantum key distribution. Given this set of possible fixed states, we wish to find an estimate of the probabilities that best describe the state we have been provided with. More succinctly, assuming that a state ρ_λ is a convex combination of a given set of states {ρ_r}, we wish to best estimate the value of the weights {λ_r}, which we arrange in the column vector λ and which characterize the quantum ensemble {(λ_r, ρ_r)}, by performing suitable measurements on the system. The analogous classical problem appears in the field of statistical modeling under the name of estimation of finite mixtures [1]. The formal study of finite mixtures was initiated by Pearson in 1894 [2]. He was conducting a biometric investigation on data collected from crabs, and found that the distribution of the size of their foreheads (relative to the size of the body) presented an unexpected skewness, which could not be modeled with a symmetric normal distribution. Pearson showed that the data were very well fitted by a mixture of two normal distributions. The presence of two components was taken by Pearson as evidence that there were two different species of crabs. In this way finite mixture models can be used to expose any grouping in underlying data (clustering of data). With prior knowledge of the individual component densities, which can be inferred or estimated by other means, finite mixture estimation enables one to estimate the weights, or proportions, of the different populations from the gathered coarse-grained data.
A (classical) finite mixture, p_λ(i) = ∑_r λ_r p_r(i) (in the obvious notation), can thus always be interpreted as describing situations where the information on the grouping is lost or, in other words, as the marginal of a joint distribution p(i, r), such that p_λ(i) = ∑_r p(i, r); i.e., p_r(i) can be viewed as the conditional probability p_r(i) = p(i|r).
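The marginal picture above can be made concrete in a few lines of code; the component distributions and weights below are illustrative numbers, not taken from the text:

```python
import numpy as np

# A two-component classical mixture p_lambda(i) = sum_r lambda_r p_r(i),
# viewed as the marginal of the joint distribution p(i, r) = lambda_r p_r(i).
weights = np.array([0.3, 0.7])          # lambda_r (illustrative)
components = np.array([[0.9, 0.1],      # p_1(i)
                       [0.2, 0.8]])     # p_2(i)

joint = weights[:, None] * components   # p(i, r) as a table over (r, i)
mixture = joint.sum(axis=0)             # marginal over r -> p_lambda(i)

# The direct convex combination gives the same distribution.
assert np.allclose(mixture, weights @ components)
assert np.isclose(mixture.sum(), 1.0)
```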
In this paper we approach the problem of estimating quantum finite mixtures. More precisely, we give optimal strategies to estimate the vector of weights λ under the assumptions given above (known set {ρ_r} of possible states). We also address the situation in which we are provided with N identical and independent copies of the state ρ_λ, to which we will refer as the average state. In this multiple-copy scenario, we further assume that generalized collective measurements can be performed on ρ_λ^⊗N. For large N, we also give (local) strategies, based on projective measurements on individual copies, that have the same performance as the optimal collective strategies.
Quantum ensembles are necessary to describe situations in which complete prior information is lacking. In the context of quantum communication, for instance, one estimates the frequency of different (known) states coming out of a source; i.e., one gathers information from the average state in connection with its particular preparation procedure. It is well known that in general there is no unique quantum ensemble consistent with a given mixed state [3]. Therefore there will be instances in quantum finite mixture estimation, called unidentifiable, where the average state ρ_λ does not fully determine the value of the weights λ, which therefore cannot be estimated with unlimited precision even when an arbitrary number of copies of ρ_λ is provided. This problem is related to that of discrimination of quantum ensembles [4], where it is necessary to consider as inequivalent the different ensembles that are consistent with a given mixed state. We also note that, as in the classical case, a quantum finite mixture can be interpreted as the marginal density matrix of an extended system-ancilla state when a particular measurement is performed on the ancilla. A quantum ensemble also describes the output of a stochastic quantum channel (or generalized measurement) for a fixed input state. In particular, if the input state is taken to be one part of a bipartite maximally entangled state, the stochastic channel is fully characterized by the output state, which can be interpreted as a quantum finite mixture. Therefore, the results that we present here can be applied to the estimation of the weights of individual Kraus operators (or of subsets of them) in a particular operator-sum representation of a channel. For example, we can easily give bounds on the precision of estimating the weights of bit-flip, phase-flip, and combined bit-phase-flip errors, or also the total weight of two-qubit Pauli errors versus single-qubit Pauli errors.
Quantum finite mixture estimation is novel ground for quantum estimation theory [5-8], which is one of the basic tools in the field of quantum information and has been continuously developing since the late 1970s. Many problems have been addressed, ranging from the estimation of a single parameter, e.g., a phase [8] or the losses of a quantum channel [9], to full tomography. Quantum estimation theory also finds many applications in quantum metrology [10], such as the improvement of frequency standards [11], gravitational-wave detection [12,13], and clock synchronization [14,15], and it is often a key ingredient in other quantum computation [16] and communication topics, e.g., quantum benchmarks for teleportation experiments [17]. The recent problem studied by Konrad et al. [18] can be viewed as quantum finite mixture estimation in a simplified context. In the present paper we address the issue in full generality. This, in passing, will enable us to answer most of the questions posed there.
The paper is organized as follows. In Sec. II we introduce the general framework and give the main results for both the single- and multiple-copy scenarios. The asymptotic limit of a large number of copies is addressed in Sec. III. These two sections conclude with a discussion of the unidentifiability of mixtures and its consequences. Additionally, in each of these sections we provide examples to illustrate the use of the techniques that we introduce. Section IV is devoted to two-component mixtures, for which closed expressions can be given in rather general situations. The conclusions are in Sec. V, and several technical details can be found in the Appendices, which also include an example of a two-step adaptive local strategy that is optimal.

A. General framework
As already mentioned in the Introduction, a quantum finite mixture is defined to be the convex combination in Eq. (1), where λ belongs to the unit (M − 1) simplex, i.e., the set {λ : λ_r ≥ 0, ∑_{r=1}^{M} λ_r = 1}. By quantum finite mixture estimation we mean the following: assume we have been provided with a copy of the average state ρ_λ (or with several identical and independent copies of it, i.e., with ρ_λ^⊗N), about which we know nothing except that λ has been drawn from a (prior) probability distribution π(λ). Assume also that we are allowed to perform generalized measurements on the copy (or copies) of ρ_λ. Our task is to determine λ (or, perhaps, some linear combination of its components λ_r, namely a = a^t λ, where a is some vector of constants a_r). This determination has necessarily to be based on the output(s) of our measurement(s) on ρ_λ (ρ_λ^⊗N). Due to the inherent nature of quantum measurements, the determination of λ cannot be perfect and we can only hope to obtain an estimate within some accuracy. Our goal is to obtain the best estimate.
To give a precise meaning to the term 'best estimate' we take a Bayesian approach and introduce as cost function the covariance-type error matrix Σ = ⟨(λ̂_χ − λ)(λ̂_χ − λ)^t⟩, where λ̂_χ is our estimate of λ based on the outcome χ of our measurement and ⟨·⟩ stands for averaging over λ and χ. More precisely, the averaging is performed over the joint probability distribution p(χ, λ) = p(χ|λ)π(λ), where p(χ|λ) is the probability of obtaining the outcome χ conditioned on the actual value of λ. In quantum mechanics this conditional probability is given by Born's rule: p(χ|λ) = tr(E_χ ρ_λ), where {E_χ} is the positive operator-valued measure (POVM) that defines our generalized quantum measurement. The trace of the error matrix gives the total mean square error (MSE), E = tr Σ, while the expectation value E_a = a^t Σ a gives the mean square error in the estimation of a.
In order to analyze one-copy and multiple-copy estimation in a unified framework we have found it convenient to define quantum finite mixtures, Eq. (1), in a slightly more general form, allowing for nonlinear mixtures of the type ρ_λ = ∑_α c_α(λ) ρ_α, where the coefficient functions satisfy ∑_α c_α(λ) = 1 for all λ [but not necessarily c_α(λ) ≥ 0], and the range of values for α may not coincide with that for r in Eq. (1). As for linear finite mixtures, our goal is still to best estimate λ (we assume that the functional dependence of the coefficient functions c_α on λ is known). The error matrix can be written as Σ = ∑_χ p(χ) ⟨(λ̂_χ − λ)(λ̂_χ − λ)^t⟩_χ, where p(χ) is the marginal of p(χ, λ) and ⟨·⟩_χ indicates averaging over the conditional probability p(λ|χ) = p(χ, λ)/p(χ) (Bayes rule). More explicitly, p(χ) = ∫ dλ p(χ, λ), where we use the shorthand notation dλ = δ(∑_r λ_r − 1) ∏_r dλ_r. Note that the Dirac δ function, along with λ_r ≥ 0, guarantees that λ is a point in the unit (M − 1) simplex (hereafter, simplex for brevity). Equation (4) can be cast as Σ = ∑_χ p(χ) [⟨(λ − ⟨λ⟩_χ)(λ − ⟨λ⟩_χ)^t⟩_χ + δ_χ δ_χ^t], with δ_χ = λ̂_χ − ⟨λ⟩_χ. Note that all dependence on our particular choice of the estimator λ̂_χ is contained in δ_χ. Since the matrix δ_χ δ_χ^t is manifestly positive semidefinite, the estimator that minimizes our cost function is λ̂_χ = ⟨λ⟩_χ. (Note that the components of ⟨λ⟩_χ are non-negative and add up to one; i.e., λ̂_χ is a probability vector.) Hereafter we will only consider this optimal estimator, which gives the smallest error matrix; we will denote this matrix by the same symbol Σ to simplify the notation. Hence, we may write Σ = ∑_χ p(χ) ⟨(λ − ⟨λ⟩_χ)(λ − ⟨λ⟩_χ)^t⟩_χ. By rearranging the remaining terms in Eq. (5) one can further simplify the expression of the error matrix to obtain Σ = ⟨λλ^t⟩ − ∑_χ p(χ) ⟨λ⟩_χ ⟨λ⟩_χ^t, where it is important to note that the first average is over the prior distribution π(λ) alone, i.e., independent of the measurements we may perform on the average state. As to the second term, we may write the average value of λ as ⟨λ⟩_χ = ⟨λ⟩ + [1/p(χ)] ∑_α Γ̃_α tr(E_χ ρ_α), where we have defined Γ̃_α = ⟨λ c_α(λ)⟩ − ⟨λ⟩⟨c_α(λ)⟩ and used that ∑_α ⟨c_α(λ)⟩ tr(E_χ ρ_α) = p(χ). Inserting this result in Eq.
(8) we find Σ = Γ − ∑_χ v_χ v_χ^t / p(χ), with v_χ = ∑_α Γ̃_α tr(E_χ ρ_α), where Γ = ⟨λλ^t⟩ − ⟨λ⟩⟨λ⟩^t is the covariance matrix of the unknown weights; i.e., its elements are the second-order central moments of the prior distribution π(λ). In order to interpret the second term in this equation, we define an effective state σ_λ, Eq. (11), that combines information relative to the prior distribution of λ with the quantum states ρ_α, where we have defined λ̄ = ⟨λ⟩. It is shown in Appendix A that this definition yields a proper density matrix. Let p_λ(χ) be the probability distribution of the outcomes obtained when performing the POVM measurement {E_χ} on this effective state, namely p_λ(χ) = tr(E_χ σ_λ). Then, Eq. (10) can be written in the very appealing form Σ = Γ − F(λ̄), where F(λ) is the Fisher information matrix of the probability distribution p_λ(χ), whose elements are defined by F_rs(λ) = ∑_χ p_λ(χ) ∂_r ln p_λ(χ) ∂_s ln p_λ(χ), and we use the compact notation ∂_r = ∂/∂λ_r. Some comments are in order. Note that the error matrix has two distinct contributions: (i) the intrinsic 'error' of the random variable λ (that which one would obtain by just guessing the weights of the quantum finite mixture without performing any measurement whatsoever), which is given by the covariance matrix Γ; and (ii) the Fisher information of the effective state σ_λ, which represents the information gathered from the outcomes of the measurement on the average state ρ_λ. Naturally, this information reduces the uncertainty about the actual value of λ, which explains the minus sign in Eq. (12). Despite this very natural interpretation, one might be somewhat surprised to find the Fisher information matrix in the context of Eq. (12). It usually appears in connection with the Cramér-Rao bound (see Sec. III A below), where it provides lower bounds on the MSE in estimation problems. Typically these lower bounds are attained only in the asymptotic limit of many identical and independent copies. Note, however, that relation (12) is an exact expression.
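As a numerical illustration of the classical Fisher information in Eq. (13) for a measured one-parameter family of states, here is a minimal sketch; the states, the POVM, and the value of λ are illustrative choices, not quantities from the paper:

```python
import numpy as np

# Fisher information F(lambda) of the outcome distribution
# p_lambda(chi) = tr[E_chi sigma_lambda] for a two-component qubit mixture,
# estimated by central finite differences.
rho1 = np.array([[1, 0], [0, 0]], dtype=complex)          # |0><0|
rho2 = np.array([[0.5, 0.5], [0.5, 0.5]], dtype=complex)  # |+><+|
povm = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]         # computational basis

def outcome_probs(lam):
    sigma = lam * rho1 + (1 - lam) * rho2
    return np.array([np.trace(E @ sigma).real for E in povm])

def fisher(lam, eps=1e-6):
    p = outcome_probs(lam)
    dp = (outcome_probs(lam + eps) - outcome_probs(lam - eps)) / (2 * eps)
    return np.sum(dp**2 / p)

# For probabilities linear in lambda, F = (dp/dlam)^2 / (p0 p1) exactly.
lam = 0.3
p0 = lam * 1.0 + (1 - lam) * 0.5       # tr(E_0 sigma) = 0.65
exact = 0.5**2 / (p0 * (1 - p0))
assert np.isclose(fisher(lam), exact, rtol=1e-4)
```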
More interestingly for our purposes here, relation (12) enables us to apply known results [5,6] concerning the Fisher information; in particular, the Braunstein and Caves inequality [19], which states that the Fisher information is upper bounded by the so-called quantum Fisher information (QFI) matrix H(λ), i.e., F(λ) ≤ H(λ). Thus, Σ ≥ Γ − H(λ̄). Before proceeding, we recall the definition of H(λ). Its matrix elements, which depend only on the family of states σ_λ, are given by H_rs(λ) = tr[σ_λ (L_r L_s + L_s L_r)/2], where the matrix L_r(λ) is the symmetric logarithmic derivative (SLD), (implicitly) defined as ∂_r σ_λ = [L_r(λ) σ_λ + σ_λ L_r(λ)]/2. Although Eqs. (15) and (16) are particularized to the case under consideration, they also apply to a general situation where σ_λ represents an arbitrary family of states, such as that defined by ρ_λ. We also recall that the SLD is most easily computed in the basis that diagonalizes σ_λ. A simple calculation leads to ⟨φ_m|L_r(λ)|φ_n⟩ = 2⟨φ_m|∂_r σ_λ|φ_n⟩/(ν_m + ν_n), where {|φ_n⟩} and {ν_n} are the eigenvectors and eigenvalues of σ_λ, respectively. Let us go back to Eq. (14). Since H(λ) is independent of the measurement (as pointed out above, it only depends on the effective state σ_λ), Eq. (14) provides an absolute lower bound on the error matrix Σ.
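The eigenbasis formula for the SLD, together with the QFI, can be implemented directly; the following sketch checks it against the known single-parameter result H = 1/[λ(1 − λ)] for a mixture of two orthogonal qubit states (the family and the value of λ are illustrative):

```python
import numpy as np

# SLD via (L)_mn = 2 <m| d sigma |n> / (nu_m + nu_n) in the eigenbasis of
# sigma, and QFI via H = Re tr(sigma L L).
def sld(sigma, dsigma, tol=1e-12):
    nu, phi = np.linalg.eigh(sigma)
    d = phi.conj().T @ dsigma @ phi
    L = np.zeros_like(d)
    for m in range(len(nu)):
        for n in range(len(nu)):
            if nu[m] + nu[n] > tol:
                L[m, n] = 2 * d[m, n] / (nu[m] + nu[n])
    return phi @ L @ phi.conj().T

def qfi(sigma, dsigma):
    L = sld(sigma, dsigma)
    return np.trace(sigma @ L @ L).real

# Orthogonal two-component mixture sigma = diag(lam, 1 - lam):
# the known QFI is 1 / [lam (1 - lam)].
lam = 0.3
sigma = np.diag([lam, 1 - lam]).astype(complex)
dsigma = np.diag([1.0, -1.0]).astype(complex)
assert np.isclose(qfi(sigma, dsigma), 1 / (lam * (1 - lam)))
```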
In those cases where this lower bound is attainable [such as the dimension-two case, where λ = (λ, 1 − λ)^t, or when the SLD matrices commute with one another], the QFI matrix further provides us with the optimal measurement: {E_χ} can be chosen to be the projectors onto the eigenspaces of the L_r(λ). An important instance is the estimation of the linear combination a = a^t λ. In this case the optimal measurement is given by the projectors onto the eigenspaces of L_a = ∑_r a_r L_r(λ), and the minimal error, Eq. (18), is exactly E_a = a^t [Γ − H(λ̄)] a, which comes from sandwiching Eq. (12) between a^t and a. In particular, the MSE of a single weight λ_r is given by Eq. (19), E_r = Γ_rr − H_rr(λ̄). Quantum finite mixtures of orthogonal states (ρ_α ρ_β = 0 for α ≠ β) are yet another instance where the bound (14) is attainable. In this case, one can easily check that the MSE is simply given by Eq. (20).

B. Estimation with multiple copies
Let us assume that we are given an arbitrary number N of identical and independent copies of the average state ρ_λ in Eq. (1). The global state of the N copies can be written as in Eq. (21), where the components of the 'occupation number' vector k satisfy ∑_{r=1}^{M} k_r = N, and S indicates averaging over all permutations of the N copies, which produces a proper (normalized) state. From this equation we note that the state ρ_λ^⊗N can be written in the form (3), with k playing the role of α, c_k(λ) = N! ∏_r λ_r^{k_r}/k_r!, and ρ_k = S(ρ_1^⊗k_1 ⊗ ··· ⊗ ρ_M^⊗k_M). Because of this, the results of the previous section can be applied to multiple copies.
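The multinomial decomposition of ρ_λ^⊗N can be verified numerically for small N; the sketch below checks it for N = 2 copies of a two-component qubit mixture (states and weights are illustrative):

```python
import numpy as np
from itertools import permutations
from math import factorial

# rho_lambda^{(x)N} = sum_k c_k(lambda) rho_k with multinomial coefficients
# c_k = N! prod_r lambda_r^{k_r} / k_r! and symmetrized rho_k.
def kron_list(ops):
    out = np.array([[1.0 + 0j]])
    for op in ops:
        out = np.kron(out, op)
    return out

def symmetrize(ops):
    """Average of op_1 (x) ... (x) op_N over all orderings (the map S)."""
    perms = list(permutations(ops))
    return sum(kron_list(p) for p in perms) / len(perms)

rho = [np.array([[1, 0], [0, 0]], dtype=complex),
       np.array([[0.5, 0.5], [0.5, 0.5]], dtype=complex)]
lam = [0.3, 0.7]
N = 2

avg = lam[0] * rho[0] + lam[1] * rho[1]
lhs = np.kron(avg, avg)                       # rho_lambda^{(x)2}

rhs = np.zeros((4, 4), dtype=complex)
for k0 in range(N + 1):
    k1 = N - k0
    c_k = factorial(N) * lam[0]**k0 * lam[1]**k1 / (factorial(k0) * factorial(k1))
    rhs += c_k * symmetrize([rho[0]] * k0 + [rho[1]] * k1)

assert np.allclose(lhs, rhs)
```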
For arbitrary prior distributions π(λ) that is about all we can say concerning the multiple-copy scenario. However, more explicit expressions can be derived if a flat distribution of weights can be assumed. This is the most conservative scenario, and also the relevant one when nothing is known a priori about the weights λ. Appendix B collects useful formulas for computing integrals and averages over the simplex when π(λ) is flat (constant). From this appendix one can easily obtain explicit expressions, Eqs. (22)-(24), for the matrix elements of Γ and Γ̃_k, respectively. Hence, the lower bound on the MSE, Eq. (25), follows. This is as far as one can get for mixtures of arbitrary states {ρ_r}.
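A flat prior on the simplex is a Dirichlet(1, ..., 1) distribution, whose first and second moments, and hence the prior covariance matrix Γ, follow from the standard Dirichlet moment formulas; a Monte Carlo sketch with the arbitrary choice M = 3:

```python
import numpy as np

# Flat prior on the unit simplex = Dirichlet(1, ..., 1):
#   <lam_r> = 1/M,   Gamma_rs = (M delta_rs - 1) / [M^2 (M + 1)].
rng = np.random.default_rng(0)
M = 3
samples = rng.dirichlet(np.ones(M), size=200_000)

gamma_mc = np.cov(samples, rowvar=False, bias=True)
gamma_exact = (M * np.eye(M) - 1) / (M**2 * (M + 1))

assert np.allclose(samples.mean(axis=0), 1 / M, atol=3e-3)
assert np.allclose(gamma_mc, gamma_exact, atol=2e-3)
```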
In the case of mixtures of orthogonal states we can substitute Eqs. (22)-(24) into Eq. (20) and find a closed expression for the multiple-copy MSE, Eq. (26), where we have used the summation formula in Appendix B. Note that the error E_N^⊥ vanishes as N goes to infinity.
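The vanishing of the error with N for orthogonal mixtures can be seen in a quick Monte Carlo sketch: for orthogonal states, measuring each copy in the common eigenbasis yields multinomial counts, and with a flat prior the optimal estimator is the posterior (Dirichlet) mean. All numbers below are illustrative:

```python
import numpy as np

# With a flat prior the posterior after observing counts k is Dirichlet(k + 1),
# whose mean (k_r + 1)/(N + M) is the optimal (MSE-minimizing) estimator.
rng = np.random.default_rng(1)
M, trials = 3, 4000

def mse(N):
    lam = rng.dirichlet(np.ones(M), size=trials)           # prior draws
    counts = np.array([rng.multinomial(N, l) for l in lam])
    est = (counts + 1) / (N + M)                           # posterior mean
    return np.mean(np.sum((est - lam)**2, axis=1))

errs = [mse(N) for N in (5, 50, 500)]
assert errs[0] > errs[1] > errs[2]      # the error shrinks as N grows
```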

C. Identifiability
A mixture is identifiable if there exists a one-to-one correspondence between λ and ρ_λ; that is to say, if and only if, given ρ_λ, there is no other vector of weights λ′ satisfying Eq. (1). In a general situation, though, different vectors λ can give rise to the same density matrix (ρ_λ = ρ_λ′ for some λ ≠ λ′) and, therefore, identifiability cannot be taken for granted. Necessary and sufficient conditions for the identifiability of classical finite mixtures, p_λ(i) = ∑_r λ_r p_r(i), were established more than four decades ago by Teicher [20]. These conditions are equivalent to {p_r}_{r=1}^{M} being a linearly independent set. Similarly, the linear independence of the (density) matrices in a quantum ensemble, {ρ_r}_{r=1}^{M}, constitutes a necessary and sufficient condition for the identifiability of quantum finite mixtures: states lying in the convex hull of a linearly independent set of density matrices will be identifiable, while all states in the convex hull of a linearly dependent set will necessarily be unidentifiable, except possibly for some states on the boundary.
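The linear-independence criterion is easy to test numerically by vectorizing the density matrices and computing a rank (the example states are illustrative):

```python
import numpy as np

# The mixture is identifiable iff {rho_r} is linearly independent, i.e. iff
# the matrix whose rows are the vectorized states has full row rank.
def identifiable(states):
    A = np.array([rho.reshape(-1) for rho in states])
    return np.linalg.matrix_rank(A) == len(states)

z0 = np.array([[1, 0], [0, 0]], dtype=complex)            # |0><0|
z1 = np.array([[0, 0], [0, 1]], dtype=complex)            # |1><1|
xp = np.array([[0.5, 0.5], [0.5, 0.5]], dtype=complex)    # |+><+|
xm = np.array([[0.5, -0.5], [-0.5, 0.5]], dtype=complex)  # |-><-|

assert identifiable([z0, z1, xp])          # three independent qubit states
assert not identifiable([z0, z1, xp, xm])  # |0><0|+|1><1| = |+><+|+|-><-|
```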
Identifiability is usually assumed in (classical) mixture estimation (see, e.g., [21]), since unidentifiable models often give rise to ill-defined estimation procedures, and their asymptotic theories break down. In contrast, our approach leads to sensible results for the estimation of quantum finite mixtures even in unidentifiable scenarios. The above results for the single-copy case, as well as the derivation of the effective model for a finite number of copies, can be applied directly, without regard to identifiability considerations. Care must be taken, however, when applying the asymptotic methods of the next section to unidentifiable mixtures. Such methods assume that the errors go to zero as the number of copies increases, which cannot be guaranteed if mixtures are unidentifiable. We will revisit unidentifiability at the end of Sec. III, where we will introduce ways to circumvent this difficulty.

III. ESTIMATION OF WEIGHTS IN THE ASYMPTOTIC LIMIT
In the preceding sections we have presented protocols to optimally estimate quantum finite mixtures and have obtained bounds on their accuracy using a covariance-type error matrix as a cost function. We have also identified situations where these bounds are attainable and provided the corresponding optimal measurements; all this in the framework of single- and multiple-copy estimation. In this section we focus on the latter, in the asymptotic limit where a large number N of copies is available for the experiment. Although the approach of the preceding sections can also be carried out in this case, the asymptotic expansions become involved, with a few exceptions where a closed expression can be found for arbitrary N [see, e.g., Eq. (26)]. Our aim here is to provide more straightforward means of obtaining asymptotically optimal estimation protocols in general situations and computing the corresponding MSE. For this, we can resort to the well-known Cramér-Rao (CR) theory and its quantum extension, which we briefly discuss next, particularized to finite mixture estimation. A very powerful result, known as the Holevo bound, will also be presented in the next section, along with a simple example of its use. A more detailed and comprehensive presentation, which includes a discussion of the relationship between this theory and the Bayesian approach of the preceding sections, can be found in [22].
In this framework, to which we will refer as 'pointwise,' one focuses on a fixed point in parameter space, i.e., on the unit simplex in our case, and restricts oneself to locally unbiased (LU) estimators: those for which ⟨λ̂_χ⟩_λ = λ in some open set, where, in the spirit of the previous notation, ⟨·⟩_λ indicates averaging over the conditional probability p(χ|λ) at the fixed point λ. We define the error matrix Σ(λ) as Σ(λ) = ∑_χ p(χ|λ) (λ̂_χ − λ)(λ̂_χ − λ)^t. It depends on the measurement and on the estimator, i.e., on the particular way one associates λ̂_χ with a given outcome χ of the chosen measurement. For the sake of simplicity, in most of this section we will assume that the mixtures are identifiable. The problem of dealing with unidentifiable mixtures is postponed to the last subsection (Sec. III C).

A. The Cramér-Rao bound
A first important result of the theory is the so-called CR bound [23,24]. It states that the error matrix of a LU estimator at λ is lower bounded by the inverse of the Fisher information defined in Eq. (13), now with p_λ(χ) = p(χ|λ) = tr(E_{χ,λ} ρ_λ) [note that in this theory the POVM may depend on the vector λ; see the comments after Eq. (30)], namely, Σ(λ) ≥ F^{-1}(λ). Assume now that the same measurement is performed on several independent copies, i.e., on the state ρ_λ^⊗N. Due to the additivity of the Fisher information, in this multiple-copy scenario one has Σ(λ) ≥ [N F_1(λ)]^{-1}, where the subscript 1 refers to the one-copy model ρ_λ. This inequality expresses the fact that the MSE of the estimation scales with the inverse of the number of copies, with the scale set by the accuracy with which we are able to estimate λ from just one copy. It is well known that, under some regularity conditions, the maximum likelihood estimator achieves the CR bound asymptotically.
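The additivity of the Fisher information behind the 1/N scaling can be checked in a toy classical model with binary outcomes, p(0|λ) = λ, so that the N-copy statistics are binomial:

```python
import numpy as np
from math import comb

# For a Bernoulli outcome with p(0|lam) = lam, F_1 = 1 / [lam (1 - lam)];
# with N independent copies (binomial counts) F_N = N F_1.
def fisher_binomial(N, lam, eps=1e-6):
    def probs(l):
        return np.array([comb(N, k) * l**k * (1 - l)**(N - k)
                         for k in range(N + 1)])
    p = probs(lam)
    dp = (probs(lam + eps) - probs(lam - eps)) / (2 * eps)
    return np.sum(dp**2 / p)

lam = 0.3
f1 = fisher_binomial(1, lam)
assert np.isclose(f1, 1 / (lam * (1 - lam)), rtol=1e-4)
assert np.isclose(fisher_binomial(10, lam), 10 * f1, rtol=1e-4)
```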
In spite of its fundamental character, the CR bound has the drawback that it refers to a particular measurement, not necessarily the optimal one. To get around this difficulty, we invoke the Braunstein and Caves inequality, already discussed in Sec. II A, and obtain Σ(λ) ≥ [N H_1(λ)]^{-1}. Recall, however, that this bound is not always attainable but, when it is, the projectors onto the eigenspaces of the SLDs L_r(λ) define the optimal measurement. It is important to point out that practical use of this approach requires a two-step measurement in order to saturate the bound. This is necessary because the optimal measurement, and thus the estimator, themselves depend on λ, which we do not know beforehand.
To overcome this difficulty, one can take an asymptotically vanishing fraction of copies, say √N, and make an initial estimate of the weights, λ_ini. Then, on the remaining copies one can perform the measurement that is optimal at λ_ini, i.e., project on the eigenspaces of L_r(λ_ini) (see Appendix C for an explicit example of this procedure). Thus, this two-step adaptive measurement, which is independent of λ, approaches the optimal one in the asymptotic limit at leading order in 1/N, and one may write the asymptotic relation (31). This equation establishes a bridge between the asymptotic pointwise theory of this section and the Bayesian approach discussed in the first part of this paper. With all this in mind, we conclude that for sufficiently smooth priors π(λ) the asymptotic expression (32) holds. So far in this section we have overlooked the fact that not all the components of λ are independent, as λ must lie on the unit simplex. One could circumvent this by simply using the constraint ∑_r λ_r = 1 to write a particular component, say λ_M, in terms of the remaining M − 1 as λ_M = 1 − ∑_{r=1}^{M−1} λ_r. This possibility, however, introduces a huge asymmetry in the calculation, which may result in difficulties in inverting the Fisher information matrix H_1 and computing the bound (32). Note that inside the unit simplex the variations of λ are constrained by δλ · u = 0, where u = (1, 1, ..., 1)^t. A fully symmetric way of dealing with this issue is to project the information matrices F and H onto the orthogonal complement of span{u}, which we call S. Thus, the CR bound, Eq. (29), takes the form Σ(λ) ≥ [P_S F(λ) P_S]^{-1} [25], and similarly for its quantum version in Eq. (30), where P_S stands for the projector onto S and the inverse, [·]^{-1}, is restricted to the support of P_S. As an example, let us consider again the mixture of M orthogonal states and compute the asymptotic expression of E_N^⊥, introduced in Eq. (26). Applying the definition of the SLD in Eq.
(16) to the one-copy family ρ_λ, it is straightforward to obtain L_r(λ) = P_r/λ_r, where P_r is the projector onto the support of ρ_r. Applying now the definition of the QFI, Eq. (15), to the same family, we obtain [H_1(λ)]_rs = δ_rs/λ_r. For brevity, we omit the arguments and write H̄_1 for the projection of H_1 onto S, i.e., H̄_1 = P_S H_1 P_S, and similarly for other matrices. Let us start by computing det H̄_1 (here the zero eigenvalue corresponding to the kernel of the projection is, of course, removed from the determinant). Since (i) the determinant of a d × d matrix is a homogeneous polynomial of degree d in its matrix elements and (ii) the vector u has the same projection on each eigenspace of H_1, it follows that (i) det H̄_1 must also be a homogeneous polynomial of degree M − 1 in the variables 1/λ_r, i.e., in the eigenvalues of H_1, and (ii) it must be a symmetric function of these eigenvalues. We also note that det H̄_1 must vanish if any two or more of these eigenvalues are set equal to zero, since in this case S necessarily contains a null subspace of H_1 [in doing so, the condition λ_r ≤ 1 is temporarily lifted, which is legitimate, since the result we are after, Eq. (34) below, is an algebraic relation that holds for generic {λ_r}, regardless of whether they are probabilities or not]. Hence, det H̄_1 = (1/M) ∑_R ∏_{r∈R} 1/λ_r, where the sum extends over all subsets R of M − 1 indices drawn from 1, 2, ..., M, and the prefactor 1/M can easily be computed by considering the particular case where all λ_r are equal. Reasoning along the same lines, one arrives at the asymptotic expression of E_N^⊥.
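The determinant formula for the projected QFI can be checked numerically for the orthogonal-state model H_1 = diag(1/λ_1, ..., 1/λ_M) (random weights, M = 4, chosen for illustration):

```python
import numpy as np

# Project H_1 = diag(1/lam_r) onto S, the orthogonal complement of
# u = (1, ..., 1), and compare the product of the nonzero eigenvalues of
# P_S H_1 P_S with (1/M) * sum over (M-1)-element subsets of prod 1/lam_r.
rng = np.random.default_rng(2)
M = 4
lam = rng.dirichlet(np.ones(M))

H1 = np.diag(1 / lam)
P = np.eye(M) - np.ones((M, M)) / M          # projector onto S
Hbar = P @ H1 @ P

eig = np.linalg.eigvalsh(Hbar)
det_S = np.prod(eig[np.abs(eig) > 1e-9])     # drop the kernel eigenvalue

# Sum over subsets of M - 1 indices = sum_r prod_{s != r} 1/lam_s.
subset_sum = sum(np.prod(1 / np.delete(lam, r)) for r in range(M))
assert np.isclose(det_S, subset_sum / M)
```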

B. Holevo bound
The quantum CR bound is a matrix inequality which is in general not attainable [a few remarkable exceptions are those discussed in Sec. II A, in the paragraph after Eq. (17), and in the example above]. However, there is a related bound that one can expect to be saturated asymptotically: the Holevo bound. Indeed, for qubit systems asymptotic attainability was proved by Hayashi and Matsumoto in [26], and the general proof for finite-dimensional systems follows from a recent paper by Kahn and Guţă [27]. We note that attainability here, as for the CR bound, is proven in a pointwise approach and hence makes implicit use of the two-step adaptive measurement mentioned above. An important difference is that in the second step the measurement attaining the Holevo bound will in general be a collective measurement that cannot be implemented by local measurements on each copy.
Let us briefly introduce the Holevo bound for quantum finite mixture estimation (see also [22]). Let G be a positive semidefinite matrix and define C_λ^N(G) as the minimum of tr[G Σ(λ)], where the minimization is over all pairs ({E_χ}, {λ̂_χ}) of measurements on ρ_λ^⊗N and estimators for which the latter is LU at λ (the unbiasedness of an estimator depends on the measurement through its outcome probability distribution). Equation (38) is relevant to the problem we are dealing with because its right-hand side gives, e.g., the smallest MSE, tr Σ(λ), if G = 1; i.e., C_λ^N(1) is the MSE of the optimal N-copy estimation scheme.
In Ref. [6] Holevo proved the bound C_λ^N(G) ≥ C_λ^H(G)/N, where C_λ^H(G) = min_X {tr(G Re Z[X]) + tr|G^{1/2} Im Z[X] G^{1/2}|}. In this expression X = (X_1, X_2, ..., X_{M−1}) is a vector of Hermitian matrices, one for each independent parameter (thus, for quantum mixtures, we will choose λ_M = 1 − ∑_{r=1}^{M−1} λ_r), satisfying the following relations: tr(ρ_λ X) = 0, (41) tr[(∂_r ρ_λ) X_s] = δ_rs, 1 ≤ r, s ≤ M − 1. (42) The minimization in Eq. (40) is over the set X_λ of all such X. Finally, Z[X] is the matrix whose elements are given by Z_rs[X] = tr(ρ_λ X_r X_s). Although the Holevo bound (39) is not attainable in general, it is attainable for the class of Gaussian models. The recent work [27] on asymptotic normality shows the asymptotic (local) equivalence between the many-copy states of finite-dimensional systems and a Gaussian model, and thereby proves the asymptotic attainability of the Holevo bound for finite-dimensional systems, i.e., lim_{N→∞} N C_λ^N(G) = C_λ^H(G). To relate the above to the Bayesian approach of the preceding sections, we need to average over π(λ): asymptotically, we have tr(G Σ) = N^{-1} ∫ dλ π(λ) C_λ^H(G) + o(N^{-1}). Thus, for instance, the MSE E can be computed as in Eq. (45), with G = 1. As to whether or not this averaging is legitimate and the resulting bound on the averaged cost function is attainable, there exist very good heuristic arguments, as well as various examples [22], that this should be the case, but no rigorous proof. Thus, this last equation should be taken with a grain of salt.
To illustrate the use of the Holevo bound in finite mixture estimation, let us assume that ρ_r, 1 ≤ r ≤ 4, are four pure qubit states whose Bloch vectors n_r form the vertices of a regular tetrahedron. With this, the Bloch vector of the finite mixture is r_λ = n_4 + ∑_{r=1}^{3} λ_r (n_r − n_4). In full generality we may write X_r = a_r + b_r · σ, where σ = (σ_x, σ_y, σ_z) are the standard Pauli matrices. Conditions (41) and (42) are equivalent to a_r = −b_r · r_λ and (n_r − n_4) · b_s = δ_rs, 1 ≤ r, s ≤ 3. This last relation can be inverted (this will always be the case if the mixture is identifiable), and we obtain b_r = (3/4) n_r, 1 ≤ r ≤ 3. With this, Eqs. (41) and (42) completely determine X; the resulting matrix Z[X] involves the (fully antisymmetric) three-dimensional Levi-Civita tensor ε_rst. Computing the real and imaginary parts of Z[X] and averaging over π_flat(λ) according to Eq. (45) yields the asymptotic MSE, where the first (second) figure in the final expression comes from the real (imaginary) part of Z[X] in Eq. (51) [Eq. (52)]. It is interesting to note that for this example the quantum CR bound is not attainable. Indeed, one can compute the SLDs L_r(λ), 1 ≤ r ≤ 4, in terms of the Bloch vector r_λ = ∑_{r=1}^{4} λ_r n_r of the average state ρ_λ (we now treat all components of λ as independent, in accordance with the approach developed in Sec. III A), and immediately check that their commutators do not vanish, so the quantum CR bound is not saturated. Just for the sake of completeness, one can also compute the quantum Fisher information matrix for 1 ≤ r, s ≤ 4, project onto S with P_S, and (pseudo)invert; after averaging, comparison with Eq. (53) shows that the quantum CR bound cannot be saturated.
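The noncommutativity of the SLDs in the tetrahedron example can be verified directly; the sketch below builds the four Bloch vectors, computes the SLDs from ∂_r ρ_λ = ρ_r in the eigenbasis of ρ_λ, and checks that a commutator is nonzero (the weights are an illustrative choice):

```python
import numpy as np

# Four pure qubit states with Bloch vectors at the vertices of a regular
# tetrahedron; all four weights are treated as independent parameters.
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

n = np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]) / np.sqrt(3)
rho = [(I2 + v[0] * sx + v[1] * sy + v[2] * sz) / 2 for v in n]
lam = np.array([0.4, 0.3, 0.2, 0.1])              # illustrative weights
sigma = sum(l, 0) if False else sum(l * r for l, r in zip(lam, rho))

def sld(sigma, dsigma):
    """SLD via (L)_mn = 2 <m| d sigma |n> / (nu_m + nu_n)."""
    nu, phi = np.linalg.eigh(sigma)
    d = phi.conj().T @ dsigma @ phi
    L = 2 * d / (nu[:, None] + nu[None, :])       # sigma is full rank here
    return phi @ L @ phi.conj().T

L = [sld(sigma, r) for r in rho]                  # d_r rho_lambda = rho_r
comm = L[0] @ L[1] - L[1] @ L[0]
assert np.linalg.norm(comm) > 1e-3                # SLDs do not commute
```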

C. Unidentifiable mixtures in the asymptotic limit
In the preceding section we were required to assume that the estimation errors become vanishingly small as the number of copies increases. This assumption does not necessarily hold if mixtures are unidentifiable. In order to apply the asymptotic techniques introduced above, we make a useful observation. If a quantum finite mixture is unidentifiable, there necessarily exists an orthogonal transformation λ′ = Oλ such that the states ρ_λ depend solely on a reduced number of parameters {ξ_r = λ′_r}_{r=1}^{m}, with m < M, and are independent of the redundant parameters {η_r = λ′_r}_{r=m+1}^{M}. The error matrix of the original parameters λ is, of course, related to the error matrix Σ′ of the new ones, ξ and η, by the similarity transformation Σ′ = O Σ O^t. Any measurement performed on the state ρ_λ will only give information about the parameters ξ, whereas the components of η have to be guessed independently of the measurement outcomes (e.g., by random choice). The optimal guess for η is actually ⟨η⟩, and it leads to an error that is, of course, independent of the number of copies. This means that in unidentifiable quantum mixture estimation there will always be an intrinsic error, associated with the uncertainty in the redundant parameters η, that remains constant regardless of the number of copies one is provided with. In the asymptotic limit, one can apply the bounds of the preceding sections to the block of Σ′ corresponding to the relevant components ξ. To illustrate this, let us consider an unidentifiable qubit mixture involving the states |±⟩ = (|0⟩ ± |1⟩)/√2. Performing a suitable rotation O in parameter space, we find that ρ_λ depends only on ξ_1 and ξ_2, as in Eq. (61). This shows that η_1 and η_2 are redundant parameters, and measurements will give us no information about them. For the simple model in Eq. (61), it is straightforward to obtain the Holevo bound. We first check that X = (X_1, X_2)^t is the solution to conditions (41) and (42) that minimizes Eq. (40). It then follows that Z[X] can be evaluated explicitly, and Eq.
(40) gives In the limit N → ∞, we can compute the error coming from the estimation of ξ through Eq. (45), i.e., 2 r=1 (ξ ) which is asymptotically vanishing. As for the estimation of η 1 and η 2 , we make the optimal guess [according to the notation introduced in the paragraph below Eq. (10), this quantity could also be denoted by 2 r=1 (η) rr ]. Putting all pieces together, the estimation error is In conclusion, this explicit example shows that unidentifiable mixtures will lead to a non-vanishing estimation error even in the asymptotic limit.
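The role of the redundant directions can be made concrete numerically. The following minimal sketch assumes, for illustration, a four-component qubit ensemble built from the states |0⟩, |1⟩, |+⟩, |−⟩ (an example of the type considered above; the specific states are our assumption, not taken from Eq. (61)). It verifies that shifting the weights along a redundant direction in parameter space leaves ρ_λ unchanged, while shifting along an identifiable direction does not:

```python
import numpy as np

# Computational-basis and Hadamard-basis qubit states (illustrative choice).
ket0 = np.array([1.0, 0.0]); ket1 = np.array([0.0, 1.0])
ketp = (ket0 + ket1) / np.sqrt(2); ketm = (ket0 - ket1) / np.sqrt(2)
states = [np.outer(v, v) for v in (ket0, ket1, ketp, ketm)]

def rho(lam):
    """Average state of the ensemble {(lam_r, rho_r)}."""
    return sum(l * s for l, s in zip(lam, states))

lam = np.array([0.4, 0.3, 0.2, 0.1])

# Redundant direction: trades weight between the {|0>,|1>} pair and the
# {|+>,|->} pair while preserving normalization; rho_lam is unchanged.
eta_dir = np.array([1.0, 1.0, -1.0, -1.0]) / 2.0
lam_shifted = lam + 0.1 * eta_dir
print(np.allclose(rho(lam), rho(lam_shifted)))

# Identifiable direction: changes lam_1 - lam_2, hence changes rho_lam.
xi_dir = np.array([1.0, -1.0, 0.0, 0.0]) / np.sqrt(2)
print(np.allclose(rho(lam), rho(lam + 0.1 * xi_dir)))
```

Only two linear combinations of the four weights affect the state, so measurements can never resolve the remaining two, in line with the discussion above.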

IV. ESTIMATION OF TWO-COMPONENT MIXTURES
In this section we dwell on the simplest quantum mixture scenario, where the average state ρ_λ belongs to the one-simplex (hence λ1 = λ, λ2 = 1 − λ). Although the error matrix is 2 × 2, only one of its entries, say Σ11, contains independent information about the accuracy of the estimation of the mixture (69). Therefore, in the following we simply drop the remaining three entries and write Σ(λ) ≡ Σ11 [and likewise for the Fisher information matrix and the quantum Fisher information, whose only relevant entries we denote by F(λ), H(λ), etc.]. Since there is only one independent parameter, we may also drop the vector notation and simply write λ.

A. Single-shot estimation
The single-copy version of this problem was considered recently in [18], though the optimal measurements and minimal estimation error were only determined when ρ1 and ρ2 are qubit and/or pure states. Our results in Sec. II show that for the two-component mixture in Eq. (69) the attainability conditions are fulfilled for any ρ1 and ρ2, and the optimal measurements, along with their minimal estimation error, can always be determined in both single- and multiple-copy scenarios. In particular, it follows from our results that the optimal protocol consists of a projective measurement whose projectors are those onto the eigenspaces of the SLD L(λ). Our results in the present paper thus provide answers to various open questions posed in [18].
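To make the optimal protocol concrete, the following numerical sketch (with arbitrarily chosen example qubit states, not those of any equation above) computes the SLD of ρ_λ = λρ1 + (1 − λ)ρ2, builds the projective measurement onto its eigenspaces, and checks that the resulting classical Fisher information equals the quantum Fisher information tr[ρ_λ L(λ)²]:

```python
import numpy as np

def bloch_state(r):
    """Qubit density matrix with (real) Bloch components (r_x, r_z)."""
    sx = np.array([[0, 1], [1, 0]], dtype=float)
    sz = np.array([[1, 0], [0, -1]], dtype=float)
    return 0.5 * (np.eye(2) + r[0] * sx + r[1] * sz)

rho1 = bloch_state([0.6, 0.5])     # arbitrary example mixed states
rho2 = bloch_state([-0.4, 0.7])
lam = 0.3
rho = lam * rho1 + (1 - lam) * rho2
drho = rho1 - rho2                 # d(rho_lam)/d(lam)

# SLD from rho L + L rho = 2 drho, solved in the eigenbasis of rho:
p, U = np.linalg.eigh(rho)
L = U @ (2 * (U.T @ drho @ U) / (p[:, None] + p[None, :])) @ U.T

H = np.trace(rho @ L @ L).real     # quantum Fisher information
# Classical Fisher information of the projective measurement onto
# the eigenspaces of L:
_, V = np.linalg.eigh(L)
F = sum((V[:, i] @ drho @ V[:, i])**2 / (V[:, i] @ rho @ V[:, i])
        for i in range(2))
print(np.isclose(F, H))            # the SLD measurement attains the QFI
```

For a one-parameter model, measuring in the SLD eigenbasis always saturates the Braunstein-Caves inequality at that point, which is what the final check confirms.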

B. Multiple-copy estimation and the asymptotic limit
Although a straightforward exercise, computing Σ for N > 1 copies of ρ_λ is a tedious task, even for two-component mixtures. In most cases the resulting expressions cannot be written in closed form for arbitrary N and are thus not very revealing. So, rather than attempting to present the general case, we have selected a particular example, which we will later use to illustrate the connection between the Bayesian and the asymptotic pointwise approaches.
Assume ρ1 and ρ2 are commuting non-orthogonal qubit states. Let us further assume that ρ2 is pure and that the prior is flat. Then we can choose the basis so that both states are diagonal, Eq. (72). Proceeding as in Sec. II B, the N-copy state ρ_λ^⊗N can be cast in the form (3), with ρ_k = S[|1⟩⟨1|^⊗k ⊗ |0⟩⟨0|^⊗(N−k)]. Hence, using Eq. (19), the minimum error Σ is given by a sum over the spectrum of ρ_λ^⊗N [recall that we are assuming the flat prior π_flat(λ) = 1], where the 2^N eigenvectors of ρ_λ^⊗N are the distinct permutations perms{|1⟩^⊗k ⊗ |0⟩^⊗(N−k)} (perms{·} stands for the set of distinct permutations of the set {·}). Defining the coefficients A_k as in Eq. (77), the eigenvalues of ρ_λ^⊗N are ν_k = A_k/C(N, k), each with multiplicity C(N, k), where C(N, k) denotes the binomial coefficient. As shown in Appendix B3, the terms of the resulting sum can be written as ratios of regularized incomplete beta functions, thus providing a more compact expression for the error. However, we can only give a closed form for Σ in the asymptotic limit of a very large number of copies. This requires evaluating the sum in Eq. (78) up to order 1/N (details of this evaluation are given in Appendix B). Plugging this expansion into Eq. (78), we obtain the asymptotic error, Eq. (80). With the asymptotic techniques introduced in Sec. III A the previous evaluation can be simplified a great deal. Moreover, these techniques enable us to give closed-form expressions of Σ for rather more general two-component mixtures. As already mentioned, the attainability of the CR bound is guaranteed for these (one-parameter) mixtures, and its application is particularly simple. From our discussion in Sec. III A, Eq. (32), we can write Σ ≃ (1/N) ∫ dλ π(λ) H1^{-1}(λ), where we recall that H1 is the QFI of the one-copy model (69).
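For commuting states the N-copy computation reduces to a classical binomial model, so the minimum Bayesian error can be cross-checked numerically. The sketch below assumes the diagonal forms ρ1 = diag(1−p, p) and ρ2 = diag(1, 0) as a stand-in for the basis choice of Eq. (72) (our assumption, chosen for illustration), and evaluates the flat-prior error as the prior second moment minus the averaged squared posterior mean:

```python
import numpy as np
from math import comb

def min_bayes_error(N, p=0.5, n=20001):
    """Minimum Bayesian MSE for the weight lam under a flat prior,
    assuming rho1 = diag(1-p, p), rho2 = diag(1, 0): each copy yields
    outcome |1> with probability q(lam) = lam * p."""
    lam = (np.arange(n) + 0.5) / n      # midpoint grid on [0, 1]
    err = np.mean(lam**2)               # prior second moment, = 1/3
    for k in range(N + 1):              # k = number of |1> outcomes
        like = comb(N, k) * (lam * p)**k * (1 - lam * p)**(N - k)
        err -= np.mean(lam * like)**2 / np.mean(like)
    return err                          # E[lam^2] - sum_k P(k) E[lam|k]^2

for N in (1, 10, 100):
    print(N, min_bayes_error(N))
```

The values decrease roughly as 1/N for large N, consistent with the asymptotic analysis, while for small N the full Bayesian expression is required.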
As can be simply read off from Eq. (70), the one-copy QFI H1 takes the same form as the N-copy expression. Note, however, that {|φ_n⟩} ({ν_n}) are now the eigenvectors (eigenvalues) of ρ_λ, rather than of ρ_λ^⊗N, and the QFI is thus a function of λ. In the Bloch representation we can write ρ_λ = (1 + r_λ · σ)/2, where, as in previous examples, r_λ = λ r1 + (1 − λ) r2 is the Bloch vector of ρ_λ and we have defined r_λ = |r_λ|. This representation holds when ρ1 and ρ2 are both qubit states, but also when they are pure states in arbitrary dimensions. Since in these cases the two density matrices can be taken to be real, it suffices to consider σ = (σ_x, σ_z). The eigenvalues and eigenvectors of ρ_λ then follow straightforwardly, and after some algebra one finds the closed form of H1(λ). For pure states, ρ1 = |ϕ1⟩⟨ϕ1| and ρ2 = |ϕ2⟩⟨ϕ2| (i.e., r1 = r2 = 1), one can further simplify this expression and write H1^pure = sin²θ/[λ(1 − λ)], where cos θ = |⟨ϕ1|ϕ2⟩| is the overlap. If the prior is assumed to be flat, π_flat(λ) = 1, a trivial integration leads to Σ ≃ 1/(6N sin²θ). If ρ1 and ρ2 are not pure, the Bloch representation holds only for qubit states. Assuming the flat prior, after a lengthy calculation one finds the leading-order result in 1/N, Eq. (88). Recall that for the cases at hand there exist adaptive measurements that attain the above bound. The reader is referred to Appendix C for a specific illustration of this general result. Before ending this section, we come back to the two-commuting-states example in Eq. (72), for which the estimation error, Eq. (80), was worked out entirely in the Bayesian framework, with the limit N → ∞ taken afterwards. The same estimation error can be obtained by applying the pointwise CR result (88). It is straightforward to check that this much less costly procedure leads to the same result (80), as it should. Recall, however, that it leads to sensible results only if the number N of copies is exceedingly large, whereas the Bayesian approach works for any N.
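The pure-state expression can be checked numerically. The sketch below uses the standard single-qubit Bloch formula H = |∂_λ r|² + (r · ∂_λ r)²/(1 − |r|²) (a known result, stated here as an assumption rather than derived) and compares it with sin²θ/[λ(1 − λ)]:

```python
import numpy as np

def qfi_bloch(r, dr):
    """One-copy QFI for a qubit path rho_lam = (1 + r(lam).sigma)/2,
    via the standard Bloch-vector formula (full-rank states)."""
    r, dr = np.asarray(r, float), np.asarray(dr, float)
    return dr @ dr + (r @ dr)**2 / (1 - r @ r)

theta, lam = 0.7, 0.3                              # arbitrary test point
r1 = np.array([np.sin(theta), 0.0, np.cos(theta)])  # pure-state Bloch vectors
r2 = np.array([-np.sin(theta), 0.0, np.cos(theta)]) # with overlap cos(theta)
r_lam = lam * r1 + (1 - lam) * r2

H = qfi_bloch(r_lam, r1 - r2)
H_pure = np.sin(theta)**2 / (lam * (1 - lam))
print(np.isclose(H, H_pure))
```

The agreement holds for any λ in (0, 1) and any overlap, as the closed form predicts.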

V. CONCLUSIONS
Quantum ensembles embody what in classical statistics is known as finite mixtures, and can thus be viewed as their quantum counterpart. More precisely, we have a quantum finite mixture whenever a signal can be characterized by a density matrix that is the average of a set of known states (pure or mixed), as is often the case in quantum communication. In these situations, one wishes to find the probability law that best describes the signal, or in other words, the weights that define the quantum ensemble. This has been the subject of the present paper, where we have relied on quantum estimation theory, but also broadened the field by proposing certain applications and tools.
The topics addressed in this paper include the precise definition of quantum finite mixtures, as an extension of finite mixtures to the quantum domain; optimal estimation of their weights when a given number of copies of the average state is available for measurement; optimal estimation in the asymptotic regime of a large number of copies; and the characterization of the (un)identifiability of quantum mixtures. For each of these topics we have answered the relevant questions and provided useful results, together with some examples of application.
Going into more detail, we have approached optimality from both the Bayesian and the 'pointwise' points of view. In the former, one minimizes an averaged cost function, which we have chosen to be the covariance-type error matrix of the estimation, over a joint probability involving the measurement outcomes as well as the prior knowledge of the weights.
Our key result is Eq. (12): the error matrix equals the intrinsic uncertainty of the weights minus the Fisher information matrix, which quantifies the information gained in the measurement process. This exact relation, valid for any number of copies, is linear in the Fisher information matrix, in contrast with the Cramér-Rao bound, where the error is lower bounded by the inverse of the Fisher information matrix. From our relation one obtains a measurement-independent lower bound on the error matrix in terms of the quantum Fisher information. In those cases where the Braunstein-Caves inequality (which states that the Fisher information matrix is upper bounded by the quantum Fisher information matrix) is saturated, our bound is attainable for any number of copies. When this holds (e.g., for two-component mixtures), we give the optimal measurement protocol, which turns out to be of von Neumann type.
As to the pointwise approach to quantum mixture estimation, we have briefly introduced the quantum Cramér-Rao and Holevo bounds in the specific context at hand. We have then applied these tools to obtain lower bounds on the error matrix of the weights when the number of copies of the average state is asymptotically large. In those situations the Bayesian approach becomes rather involved, and it is advisable to switch to the tools under discussion. Although the quantum Cramér-Rao and Holevo bounds can be applied to unidentifiable mixtures, their use requires some technicalities that we have commented upon and illustrated with an example. As one would expect, the error of the weight estimation for such mixtures does not vanish even if an infinite number of copies were available. A discussion of the relationship between the Bayesian and pointwise approaches has also been given, as well as an example illustrating that the two approaches give consistent results.
Among the examples one can find in this paper, we would like to highlight that of a mixture of a number of orthogonal states, which is relevant in the context of channel estimation. For this problem, and assuming a flat prior distribution of the weights, we have been able to write the minimal squared error in closed form, valid for any number of orthogonal states and any number of copies of the average state.
This paper is mostly devoted to the formalism and general results concerning quantum finite mixtures and the estimation of their weights. The examples are chosen for the sake of illustration, rather than for their practical relevance. As mentioned in the introduction, real applications of our work are, e.g., the characterization of signals in relevant quantum communication problems and the estimation of the probabilities with which various errors occur in a given channel. We have shown that in some instances the bounds we give are attainable by local two-step adaptive measurements. It remains an open question to establish whether or not collective measurements are necessary in the general case. Future extensions of our work also include the estimation of mixtures of continuous-variable systems.

APPENDIX B

To compute the sums over vectors k, we first note the identity (B4), where we have used that the sum in parentheses is independent of r. This is so because the set of all vectors k is invariant under k_r → k_{σ(r)}, where σ is any permutation of the symmetric group S_M, and thus ∑_k f(k_r) = ∑_k f(k_{σ(r)}). We next note that any vector k whose first component is fixed to be k1 gives the same contribution, f(k1), to the last sum in (B4). The number of such vectors follows from Eq. (B3) by simply making the substitutions M → M − 1 and N → N − k1. For the particular case we need in Sec. II B, f(x) = x², and the corresponding sum follows in closed form.
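The counting argument above can be verified by brute force for small M and N: the number of nonnegative integer vectors k with ∑_r k_r = N and fixed first component k1 is C(N − k1 + M − 2, M − 2), so ∑_k f(k1) can be computed either way. A minimal sketch:

```python
from itertools import product
from math import comb

def brute_sum(M, N, f):
    """Sum f(k_1) over all nonnegative integer vectors k of length M
    with components summing to N (direct enumeration)."""
    return sum(f(k[0]) for k in product(range(N + 1), repeat=M)
               if sum(k) == N)

def counted_sum(M, N, f):
    """Same sum via the composition-counting argument: for fixed k_1
    there are C(N - k_1 + M - 2, M - 2) such vectors."""
    return sum(comb(N - k1 + M - 2, M - 2) * f(k1) for k1 in range(N + 1))

f = lambda x: x * x      # the case needed in Sec. II B
print(brute_sum(4, 6, f) == counted_sum(4, 6, f))
```

The check works for any f, since the counting argument only concerns the multiplicity of each value of k1.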

Evaluation of the sum (79)
Recalling the definitions (75) and (77), and after some algebra, each term of the sum can be expressed through the regularized incomplete beta function I_x(a, b), with the shorthand notation k̄ ≡ N − k and x̄ ≡ 1 − x. The last factor in the resulting integrand becomes sharply peaked at some x = x₀ as N grows large and can be replaced by a Gaussian. Since we are interested only in terms that vanish asymptotically as N⁻¹, we can drop those that vanish exponentially and approximate Eq. (B11) accordingly. For the same reason, we can expand the first line in Eq. (B12) up to first order in u² ≡ (x − x₀)². The remaining integral is Gaussian and can be evaluated in closed form, from which the final result follows.

APPENDIX C: TWO-STEP ADAPTIVE MEASUREMENT IN THE ASYMPTOTIC LIMIT
In this appendix we give an explicit example of the two-step adaptive measurement protocol that attains the Cramér-Rao bound asymptotically [see Sec. III A, the paragraph after Eq. (30)]. To ease the calculation we choose the simplest instance: that of a mixture of two pure states, ρ_λ = λρ1 + (1 − λ)ρ2. This mixture has already been considered in Sec. IV, in the paragraph after Eq. (85), and here we stick to the same notation. If ρ_r (r = 1, 2) are pure, without loss of generality they can be chosen to be

ρ_r = ½ [1 + σ_z cos θ + (−1)^{r+1} σ_x sin θ] (C1)

(as if they were qubit states on the equator of the Bloch sphere), where cos θ = √(tr ρ1ρ2) = |⟨ϕ1|ϕ2⟩| is the overlap. Let us assume that we are given N copies of the state ρ_λ. In the first stage of the protocol, we take √N of these copies and perform the same measurement on each of them, with the aim of obtaining an initial, rough estimate of λ, which we denote by λ_ini. Since these measurements use uncorrelated copies and are themselves independent, we expect to benefit from the well understood statistical improvement that results from averaging over the √N samples. Thus, we can assume that, on average,

(λ_ini − λ)² ≃ α/√N, (C2)

where α is some constant whose value depends on the precise measurement that we perform. In the second stage, we refine the rough estimate obtained in the first stage by performing a (nearly optimal) measurement on the remaining N − √N copies. As discussed in Sec. III A (see also Sec. II A), the optimal measurement is of von Neumann type: it is described by the set of projectors {P_χ(λ)} onto the different eigenspaces of the SLD L(λ) of our model, evaluated at λ. For our example the SLD is readily computed; the result is Eq. (C3). However, since we do not know the true value of λ, we choose the measurement to be given by {P_χ(λ_ini)}, and hope that this change will not affect optimality. Let us check that this is indeed the case. To this end, we diagonalize Eq. (C3), obtain {P_χ(λ_ini)}, and, in turn, compute its Fisher information, defined in Eq. (13). We obtain

F1 = sin²θ / [λ(1 − λ) + (λ_ini − λ)² cos²θ]
(C4)

(recall that the subscript 1 refers to one copy). Thus, the error of performing this measurement on the N − √N copies is approximately 1/[(N − √N) F1]. For sufficiently large N (so that √N itself is also very large), Eq. (C2) holds on average, the error approaches 1/(N H1), and the optimal bound is thus attained, as can be read off from Eq. (86).
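The Fisher information (C4) of the mismatched measurement can be verified numerically. The sketch below builds ρ_λ from (C1), computes the SLD eigenbasis at λ_ini, and compares the classical Fisher information of the projective measurement {P_χ(λ_ini)}, evaluated at the true λ, with the closed-form expression:

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=float)
sz = np.array([[1, 0], [0, -1]], dtype=float)

def model(lam, theta):
    """Mixture of the two pure states of Eq. (C1) and its lam-derivative."""
    r1 = 0.5 * (np.eye(2) + np.cos(theta) * sz + np.sin(theta) * sx)
    r2 = 0.5 * (np.eye(2) + np.cos(theta) * sz - np.sin(theta) * sx)
    return lam * r1 + (1 - lam) * r2, r1 - r2

theta, lam, lam_ini = 0.9, 0.35, 0.42    # arbitrary test point
rho_ini, _ = model(lam_ini, theta)
rho, drho = model(lam, theta)

# SLD at lam_ini from rho L + L rho = 2 drho, solved in rho's eigenbasis:
p, U = np.linalg.eigh(rho_ini)
L = U @ (2 * (U.T @ drho @ U) / (p[:, None] + p[None, :])) @ U.T

# Classical Fisher information, at the true lam, of {P_chi(lam_ini)}:
_, V = np.linalg.eigh(L)
F1 = sum((V[:, i] @ drho @ V[:, i])**2 / (V[:, i] @ rho @ V[:, i])
         for i in range(2))

F1_closed = np.sin(theta)**2 / (lam * (1 - lam)
                                + (lam_ini - lam)**2 * np.cos(theta)**2)
print(np.isclose(F1, F1_closed))
```

Setting lam_ini equal to lam recovers F1 = sin²θ/[λ(1 − λ)], the one-copy QFI, which is the statement that the mismatch only costs a term of order (λ_ini − λ)².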