Optimal full estimation of qubit mixed states

We obtain the optimal scheme for estimating unknown qubit mixed states when an arbitrary number N of identically prepared copies is available. We discuss the case of states in the whole Bloch sphere as well as the restricted situation where these states are known to lie on the equatorial plane. For the former case we obtain that the optimal measurement does not depend on the prior probability distribution provided it is isotropic. Although the equatorial-plane case does not have this property for arbitrary N, we give a prior-independent scheme which becomes optimal in the asymptotic limit of large N. We compute the maximum mean fidelity in this asymptotic regime for the two cases. We show that within the pointwise estimation approach these limits can be obtained in a rather easy and rapid way. This derivation is based on heuristic arguments that are made rigorous by using van Trees inequalities. The interrelation between the estimation of the purity and the direction of the state is also discussed. In the general case we show that they correspond to independent estimations whereas for the equatorial-plane states this is only true asymptotically.

We obtain the optimal scheme for estimating unknown qubit mixed states when an arbitrary number N of identically prepared copies is available. We discuss the case of states in the whole Bloch sphere as well as the restricted situation where these states are known to lie on the equatorial plane. For the former case we obtain that the optimal measurement does not depend on the prior probability distribution provided it is isotropic. Although the equatorial-plane case does not have this property for arbitrary N , we give a prior-independent scheme which becomes optimal in the asymptotic limit of large N . We compute the maximum mean fidelity in this asymptotic regime for the two cases. We show that within the pointwise estimation approach these limits can be obtained in a rather easy and rapid way. This derivation is based on heuristic arguments that are made rigorous by using van Trees inequalities. The interrelation between the estimation of the purity and the direction of the state is also discussed. In the general case we show that they correspond to independent estimations whereas for the equatorial-plane states this is only true asymptotically.

I. INTRODUCTION
Two-state systems or qubits are the building blocks of many applications in Quantum Information. Although they are commonly assumed to be in pure states, in real situations they are not. State preparation, processing, quantum channels, etc. are inevitably imperfect, which means that any quantum system is, in fact, in a mixed state. The accurate estimation of the parameters that characterize qubit mixed states is therefore of utmost relevance for practical applications. The aim of this work is to find the optimal (most accurate) scheme to perform this task.
So far, most of the work in state estimation has focused on pure qubit states [1][2][3] and fewer quantitative results have been obtained for qubit mixed states [4][5][6][7][8][9]. One obvious reason for this is the greater complexity of the estimation procedure. Whereas pure states are fully characterized by just two parameters -those specifying a point on the surface of the Bloch sphere, i.e., a unit vectorfor a mixed state an additional parameter is required to specify its purity, by which we mean the distance from the center of the Bloch sphere to the point that represents the state. This brings a theoretical subtlety: we will need to identify a uniform prior distribution for the purity. In contrast to the pure-state case where there is a "natural" uniform probability distribution -the invariant measure on the 2-sphere-, for mixed states there is no unique choice. A uniform distribution must be isotropic (invariant under rotations of the Bloch sphere), but the purity, which is itself invariant, can be distributed according to a whole class of functions [10,11], depending on several criteria. Despite this ambiguity, our results turn out to be rather general and, in particular, they do not depend on the specific choice of an isotropic purity prior.
In this paper, we assume that we have N identically prepared systems upon which we can perform generalized measurements. From their outcomes we can infer the value of the parameters that characterize the state of the systems. The quality or accuracy of the estimation is quantified by the fidelity (to be defined in the next section). The average of the fidelity over the prior and the outcome distribution provides a useful summary parameter of the overall quality of the estimation scheme. This problem was partially addressed in [5]. Here we present an alternative formulation that enables us to apply the approach to new, practically relevant situations and find many explicit results.
To be more specific, we will study two types of situation: that of estimating anà priori completely unknown qubit state and that of estimating a state that is known to lie on an equatorial plane of the Bloch sphere. We call the former the 3D case (or just 3D for short), as the state can be represented by any point in the 3dimensional Bloch sphere. By the same logic, we call the latter 2D. The 2D case is useful because in many applications quantum states can be parametrized by the purity and a phase; e.g., linearly polarized photons. The 2D case also exhibits some remarkable theoretical features. For instance, we will show that while for 3D states the optimal measurement is essentially unique, independently of the isotropic prior, this is not so for 2D states, though this feature is recovered in the asymptotic limit of large N .
We will first address the problem from a Bayesian point of view, which will provide explicit results for any finite N . We will also take a steep dive into the asymptotic regime of the estimation schemes. It is clear that unknown states can only be estimated with perfect accuracy in the limit N → ∞. The rate at which this perfect determination limit is achieved as N increases is a very informative parameter. It is useful, e.g., to compare different estimation schemes. If two schemes have the same rate, we say that they are (asymptotically) equivalent. The asymptotic behavior is also a central notion in statistics, where there exists a wealth of results and very powerful techniques [12,13].
Within the statistical framework, one optimizes over all measurements and estimators, when the signal state is taken to be fixed. It turns out, under regularity conditions, that the maximum likelihood estimator is asymptotically optimal whatever the true signal state. The mean square error of the estimator gives a measure of the quality of the scheme. This error can be related to the fidelity through the Fisher information matrix, thus providing a connection with the Bayesian approach. In this context, the prior distribution plays a very minor role. In contrast, within the Bayesian approach the prior distribution does play a significant role because, as mentioned above, one is interested in obtaining an estimation that is optimal on average.
Here we present in a fairly comprehensive way the application of the two approaches to the asymptotic behavior of qubit mixed state estimation. We will see that both yield the same results. This fact has important consequences. It tells us that the asymptotic behaviour of the optimal mean fidelity only depends on the prior as an average of the optimal pointwise (i.e., for a fixed state) fidelities. Second, the Bayesian approach provides an explicit scheme that attains the pointwise bounds. It is worth pointing out that for some restricted schemes and some priors this might not be the case. For instance, it is known that a scheme based on fixed local measurements with the Bures prior distribution [14] does not approach unity at a rate 1/N [8], as a pointwise approach would indicate. Even more surprising, in this situation the Bayesian and the Maximum Likelihood estimation give different asymptotic average fidelities [8], in contrast to the common lore that both estimators should be asymptotically equivalent, pointwise. The nonequivalences here do all have simple explanations. Pointwise, everything is asymptotically equivalent and does converge at rate 1/N . However, the convergence is not uniform or the integrated coefficient of 1/N diverges. This paper is organised as follows. In the next section we introduce the notation and main concepts that will be used throughout this work. In Sec. III we obtain the optimal estimation protocol for any number of copies of the state in both the 3D and the 2D cases. In Secs. IV and V we compute the asymptotic expression of the fidelity from both the Bayesian and the pointwise approaches, respectively. The derivation of the latter is done through a rather self-contained presentation since some of the techniques may not be so well known among physicists. In Sec. VI we summarise our main results. We have relegated many technical details to the appendices for the benefit of readers not interested in technicalities II. PRELIMINARIES Consider an ensemble of N identically prepared states [ρ( r)] ⊗N , where ρ( r) is a density matrix with Bloch representation given by Here σ = (σ x , σ y , σ z ), where σ a , a = x, y, z, are the usual Pauli matrices and r is a point in the Bloch sphere { r : | r| ≤ 1}. We will drop r and write simply ρ where no ambiguity arises. A measurement on ρ ⊗N is represented by a Positive Operator Valued Measure (POVM). It is defined by a set O = {O χ } of positive operators such that where χ refers to the various outcomes that can occur. It can be a discrete or a continuous variable.
In order to estimate ρ we proceed as follows. We first perform a measurement on ρ ⊗N , from which we obtain an outcome χ. Based on χ, an estimate for ρ can be guessed: ρ χ . Its quality is quantified by the fidelity, defined as [14] f which determines the maximum distinguishability between ρ and ρ χ that can be achieved by any measurement [15]. For qubits, Eq. (2. 3) reads where r and R χ are the Bloch vectors of the states ρ and ρ χ respectively, r = | r| and R = | R|.
In the Bayesian approach the overall performance of the estimation procedure is quantified by the average fidelity F , hereafter fidelity in short. It is the average of (2.3) over the prior probability distribution, which we denote dρ, and over all possible outcomes χ of a given measurement, namely where p(χ| r) is the conditional probability of obtaining outcome χ given that the signal state has Bloch vector r. These probabilities are determined by the expectation values of the positive operators O χ , i.e., p(χ| r) = tr [O χ ρ]. Our aim is to maximize (2.5). For a given measurement O, there always exists an optimal guess or estimator. To prove this, we first introduce the four dimensional Euclidean vector r = (r 0 , r x , r y , r z ) = (r 0 , r) = ( 1 − r 2 , r ).
(2.6) Note that r · r ′ = r 0 r ′0 + r · r ′ and |r| = √ r · r = 1. With this, the average fidelity reads where R χ = (R 0 χ , R χ ) is defined in analogy to (2.6). A straightforward use of the Schwarz inequality gives an upper bound of F that is saturated with the choice Using (2.8), the maximum fidelity is Since the guess (2.8) satisfies |R χ | = 1 and its first component is non-negative, it always gives a physical state. In fact (2.8) is the best state that can be inferred and (2.9) is the maximum fidelity that can be obtained given O and the prior dρ.
In the analysis below, it will prove very convenient to block-diagonalize ρ ⊗N by writing it in the basis of the SU(2) invariant subspaces of ( 1 2 ) ⊗N [we use boldfaced integers and half-integers to denote the irreducible representations of SU (2)], which are also invariant under the action of the symmetric group S N (See App. A and also [4,5] for details). In contrast with pure states, for which ρ ⊗N has projection only in the symmetric (N + 1)dimensional subspace of J ≡ N 2 , for mixed states ρ ⊗N has also components in all the lower-dimensional invariant subspaces, which, furthermore, occur with multiplicity, n j , greater than one. We thus write where the lower limit in the direct sum is 0 for even N and 1/2 for odd N , (2.11) and (2.13) Throughout this paper U ( n) denotes the SU(2) unitary representation of the rotation R( n) that takes the unit vector z (pointing along the z-axis) into n ≡ r/r on the Bloch sphere. Recall that defines the standard Wigner matrices [16]. Notice that ρ j are not proper density matrices, since tr ρ j = 1. For 2D states, the Bloch vector r of the state ρ lies on the equatorial xy-plane of the Bloch sphere, i.e., r = r(cos θ, sin θ, 0). We are still entitled to use the decomposition of ρ ⊗N above, but now we write where x is the unit vector pointing along the x-axis and U (θ) is a unitary representation of a rotation of angle θ around the z-axis. Note that U ( x)|jm is an eigenstate of x · J (i.e., of the projection of the total spin operator J along the x-axis), since U ( x) takes z into x (i.e., is a rotation of angle π/2 around the y-axis). Hence, the Bloch vectors of the whole set of states {U (θ)[U ( x)|jm ]} lie on the xy-plane, as they should, and θ is the angle between r and the x-axis.
In the basis |jm the transformation U (θ) is diagonal, and substituting (2.15) in (2.12) we obtain and d (j) mm ′ are the (real) reduced Wigner matrices [16].

III. FINITE NUMBER OF COPIES. BAYESIAN ESTIMATOR
In this section we obtain the optimal POVM and closed expressions of the fidelity for any number of copies of the signal state. Although the 3D and 2D cases look similar, we will show that there are remarkable differences between them.

A. 3D states
As mentioned in the introduction, we consider N identical copies of a quantum state which is chosen according to an isotropic prior distribution dρ = w(r) dr dn, (3.1) where dn is the invariant measure on the 2-sphere dn = d(cos θ) dφ 4π (3.2) and w(r) is normalized such that 1 0 dr w(r) = 1. Let us start by computing the optimal POVM. We first notice that because of the block-diagonal form of ρ ⊗N in (2.10) we may just consider also block-diagonal POVMs, of the form with no loss of generality. Indeed, for any given POVM {O χ }, we can always construct a new one, {Õ χjα }, throughÕ where 1 1 jα is the identity in the j-representation subspace and α (1 ≤ α ≤ n j ) labels the different occurances of j in the Clebsch-Gordan series of ( 1 2 ) ⊗N . If F (F ) stands for the maximum fidelity that can be attained using {O χ } ({Õ χjα }), we have F ≤F . This is readily seen by noticing that the probability p(χ| r) = tr [ρ ⊗N O χ ] is the marginal of p(χjα| r) = tr [ρ ⊗NÕ χjα ], i.e., p(χ| r) = jα p(χjα| r), and no marginal can be more informative than the initial probability distribution. Moreover, because of (2.10), if {Õ χjα } is to be optimal, we may obviously replaceÕ χj1 ,Õ χj2 , . . . ,Õ χjnj by, say,Õ χj1 ,Õ χj1 , . . . ,Õ χj1 without changing the fidelity, which leads us to (3.3).
It is important to note that (3.4) allows us to view j and α as the outcome of the measurement {1 1 jα }. Therefore, in Eq. (2.9) we will have n j |V χj | instead of |V χ |, and an additional summation over j. Hence, our goal is to maximize |V χj | for all pairs (χ, j), where V χj = dρ r tr (ρ ⊗N O χj ). (3.5) The j outcomes give information about the decomposition of ρ ⊗N as a direct sum of SU(2) irreducible components. This, in turn, encodes information about r. For instance, if r = 1 (pure state), the probability of obtaining the outcome j = N/2 is unity. For our purposes, all the information concerning the purity of ρ comes from this source, as we now demonstrate. Since V 0 χj is invariant under rotations, whereas V χj transforms as a 3-vector, we may apply to V χj the rotation R −1 ( n χj ) = R ⊤ ( n χj ), where n χj = V χj /| V χj |, and obtain V ′ χj , such that its x-and y-components vanish, i.e., V ′x χj = V ′ y χj = 0 and where we have defined we have used that dρ is rotationally invariant, and we have written r = r n in spherical coordinates, i.e., n = (sin θ cos φ, sin θ sin φ, cos θ). Therefore, |V χj | = |V ′ χj |, and the maximum fidelity can be computed using V ′j χ instead of V j χ . Hereafter, we drop the primes and write where V 0 χj , V z χj are given by (3.6) and (3.7). Using Eqs. (2.12-2.14) and recalling that cos θ = D (1) 00 ( n), we have where the sum over the indexes m, m ′ , m ′′ runs from −j to j, and we have defined The orthogonality relations of the irreducible representations of SU(2) (Eqs. (4.6.1) and (4.6.2) on Page 62 of Ref. [16]) enable us to write where d j = 2j +1 is the dimension of the representation j of SU (2). We readily see that the z-and 0-components of V χj are bounded by Note that all the χ dependence has been factored out and ∆ 3D takes the form where v 0 j and v z j can be easily worked out from (3.15) and (3.16) Eq. (3.8) clearly implies that the factor in parentheses in (3.17) is unity. Notice that the χ dependence has entirely disappeared in the final bound of the fidelity. Inequality (3.17) is saturated iff the only non-vanishing term of the sum over m ′ in (3.13) corresponds to the maximum value of |m ′ |, namely, j. This implies that An obvious choice that satisfies this condition -and is independent of χ-is (3.20) The operator Ω j is a seed of a continuous covariant POVM, i.e., where µ plays the role of χ. It can be easily verified that, , where dµ (as dn) is the invariant measure over the 2-sphere. This proves that the bound is attainable. POVMs with a finite number of outcomes can also be obtained using the results in [17].
Having obtained the optimal POVM, Eq. (3.21), it is straightforward to compute the conditional probabilities which will be needed in Sec. V. One can check that as it should be. The corresponding guesses can be worked out from (3.5) by simply substituting µ for χ. One can also verify that the angular integration indeed yields the two terms (3.18) and (3.19).
In summary, the fidelity of any optimal POVM can be written as This equation along with (3.18) and (3.19), provide a general expression of the maximum fidelity for any given prior distribution w(r). Unless an explicit expression for w(r) is given, this is as far as we can get. In App. C we present closed expressions of the fidelity for arbitrary N using the Bures prior. In the asymptotic limit N → ∞ however one can derive a compact formula for the fidelity in terms of the mean value of r: r = 1 0 dr w(r) r. This will be done in Secs. IV and V.
Several comments are in order here. Within an optimal scheme, the purity estimator, only depends on j and comes solely from the measurement represented by the POVM {1 1 jα } [18]. All dependence on any other kind of outcome, generically referred to as χ [e.g., µ in Eq. (3.21)], has disappeared. This is expected from symmetry grounds: the parameter r does not change under SU(2) transformations and the optimal purity guess must thus be a function of j/N , as the only SU(2)-invariant quantity in this problem is precisely j. Furthermore, since this measurement ({1 1 jα }) does not alter (on average) the estimation of the orientation n = r/r of the signal state, the optimal estimation in the sense of average fidelity of (a priori) isotropically distributed mixed states breaks into two independent estimations: that of the purity r and that of the orientation n in the Bloch sphere. Notice finally that after this measurement, the rest of the protocol, which involves the POVM (3.21) for a fixed j (or any version of it with a finite number of outcomes), is identical to the optimal protocol for estimating a pure state | n given 2j identical copies of it [2].

B. 2D states
In the situation we are about to consider, V χj , defined by (3.5), still determines the maximum fidelity through Eq. (2.9), but dρ is Since r is a 2-dimensional vector, we can use a complex notation and write r → re iθ . In this notation V χj and R χj also become complex numbers. More specifically, where we have raised the outcome labels χ and j in O χj mm [or in ρ j mm ′ , Eq. (2.17)] to avoid a confusing proliferation of subindexes; the latter will label matrix elements, e.g., where we have used that ρ j m m+1 ≥ 0 for all r. The equality in (3.28) is attained by choosing the phase of O χj m+1 m to be independent of m. The positivity of O χj implies that By choosing |O χj m+1 m | to take its maximum value in (3.29) we ensure that | V χj | will also be maximal. So far, the optimization of V 0 χj and | V χj | can be carried out independently of one another, since the choices we have to make in order to saturate the bounds in (3.28) and (3.29) do not affect V 0 χj . However, we will have to check that they are compatible with the POVM condition χ O χj = 1 1 j . We will verify this by giving an explicit POVM that meets all the above conditions.
We now replace O χj by its covariant versionÕ χjφ , defined in (D3) -in Appendix D we show that this change does not affect the average fidelity-and take the seed (positive) operator Ω χj in (D1) to be given by Ω χj = |u χj u χj | (i.e., to be rank one), where The components u χj m are taken to be real and must satify It is important to realize that the vanishing of the offdiagonal elements in , we see that the maximum fidelity is given by the maximum value of where u χj m is constrained by (3.31) and α j m and β j m can be read off from (3.27) and (3.28) respectively: With no loss of generality we can take the index χ in (3.34) to be integer and its maximum value to be less or equal than the number of distinct values of α j m in (3.35). The symmetry relation d . ] stands for integer part. With all the above, maximizing ∆ 2D , which can be done for each j independently, becomes a straightforward task.
The results of the 3D case may lead us to believe that the optimal POVM will be independent of the prior w(r). The inspection of the low N cases gives further support to this belief. For j ≤ 5/2 (N ≤ 5) one can show that the optimal POVM is given by for any prior w(r), where we have dropped the index χ because it only takes one value here. 1 However, one can check that for j ≥ 3 the choice (3.37) is not optimal for some priors. Take for instance N = 6 and consider a prior of the form w(r) = (2r/δ 2 )Θ(δ − r), where Θ(x) is the step function [i.e., Θ(x) = 1 for x ≥ 0 and Θ(x) = 0 otherwise] and δ is a positive number. If δ is sufficiently small, one can Taylor-expand ∆ 3 about δ = 0 and easily obtain the optimal solution at leading order, which does not turn out to be of the form (3.37). A straightforward computation yields (∆ opt In spite of this unexpected dependence on the prior in the 2D case, there are, however, two features in the example above that are completely general: (a) the difference ∆ opt only for priors that are very peaked about r = 0. There is a further, very important property: the POVM defined by (3.37) is asymptotically optimal (the proof is given in Appendix H). Hence, for practical purposes, the best one can do is to stick to the choice (3.37), for all j and m, regardless the prior knowledge one may have of ρ. Though this choice does not guarantee optimality for small N , it does guarantee that the corresponding fidelity will differ from the maximum one by a tiny amount (typically less than only one part in a thousand) and, furthermore, that this difference will decrease to zero as N → ∞.
The asymptotically optimal choice (3.37) amounts to replacing O χj byÕ where Ω j = |u j u j |, |u j = m |jm , and [hereafter we drop the superindex "Eq. (3.37)" in ∆, ∆ j , etc.] and the analogy with (3.24) is apparent. We next recall (3.35), which involves tr ρ N j . Since the trace is invariant under rotations, v 0 j can be straightforwardly computed using (2.12) and (2.13). No such simplification exists for v x j , as far as we are aware. Proceeding this way we have where the coefficients c j m are given by as can be read off from (2.17). The sum over m in v 0 j can be easily performed, since it is just the sum of a geometric series, and yields v 0 The sum over m in v x j , however, is non trivial because of the coefficients c j m and no simple closed formula can be found but in the asymptotic limit N → ∞.

IV. ASYMPTOTICS: BAYESIAN APPROACH
In this section we calculate the asymptotic (large N ) expressions of the fidelities obtained in the previous sections using the Bayesian approach. For 2D they are summarized in (3.39), with the definitions (3.41), (3.42) and the relation (3.43). For 3D the maximum fidelity is given by (3.24), which involves the definitions (3.18) and (3.19). We here present a detailed computation only for 2D. The 3D case can be computed in a similar way and we just point out the main differences with 2D. For simplicity we consider an even number of copies N = 2n, thus J = n.
We start by noticing that the coefficients c j m , defined in Eq. (3.42), satisfy c j −m = −c j m (which implies c j 0 = 0) and, hence, We further note that the dominant contribution to the sum in v x j comes from the region where m is close to its maximum value j. We can thus replace c j m by the first terms of its "Taylor expansion" about m = j. It turns out that only the first two terms, c j m ≈ a j + b j (m − j), contribute at the order we are interested in. The coefficients a j and b j are computed in Appendix E. After substituting Eq. (E6) in (4.1) the sum over m gives: where we have dropped terms that fall off exponentially as n goes to infinity. It is convenient to combine v 0 j and v x j with the binomial in n j [see Eq. (2.11)] and definev 0 With this, Eq. (3.39) becomes Our goal is to compute the asymptotic behaviour of the above sum. We do so by first computing the leading order contribution: lim n→∞ ∆. We, of course, expect this to be unity, as the optimal guess must certainly lead to a perfect estimation given infinitely many copies. The calculation thus provides a consistency check of the approach and, moreover, the leading order expression ofv 0 j andv x j , which will be later used to compute the next-toleading order contribution.
At leading order in 1/n, we are entitled to use the well known result which holds for large n. In our case k = n − j and q = (1−r)/2. Furthermore, we can approximate the gaussian in (4.5) by the Dirac delta function δ(k − 2nq) = δ(nr − j) = δ(r − j/n)/n. After a straightforward calculation we end up with where s = j/n. Recalling the derivation of Eq. (3.39), we see that the optimal guess for the purity only depends on j and is given by in full analogy with (3.25). [The optimal guess for θ is given by φ, Eq. (3.38).] One readily obtains as expected. Similarly, it also follows from (4.6) that and, as it should be, lim N →∞ F = 1 for any prior.
We are now ready to compute the fidelity to next-toleading order. The calculation can be greatly simplified by noticing that for all ξ j such that 0 < ξ j < 1 [this is, in reverse, the same argument that took us from (2.7) to (2.9)]. The bound is saturated iff for all j, namely, iff ξ j = R j . With the leading order choice ξ j = j/n, Eq. (4.11) provides a tight bound at order O(1/n). At next-to-leading order we thus have where we have "linearized" the square root in (4.4), hence overcoming in a very simple way the most demanding part of the calculation. We can now use the techniques in Appendix F to evaluate the asymptotic value of this sum. We obtain which implies independently of the prior w(r). This result agrees with the bound derived from the pointwise approach in the next section.
The very same approach we have outlined can be applied to 3D states, we just have to replace v x j by v z j [see Sec. III A and Eqs. (C1), (C2) and (C3)]. To next to leading order we have (see Appendix F for details) Recalling that n = N/2, the asymptotic fidelity reads where r stands for the mean purity over its prior distribution, namely r ≡

V. ASYMPTOTICS: POINTWISE APPROACH
In the Bayesian approach, described in the previous sections, both the measurement strategy and the estimator (or guess) -i.e., the estimation scheme-are so chosen as to minimize the average fidelity with respect to a given prior distribution for any N . In contrast, in the so called pointwise approach, to which this section is devoted, one's goal is to optimize the performance of a scheme at a fixed point, θ 0 , in parameter space (In this section we will denote the parameters that specify the states by θ and the guesses byθ, as is standard in statistics).
The aim of this section is to present a bound on the quadratic cost, the so called quantum Cramér-Rao bound (QCRB), and its relation to the fidelity. The QCRB is a matrix inequality which is in general non-attainable. However there is a related bound that one can expect to be saturated asymptotically: the Holevo bound. A scheme that attains this bound is asymptotically optimal from the pointwise perspective.
The pointwise approach relies on the fact that for large N only quadratic cost functions become relevant. By appropriate algebraic manipulations and averaging over the prior distribution one can compare this approach with the Bayesian one in the asymptotic limit. It is proved rigorously in [19] that the averaged Holevo bound leads to an asymptotic upper bound to the globally optimal fidelity for "smooth" qubit estimation problems, and for "smooth" pure state estimation problems. (We have a lucky coincidence for qubits, and for pure states, that fidelity can be expressed as a quadratic form in the estimation error of certain parameters of the state.) One can expect this bound to be asymptotically valid in general, but no rigorous proof has been given yet.
As to whether or not the averaged Holevo bound is asymptotically saturated: there exist very good heuristic arguments that this should be true, but no rigorous proof. (Unpublished work of M. Hayashi: for large N the estimation problem can be approximated, around a point obtained by a preliminary rough estimate, by a Gaussian state estimation problem, for which the Holevo bound is attained by an appropriate generalized heterodyne measurement).
In Sec. III A we derived the optimal global scheme for 3D states and showed that it is the same for any isotropic prior distribution. From the previous considerations we expect it also to be asymptotically optimal in the pointwise sense. We will show that this is indeed the case, since the optimal fidelity does coincide asymptotically with the averaged Holevo bound.
For 2D states the situation is more complex. Recall that the scheme defined by (3.37) is not optimal for arbitrary N and general isotropic priors. Nevertheless, Eq. (4.15) also coincides with the averaged Holevo bound. This comes close to a proof of the asymptotic optimality of the scheme. A rigorous proof (see Appendix H) can be derived from the van Trees inequality [20] (the same inequality is used to get the more general results in [19]). Thus our approximate solution (3.37) is asymptotically optimal both from the global and from the pointwise points of view.
Both the 3D and the 2D cases confirm the conjectures that the averaged Holevo bound is a sharp asymptotic bound for fidelity, and that the global optimal scheme is also asymptotically optimal in the pointwise sense. Global asymptotic optimality does not depend on the prior or on non-local features of the figure-of-merit.
Before stating the main results, we need to introduce a bit of notation. Let ρ be a density matrix parametrized by θ ≡ (θ 1 , θ 2 , . . . , θ p ) ∈ Θ ⊂ R p , where p is the number of parameters. 2 Just as in the previous sections, let us assume we perform a generalized measurement O on an arbitrary state ρ(θ). Recall that such measurement is represented by a POVM O = {O χ }, where χ ∈ Ω labels the various outcomes. Letθ χ be the estimate (or guess) of θ based on the outcome χ, i.e.,θ is a mapping from the outcome set Ω to the parameter space Θ: A natural way of quantifying the performance of an estimatorθ and a measurement O at a point θ 0 is provided by the mean square error matrix (MSE) defined by the matrix elements where the dependence on O is understood to simplify the notation and, naturally, p(χ|θ 0 ) = tr [ρ(θ 0 )O χ ]. In the remaining sections of the paper E θ0 [f ] stands for the expectation value of f with respect to the probability distribution p(χ|θ 0 ). An estimator is said to be locally unbiased (LU) at θ 0 if where ∂ α is shorthand for ∂/∂θ α . Intuitively, these conditions mean that, on average, the estimator is close to the truth in a small neighborhood of θ 0 . When these conditions are satisfied for all possible values of θ 0 , the estimator is said to be uniformly unbiased, or, simply, unbiased. LU estimators play a fundamental role in the pointwise approach. The Fisher information matrix (FI) is defined as Note that the FI depends on a specific measurement O, through the probabilities p(χ|θ).
With the above few definitions we can already give a first important result: The Cramér-Rao bound (CRB). It states that the MSE of an estimatorθ LU at θ 0 is lower bounded by the inverse of the FI, namely, In spite of its fundamental character, the CRB has the drawback that the bound it provides refers to a particular measurement, not necessarily optimal. To go around this difficulty, some new definitions are required The symmetric logarithmic derivative (SLD), denoted by λ α (θ) (recall that α = 1, 2, . . . , p), is defined as the self-adjoint matrix that satisfies The SLDs for the 2D and 3D cases (2D and 3D models in pointwise terminology) are given in Appendix G. With this we can now define the quantum Fisher information matrix (QFI) as H αβ (θ) = Re tr ρ(θ)λ α (θ)λ β (θ). (5.7) E.g., for the two models studied in this paper the QFIs are The second important result of this section, due to Braunstein and Caves [21], states that for a given model all FIs are bounded from above by the QFI, i.e., from which it immediately follows the QCRB: Although these bounds are measurement-independent -they depend only on the signal states and the geometric properties of the space they belong to-they have the drawback of not being always attainable. We have seen above that H(θ 0 ) provides information on how small the variance of an estimator can be at θ 0 . There is still another remarkable property of the QFI that we will need below: its direct relation to the fidelity [14]. Indeed, from its definition [see Eq. (2. 3)], H αβ (θ 0 )δθ α δθ β + . . . , (5.12) where the components of δθ are assumed to be small (neighboring states). Given a scheme, characterized by ({O χ },θ), the average of the fidelity over all possible outcomes is Our aim is, therefore, to minimize the cost tr H(θ 0 )V (θ 0 ,θ). (5.14) An optimal measurement, O opt , is thus the one that minimizes (5.14). The formalism and results presented so far are completely general and apply to any model, i.e., to any family of states ρ(θ). We now need to introduce the so called N -copy model. It is defined by the set of density matrices ρ N (θ) of the form The "original" family, ρ(θ), is sometimes referred to as the single-copy quantum model. Naturally, we can talk about the variance or MSE of an estimation of the Ncopy model, which we denote by V N (θ 0 ,θ). It is not hard to convince oneself that the cost Eq. (5.14) of the optimal scheme necessarily scales as 1/N , for large enough N . 3 It is well-known in classical statistics [22] that under some regularity conditions the maximum likelihood (ML) estimator is asymptotically unbiased at θ 0 and its MSE is equal to I N (θ 0 ) −1 , i.e., the ML estimator achieves the CRB asymptotically. It follows that for an optimal measurement tr H(θ 0 )I N (θ 0 ) −1 provides an attainable bound to the cost and it will scale as 1/N asymptotically. This lower bound on (5.14) can be expressed as (5.16) whereĪ N = I N /N is called the normalized FI. Likewise, for the asymptotic fidelity we have which means that our optimization problem amounts to finding a measurement O opt that minimizes tr H(θ 0 )Ī N (θ 0 ) −1 . We next present a powerful measurement-independent bound to this expression; the so called Holevo bound. Let G be a positive semi-definite matrix and where the minimization is over all pairs (O,θ) of measurements on ρ N (θ) and estimators for which the latter is LU at θ 0 (the unbiasedness of an estimator depends on the measurement through its outcome probability distribution). Eq. (5.18) is relevant to the problem we are dealing with because its right hand side can be shown to give the 1/N term in (5.16) and (5.17) if G = H(θ 0 ). In Ref.
[1] Holevo proved the following bound: In this expression X = (X 1 , X 2 , . . . , X p ) are hermitian matrices satisfying the following relations The minimization in (5.20) is over the set Ξ θ 0 of all such X. Finally, Z[X] is the p × p matrix whose elements are given by as previously mentioned in this section. It is important to point out here that practical use of Hayashi's construction would require a two-step measurement in order to saturate the bound. This is necessary because the optimal measurement and LU estimator at θ 0 depend themselves on θ 0 , which we do not know beforehand. To overcome this difficulty, one takes an asymptotically vanishing fraction of copies, say √ N , and makes an initial estimate of the parameterθ ini . Then, on the remaining copies one performs the measurement that is optimal at θ ini . Therefore, (5.17) and (5.24) lead us to expect that the optimal asymptotic fidelity is given by We next apply these results to the 3D and 2D models.

Holevo bound for the 3D case
In this case p = 3 and it is not hard to show (see Appendix G) that there is only one "vector" of matrices X = (X r , X θ , X φ ) in Ξ θ and no minimization is thus required in (5.20). The Holevo bound is straightforwardly computed to be Furthermore, we expect this result to hold regardless on whether the ML estimator or the optimal guess is used.
This implies that for a "well behaved" prior, one should have (4.17) by simply averaging (5.27), and we re-obtain the result of the the preceding section, which was computed using the Bayesian approach, with much less effort. Eq. (5.27) was also obtained by Matsumoto and Hayashi [12] with an estimation strategy similar to the one developed in Section III A.

Holevo bound for the 2D case
In the 2D model the SLDs satisfy It is not difficult to check that in this situation the QCRB is asymptotically attainable, 4 i.e., from which (4.15) follows for "well behaved" priors. This strongly supports the claim that the 2D measurement scheme defined by Eq. (3.37) is indeed asymptotically optimal. The Appendix H contains the rigorous proof.

VI. CONCLUSIONS
We have presented a detailed analysis of the optimal estimation of qubit mixed states given a number N of identical copies. Our results apply to arbitrary N , finite or asymptotically large.
For general states (3D) we have obtained that the structure of the optimal measurement is based on the decomposition of the signal states in irreducible blocks under the action of the symmetric group. The scheme is essentially unique, valid for any isotropic prior distribution and any number of copies. This optimal scheme has the nice property that it can be regarded as two independent protocols performed sequentially: that for estimating the purity r of the state and that for estimating its orientation n in the Bloch sphere. It turns out that the estimation of the purity only exploits rotationally invariant properties of the signal states, and a measurement of the Casimir operator J 2 = j(j +1) α 1 1 jα is optimal. In other words, the estimate of r only depends on j, which characterizes the SU(2) invariant subspaces. This should not come as a surprise since the purity itself is rotationally invariant and so are the priors considered here. The estimation of the orientation is formally equivalent to pure state estimation with 2j copies. As an illustration of our procedure, we have obtained closed expressions of the fidelity for the particularly important Bures prior. Results for other priors can be easily obtained with the techniques presented here.
In 2D, if one wants to do precisely optimal estimation for any N , there is a subtle interplay between the estimation of the purity and the estimation of the phase and they are no longer independent, although they are asymptotically so. Also contrasting with 3D is that the structure of the optimal POVM depends on the prior. The roots of this unconventional behavior lie in the different group structure of 2D states. Here the relevant group is U(1) [instead of SU (2)] and j is not the only invariant; the magnetic number m is also invariant under U(1). Actually, the interplay purity-phase can be traced back to this symmetry property. In spite of these difficulties, we have reduced the problem of obtaining the optimal POVM for any isotropic prior to a rather trivial maximization problem [recall Eq. (3.34)]. We have also obtained a prior independent POVM that is indistinguishable from the optimal one for any practical purposes. Furthermore, it separates purity and phase estimation exactly for all N and is asymptotically optimal.
The asymptotic behaviour of the estimation procedure has also been a central issue of our work. The asymptotic fidelity in 3D has the simple form F = 1−(3+2 r )/(4N ), where r is the mean purity with respect to the prior. This result is proved here for isotropic priors within our Bayesian approach. It is worth emphasizing that so far the asymptotic expression was only known for the particular case of the Bures prior [8]. In 2D, the asymptotic fidelity computed with the fixed POVM described above is simply F = 1 − 1/(2N ), independently of the prior.
We have studied the asymptotic behavior also from the pointwise approach, which is far more common among statisticians. The main advantage of the pointwise approach over the Bayesian one is that it provides bounds on the asymptotic mean square error (as well as on any other quadratic loss function) that can be easily computed. These bounds correspond, by second order expansion of the figure-of-merit, to bounds on the average fidelity which can be shown to be rigorous in many cases ( [19]), including those studied in this paper. The drawback of the approach is that though one can heuristically expect these bounds to be asymptotically sharp, and one can propose two-stage measurement schemes which can be hoped to do the job, a lot of hard work is needed in each case to prove that they can be achieved. In contrast with the 3D case where all the results we have worked out from the Bayesian approach are rigorous, the optimality in the asymptotic regime of the 2D estimation scheme defined by (3.37) or (3.38) required some further work.
Here we used the pointwise approach to fill the gap. The application of the van Trees inequality [20] to 2D in Ap-pendix H yields the asymptotic bound on the fidelity in a particularly elegant and straightforward way. In turn, this bound provides the optimality proof.
Altogether, the fact that the results obtained from the pointwise approach coincide with those derived from the Bayesian framework give further strong support for the heuristic principle that the averaged lower bound from the pointwise approach is an asymptotically sharp lower bound for the global approach; and moreover that the chosen prior distribution and to a lesser extent, figure-of-merit, has asymptotically little impact on the behaviour of the solution.
There are two extensions of our work that can be readily addressed. Here, we have considered the full estimation of a qubit mixed state, however for some applications only partial knowledge of the state, such as its purity or its orientation, may be required. The techniques developed in this work can be easily adapted to these situations (see [18] and [23]). A second line of work concerns the use of more realistic measurements, in particular those that can be implemented with current technology. In this work we have considered the most general measurements allowed by Quantum Mechanics. They yield the maximum theoretical accuracy that can possibly be achieved, and thus provide a bound (and a measuring rod) for the accuracy of any other estimation scheme. However, they involve joint operations on the whole sample of states that in general are difficult to implement in a laboratory. It is thus of great practical relevance to study schemes based on local von Neumann measurements. Preliminary results, were presented in [8]. There, it was found that, for some tomographic schemes, the rate at which the fidelity approaches unity for a Bures prior distribution is 1 − F ∼ 1/N 3/4 , i.e., there is a qualitative difference with the optimal measurements. Present work in progress suggests that by using classical communication the precision rate can be similar to the optimal collective scheme 1 − F ∼ 1/N , but the coefficient of the 1/N term is strictly larger than the optimal one, and corresponds to the result from the pointwise approach obtained in [3].
One may use the symmetric group S N to write ρ ⊗N in the block-diagonal form (2.10), much in the same way as it is used to obtain the SU(2) Clebsch-Gordan decomposition (the multiplicity, n j , is computed in Appendix B). However, at variance with the SU(2) case, where all Young frames have a single row, we here must also consider those with two rows, because (instead of unity). Hence, each two-box column of a frame contributes a multiplicative factor det ρ.
With this observation, one can easily obtain the expression of the blocks ρ N j as follows. A generic Young frame with N boxes has the shape (A3) Each of the N/2 − j double columns gives a factor det ρ.
The remaining 2j single columns correspond to a fully symmetric tensor on which SU(2) acts irreducibly. In the basis of the irreducible subspace of the representation j, this tensor can be written as the matrix which we denote by ρ j . Hence We now note that for r = r z the matrices ρ ⊗N , ρ N j and ρ j , are all them diagonal and can thus be obtained without much effort. The result is For arbitrary r covariance implies Notice that, in spite of what the notation might suggest, the matrices ρ j are not proper density matrices, as tr ρ j = 1. Using Young tableaux techniques, there is a simple way to compute the multiplicity n j , (2.11), with which the representation j shows up in the Clebsch-Gordan decomposition of ( 1 2 ) ⊗N (this tensor product is denoted by ⊗N in the present context).
The Young frame in (A3) can be denoted by λ = [λ 1 , λ 2 ] = [N/2 + j, N/2 − j] (this is a standard notation where λ k is the number of boxes in the k-th row of the frame). This very same frame (A3) is equivalent to a single row of 2j boxes, i.e., to [2j], which denotes the representation j of SU (2).
The recipe for computing SU(2) Clebsch-Gordan decompositions [24] applied to ⊗N amounts to the following. First label N boxes each with an integer number from 1 to N . Then, starting with box number one and proceeding sequentially, build (and keep account of) all possible Young tableaux such that (i) they have at most two rows and (ii) the full sequence of integers formed by reading right to left in the first row and then in the second is admissible. 5 The number of occurrences of (A3) is precisely n j . But the very same recipe gives us all standard Young tableaux 6 of shape λ = [N/2 + j, N/2 − j]. Hence n j equals the number, f λ , of such tableaux.
Recalling the Frobenius determinantal formula [25], we get This determinant is readily seen to give (2.11).

APPENDIX C: CLOSED EXPRESSION OF THE FIDELITY USING A BURES PRIOR IN 3D
The explicit expressions of the coefficients v 0 j , v z j [Eqs. (3.19) and (3.18) with To obtain these expressions we have recalled (2.12) and (2.13) and defined w(r) = w(−r) for −1 ≤ r < 0 to extend the r-integration to the interval [−1, 1]. Consider now the Bures prior [14], which is commonly regarded as the natural uniform distribution in the Bloch sphere, since it follows from the metric induced by the fidelity [9,10]. It is given by which implies w(r) = (4/π)r 2 (1 − r 2 ) −1/2 . In this case the integration in (C1) and (C3) can be performed analytically. For simplicity, we will consider an even number of copies N = 2n (J = n). By making extensive use of is the standard Euler Beta function, we obtain v 0 j = 8d j π(2n + 3) B(n − j + 1, n + j + 2). Similarly, and .
Putting the various pieces together we finally obtain the closed expression: 2(2j + 1) 2 (2n + 3)(2n + 2)(2n + 1) × 1 + j n + j + 1 Γ(n − j + 1 2 )Γ(n + j + 3 2 ) Γ(n − j + 1)Γ(n + j + 1) For the sake of completeness, in this appendix we give a simple proof specialized to the 2D case of a more general result concerning the optimality of covariant (continuous) POVMs [1]. More precisely, we wish to prove that for any given POVM, {O χ }, there is always a covariant (continuous) one, with elements which gives the same average fidelity for a suitable positive operator Ω χ . The proof goes as follows.
In the 2D case the average fidelity can be written as (in this section the integration limits 0 and 2π are understood) where θ χ (θ) is the angle between R χ ( r) and the x-axis, and we denote the fidelity by f (θ χ − θ, R χ ) to emphasize the fact that in 2D it is a function of the difference of these two angles. Note also that we drop the explicit dependence on r which does not play any role in the proof. Thus, e.g., we denote the mixed state ρ( r) simply as ρ(θ). Proving our statement amounts to proving that the POVM with elements and associated guess given bỹ gives the same fidelity as {O χ }. Note that (D3) defines Ω χ in (D1) through In formulae we wish to prove that F =F , wherẽ is the fidelity we obtain with {Õ χφ }. We also have to prove that {Õ χφ } in (D3) is indeed a POVM, namely, that Let us start by proving (D6). We simply change variables φ → φ ′ = φ − θ χ and use the invariance of the U(1) Haar measure, which in this case is the trivial identity 2π 0 dφ g(φ) = 2π 0 dφ g(φ + α) satisfied by any periodic function g of period 2π. We have We use the same logic to prove that F =F : We now use that U † (φ)ρ(θ)U (φ) = ρ(θ − φ) and make the change of variable θ → θ − φ to obtainF = F . If R χ = R for all χ (this is the case if the estimation of r is entirely based on j, as in the last part of Section III B), we can replace the POVM elementsÕ χφ bỹ This is equivalent tõ where the positive operator Ω can be expressed in terms of Ω χ in (D1) simply as Ω = χ Ω χ . The proof that achieves the same fidelity is straightforward and it amounts to pulling the sum over χ into or out of the trace in Eqs. (D5) and (D8), which we are entitled to do because we are assuming that R χ is now independent of χ.
Using the results in Ref. [17], it is easy to show that for any given covariant (continuous) POVM with elements given by (D1) there is always a POVM with a finite number of elementsŌ φa = U (φ a )ΩU † (φ a ), a = 0, 1, 2, . . . M − 1, which achieves the same fidelity for a suitably large M . The angles φ a can be chosen to be φ a = 2πa/M , a = 0, 1, 2, . . . M − 1.

APPENDIX E: COMPUTATION OF THE COEFFICIENTS aj AND bj
In this Appendix we give an approximation to c j m , defined in Eq. (3.42), of the form c j m ≈ a j + b j (m − j) valid for m ≈ j large enough.
Recalling the Wigner formula we obtain We note that the two coefficients c j j and c j j−1 are binomial sums modulated by smooth functions of m in a neighborhood of m = 0. More precisely, where ϕ k (m), which can be read off from (E2) for k = j, j − 1, can be Taylor-expanded at m = 0. For large j this expansion is Here the power counting is done by noticing that m is order √ j, since the sum is O(j q/2 ) for q even and vanishes for q odd, as is well known. In particular, we have S 0 = 1, S 2 = j/2, S 4 = j(3j − 1)/4. With all this information we obtain c j j = 1 − 1/(4j), c j j−1 = 1 − 3/(4j), and finally have

APPENDIX F: EXPLICIT COMPUTATION OF THE ASYMPTOTIC FIDELITY
Here we present with some detail the procedure we have used to evaluate the sum of (4.13) in the large N = 2n limit. We first focus on 2D states and later comment on the main differences with 3D.
In the two cases, we write n j as the right hand side of the identity We next multiply the powers of (1 ± r)/2 that are explicitly given in this equation by the first binomial. Likewise, we multiply those denoted by (r → −r) by the second binomial. In the resulting expressions, we next change the summation indexes according to n − j = k and n + j + 1 = k, respectively, and do similar changes in the remaining crossed terms. After some algebra, we have where B k (r) is defined by Since the coefficients B k (r) are the terms of a binomial series, for large n only those for which k ≈ n(1 − r) ≤ n (or equivalently 2n − k ≥ n) give a significant contribution to the fidelity, whereas the rest fall off exponentially with n. This enables us to expand the factor (k − 1)(2n − k + 1) in Φ k−1 (r) as a power series in 1/k and 1/(2n − k) and obtain the relation which we use in the second sum of (F3). We further define Ψ k (r) = Φ k−1 (r) − Φ k (r) + O(1/n). It satisfies Ψ k (r) = −Ψ 2n−k (−r), as can be read off from (F6). The leading contributions come from the terms that contain Φ k (r), and the corresponding term in [r → −r]. They combine into a single sum from k = 0 to k = 2n. The rest of the terms [those proportional to Ψ k (r) and Ψ k (−r)] are subleading and can be simplified using the change of indexes k → 2n − k. The result can be cast as ∆ 2D = 1 n 1 0 dr w(r) r 1 + r 2 2n k=0 B k (r)Φ k (r) We readily see that the first sum (as well as the corresponding one obtained by the substitution r → −r) is a binomial sum modulated by the function Φ k (r), analogous to (E3) in Appendix E, and can be computed along the same line. This sum is peaked at k ≈ n(1 − r), as we have already mentioned, which suggests expanding Φ k (r) in powers of k − n(1 − r). More precisely, one can check that showing that one minus the fidelity is the squared L 2 cost function for estimating ψ. Taking the two states close to one another, and comparing with (5.12) shows that where ψ ′ (θ) denotes the 4×2 matrix of partial derivatives of ψ with respect to components of θ and H is the QFI. LetĪ N = I N /N denote the normalized FI for θ based on an arbitrary collective measurement on the N copies, and letθ denote an arbitrary estimator of θ based on that measurement. By E w we denote averaging over θ with respect to a prior probability density w over the equatorial plane. Then the van Trees inequality [20] states that, for any given matrix function C(θ) of size dim(ψ) × dim(θ), and under certain smoothness conditions on the probability distribution of the outcome of the measurements and on the prior w, where by (wC) ′ we denote the column vector of the same length as ψ, with row elements β ∂ β [w(θ)C i β (θ)]. By the Helstrom information inequality (5.9) we may boundĪ N in the denominator by H (of the single-copy model). Without the "1/N " term in the denominator, the optimal choice of C would be C = ψ ′ H −1 . Making this choice anyway gives Hence, provided the second term in the denominator is finite, by further substituting ψ ′⊤ ψ ′ = 1 4 H and letting N converge to infinity, we obtain The van Trees inequality requires some modest smoothness of the probability density of the measurement outcomes as function of θ, which are satisfied in our case since the density matrix ρ ⊗N (θ) is a smooth function of θ. It requires smoothness of the prior density w and also that this density converges to zero at the boundary of its support. This last property does not hold for the priors in which we are interested. However, for a given prior w and for given ǫ > 0 one can construct a prior w ǫ which is zero outside a circle of radius strictly smaller than 1, which converges smoothly to zero at the boundary of its support, and which is everywhere smaller than (1 + ǫ)w. The modification of w can simultaneously be done ensuring that the second term in the denominator of (H4) is finite. Since we can first derive (H5) with w replaced by w ǫ , then let ǫ → 0, resulting in (H5) with the original w in place.