Multi-copy programmable discrimination of general qubit states

Quantum state discrimination is a fundamental primitive in quantum statistics where one has to correctly identify the state of a system that is in one of two possible known states. A programmable discrimination machine performs this task when the pair of possible states is not a priori known, but instead the two possible states are provided through two respective program ports. We study optimal programmable discrimination machines for general qubit states when several copies of states are available in the data or program ports. Two scenarios are considered: one in which the purity of the possible states is a priori known, and the fully universal one where the machine operates over generic mixed states of unknown purity. We find analytical results for both the unambiguous and the minimum error discrimination strategies. This allows us to calculate the asymptotic performance of programmable discrimination machines when a large number of copies is provided, and to recover the standard state discrimination and state comparison values as different limiting cases.


I. INTRODUCTION
Discrimination between given hypotheses is one of the most basic tasks in our everyday lives. Very often we are confronted with the necessity of having to identify an option between some possible choices based on some acquired evidence. In the quantum setting the discrimination problem consists of identifying one of two possible states given a number of identical copies available for measurement. This task encompasses a plethora of nontrivial theoretical and experimental implications. In the usual setting the a priori states are known (i.e., the classical information characterizing the possible states is provided and the discrimination protocol is tailored for this specific information). One usually considers two types of approaches: unambiguous [1] and minimum error [2] discrimination. An unambiguous protocol is one where the identification of the state is error free. Of course, this is only possible stochastically, that is, unless the states are orthogonal, the protocol must give an inconclusive answer (the "I do not know" outcome) with a nonvanishing probability. In the minimum error approach, the protocol always yields a definite answer, which may be wrong some of the time. An optimal protocol is one which minimizes the inconclusive or the error probability. It may also be possible to go continuously from one case to the other by considering margins of error probabilities [3]. In spite of being such a fundamental problem, only very recently has a closed expression for the asymptotic error probability been obtained (see [4-6] and references therein), the quantum Chernoff bound, from which metric distances and state densities [7] can be derived.
Very much in the spirit of universal computers, it is interesting to consider discrimination devices that are not specialized in a specific discrimination instance but can discriminate between arbitrary pairs of states [8,9]. In these, the set of possible states enters the device as "programs," that is, the classical description of the states is not provided beforehand, rather the information is incorporated in a quantum way (this can also be viewed as an instance of relative information [10]). These devices have program ports that are loaded with the program states and a data port that is loaded with the unknown input state one wishes to identify. The device will identify the state of the data port as being one of the states fed in the program ports, but this identification will, in general, be erroneous with a probability that decreases with the number of copies of the states entering the ports. One can also regard these devices as learning machines [11], where the device is instructed through the program ports about different states, and based on this knowledge the machine associates the state in the data port with one of the states belonging to the training set. Increasing the number of copies of states at the program and data ports, of course, increases the chances of correct identification.
It is particularly relevant to understand how the probability of error scales with an increasing number of copies and what are the corresponding error rates. The value of this rate is one of the most relevant parameters assessing the performance of the device. We will consider the discrimination of two general qubit states, although most of our results can be generalized to higher-dimensional systems (see [12,13] for a single-copy continuous variable setting). For simplicity we will assume that the prior occurrence probability of each state is identical and compute the unambiguous and minimum error rates for optimal programmable devices.
We first study the performance of such devices for pure states. We compute the error probabilities for any number of pure qubit states at the input ports. Some of the results are already available in the literature [9,14-18], but the way we formalize the problem here is crucial to treating the more general mixed state case. In addition, we obtain analytical expressions that enable us to present the results and study limiting cases in a unified way. In particular, when the program ports are loaded with an infinitely large number of copies of the states we recover the usual state discrimination problem [2], since it is clear that then one has the classical information determining the states entering the program ports. On the other hand, when the number of copies at the data port is infinitely large, while the number of copies at the program ports is kept finite, we recover the state comparison problem [19,20].
We extend the previous pure state study to the case of mixed input states. In this scenario we only compute the minimum error probability, as no unambiguous answers can be given if the states have the same support [21]. The performance of the device for a given purity of the input states allows us to quantify how the discrimination power is degraded in the presence of noise. The expressions here are much more involved; however, one can still exploit the permutation symmetry of the input states to write the problem in a block-diagonal form [22,23]. We then obtain closed expressions for the probability of error that can be computed analytically for a small number of copies and numerically evaluated for a fairly large number of copies. We are also able to obtain analytical expressions for some asymptotic rates. Again, the leading term, as in the pure state case, is seen to coincide with the average minimum error for known states.
We also analyze the fully universal discrimination machine (i.e., a device that works optimally for completely unknown input states). In this case one has to assume a uniform distribution for the purity. In contrast to the pure state distribution, there is no unique choice [24], and different reasonable assumptions lead to different uniform priors. Here we consider the hard sphere, Bures, and Chernoff priors.
The paper is organized as follows. In the next section we obtain the error probabilities for pure states when each program port is fed with n copies of each state and there are m copies of the unknown state entering the data port. In Sec. III we study the asymptotic rates in several scenarios. In Sec. IV we analyze the performance of these devices when the ports are loaded with copies of states of known purity and obtain some interesting limiting cases in Sec. V. In Sec. VI we finally obtain the error rates for the fully universal programmable machine. Some brief conclusions follow and we end with two technical appendices.

II. PURE STATES
Let us start by fixing the notation and conventions used throughout this paper. We label the two program ports by $A$ and $C$. They will be loaded with states $|\psi_1\rangle$ and $|\psi_2\rangle$, respectively. The data port $B$ is the middle one and will be loaded with the states we wish to identify. We also use the shorthand notation $[\psi]$ to denote $|\psi\rangle\langle\psi|$ and similarly $[\psi\phi\cdots] = [\psi]\otimes[\phi]\otimes\cdots = |\psi\rangle\langle\psi|\otimes|\phi\rangle\langle\phi|\otimes\cdots$. We may also omit the subscripts $A$, $B$, and $C$ when no confusion arises. We assume that the program ports are fed with $n$ copies of each state and the data port with $m$ copies of the unknown state. This is a rather general case for which closed expressions of the error probabilities can be given. The case with arbitrary $n_A$, $n_B$, and $n_C$ copies at each port is discussed in Appendix A. The expressions are more involved but the techniques are a straightforward extension of the ones presented here.
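When the state at the data port is $|\psi_1\rangle^{\otimes m}$ or $|\psi_2\rangle^{\otimes m}$, the effective states entering the machine are given by the averages
$$\sigma_1 = \int d\psi_1\,d\psi_2\,[\psi_1^{\otimes n}]_A[\psi_1^{\otimes m}]_B[\psi_2^{\otimes n}]_C, \qquad \sigma_2 = \int d\psi_1\,d\psi_2\,[\psi_1^{\otimes n}]_A[\psi_2^{\otimes m}]_B[\psi_2^{\otimes n}]_C, \tag{1}$$
respectively. The integrals can be easily computed using the Schur lemma, $\int d\phi\,[\phi]_X = \mathbb{1}_X/d_X$, where $d_X$ is the dimension of the Hilbert space spanned by $\{|\phi\rangle\}$ and $\mathbb{1}_X$ is the projector onto this space. Hence
$$\sigma_1 = \frac{1}{d_{AB}d_C}\,\mathbb{1}_{AB}\otimes\mathbb{1}_C, \qquad \sigma_2 = \frac{1}{d_A d_{BC}}\,\mathbb{1}_A\otimes\mathbb{1}_{BC}, \tag{2}$$
where $\mathbb{1}_{XY}$ is the projector onto the completely symmetric subspace of $\mathcal{H}_X\otimes\mathcal{H}_Y$ and $d_{XY} = \mathrm{tr}\,\mathbb{1}_{XY}$ is its dimension. For qubits we have $d_A = d_C = n+1$ and $d_{AB} = d_{BC} = n+m+1$.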
The structure of the states (2) suggests the use of the angular momentum basis $|j_A, j_B (j_{AB}), j_C; JM\rangle$ for $\sigma_1$ and $|j_A, j_B, j_C (j_{BC}); JM\rangle$ for $\sigma_2$. The quantum numbers $j_{AB} = j_A + j_B$ and $j_{BC} = j_B + j_C$ recall the way the three spins are coupled to give the total angular momentum $J$. Here the angular momenta have a fixed value determined by the number of copies at the ports, $j_A = j_C = n/2$, $j_B = m/2$. So, we can very much ease the notation by only writing explicitly the labels $j_{AB}$ and $j_{BC}$. We would like to stress, however, that, in general, one needs to keep track of all the quantum numbers, especially when dealing with mixed states as in Sec. IV.
In $\sigma_1$ the first $n+m$ spins are coupled in a symmetric way, while in $\sigma_2$ the symmetrized spins are the last $n+m$, thus $j_{AB} = (n+m)/2 = j_{BC}$. The states are diagonal in the angular momentum bases discussed previously, and we have
$$\sigma_1 = \frac{1}{d_{AB}d_C}\sum_{J}\sum_{M=-J}^{J}[j_{AB};JM], \qquad \sigma_2 = \frac{1}{d_A d_{BC}}\sum_{J}\sum_{M=-J}^{J}[j_{BC};JM], \tag{3}$$
where the lower limit of the first summation takes the value 0 (1/2) for m even (odd). Notice that the spectrum of both matrices is identical and that the basis elements of their support differ only in the way the three spins are coupled. Further, the key feature of the total angular momentum bases is the orthogonality relation
$$\langle j_{AB};JM|j_{BC};J'M'\rangle = \delta_{JJ'}\,\delta_{MM'}\,\langle j_{AB};JM|j_{BC};JM\rangle. \tag{4}$$
Bases of this type are known as Jordan bases of subspaces [14]. Since a state of the first basis (labeled by $j_{AB}$) has overlap with only one state of the second basis (labeled by $j_{BC}$), the problem is reduced to a discrimination instance between pairs of pure states. Then the total error probability is simply the sum of the contributions of each pair.
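In the unambiguous approach, the minimum probability of an inconclusive result for a pair of states $|\phi_1\rangle$, $|\phi_2\rangle$ with equal priors is simply $P^{UA}(|\phi_1\rangle,|\phi_2\rangle) = |\langle\phi_1|\phi_2\rangle|$ [1], hence
$$P^{UA} = \frac{1}{d_{AB}d_C}\sum_{J}\sum_{M=-J}^{J}\left|\langle j_{AB};JM|j_{BC};JM\rangle\right|. \tag{5}$$
These overlaps can be computed in terms of the Wigner 6j symbols [25],
$$\langle j_{AB};JM|j_{BC};JM\rangle = (-1)^{j_A+j_B+j_C+J}\sqrt{(2j_{AB}+1)(2j_{BC}+1)}\begin{Bmatrix} j_A & j_B & j_{AB}\\ j_C & J & j_{BC}\end{Bmatrix}, \tag{6}$$
and they are independent of $M$ [25]; therefore in what follows we omit writing the quantum number $M$, and we perform the sum over $M$ in Eq. (5) trivially by adding the multiplicative factor $2J+1$. Substituting the value of the 6j symbols for $j_A = j_C = n/2$, $j_B = m/2$, $j_{AB} = j_{BC} = (n+m)/2$, and setting $J = m/2 + k$ we obtain
$$\langle j_{AB};J|j_{BC};J\rangle = \binom{n}{k}\binom{n+m}{n-k}^{-1}, \tag{7}$$
with $k = 0,1,\ldots,n$ (observe that $J$ takes values from $J = n + m/2$ of the totally symmetric space down to $J = m/2$). Plugging the overlaps in Eq. (7) into Eq. (5), we obtain
$$P^{UA} = \sum_{k=0}^{n}\frac{m+2k+1}{(n+1)(n+m+1)}\binom{n}{k}\binom{n+m}{n-k}^{-1} = 1 - \frac{nm}{(n+1)(m+2)}, \tag{8}$$
where we notice that the dimension of the subspace of the total angular momentum $J$ is $m+2k+1$ and in the second equality we have used the binomial sums
$$\sum_{k=0}^{n}\binom{m+k}{k} = \binom{n+m+1}{n}, \qquad \sum_{k=0}^{n}k\binom{m+k}{k} = (m+1)\binom{n+m+1}{n-1}, \tag{9}$$
together with the identity $\binom{n}{k}\binom{n+m}{n-k}^{-1} = \binom{m+k}{k}\binom{n+m}{n}^{-1}$.

In the minimum error approach no inconclusive results are allowed, but the machine is permitted to give wrong answers with some probability that one tries to minimize. This minimum error probability can be computed along the same lines as in the previous case. Recall that the error probability $P^{ME}$ for two pure states $|\phi_1\rangle$, $|\phi_2\rangle$, and equal a priori probabilities is [2]
$$P^{ME}(|\phi_1\rangle,|\phi_2\rangle) = \frac12\left(1 - \sqrt{1 - |\langle\phi_1|\phi_2\rangle|^2}\right). \tag{10}$$
The total error probability is just the sum of the contribution of each pair of states with the same quantum numbers $JM$, $\{|j_{AB};JM\rangle, |j_{BC};JM\rangle\}$,
$$P^{ME} = \frac12\left(1 - \sum_{k=0}^{n}\frac{m+2k+1}{(n+1)(n+m+1)}\sqrt{1 - \left[\binom{n}{k}\binom{n+m}{n-k}^{-1}\right]^2}\,\right). \tag{11}$$
It is instructive to obtain the well-known results when the ports are loaded with just one copy of each state [9] (i.e., $n = m = 1$).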

The inconclusive probability in the unambiguous approach reads
$$P^{UA} = \frac16\sum_{J=1/2}^{3/2}(2J+1)\,\langle j_{AB};J|j_{BC};J\rangle = \frac46\times 1 + \frac26\times\frac12 = \frac56, \tag{12}$$
that is, five out of six times the machine gives an inconclusive result and only 1/6 of the time does it identify the state without error. Notice that the overlaps for $J = 3/2$ are one. This must be so since $J = 3/2$ corresponds to the totally symmetric subspace, which is independent of the way the spins are coupled. That is, this subspace is identical for $\sigma_1$ and $\sigma_2$. This is the main source of error as it contributes $4/6 = 4/6\times 1$ out of the total 5/6 error probability. The remaining $1/6 = 2/6\times 1/2$ is the contribution of the $J = 1/2$ subspace, where 2/6 is the probability of having an outcome on this subspace and 1/2 is the overlap between the states [cf. Eq. (7)].
The minimum error probability in the one-copy case reads
$$P^{ME} = \frac12\left(1 - \frac16\sum_{J=1/2}^{3/2}(2J+1)\sqrt{1 - |\langle j_{AB};J|j_{BC};J\rangle|^2}\right), \tag{13}$$
which by using Eq. (7) or directly Eq. (11) gives
$$P^{ME} = \frac12\left(1 - \frac{1}{2\sqrt3}\right) \approx 0.356. \tag{14}$$
That is, approximately 1/3 of the time the outcome of the machine will be incorrect. The error probability in both the minimum error and unambiguous approaches will, of course, decrease when using more copies of the states at the ports of the discrimination machine. Equations (8) and (11) give the unambiguous and minimum error probability for arbitrary values of $n$ and $m$. They enable us to study the behavior of the machine for a large number of copies in the program and the data ports, which is what we next discuss.
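The finite-copy probabilities of this section can be evaluated exactly with a few lines of code. The following is a minimal sketch, not the authors' implementation: it assumes the closed form $\binom{n}{k}/\binom{n+m}{n-k}$ for the overlap in the block $J = m/2 + k$ (a form consistent with the one-copy values 1/2 and 1 quoted above), and the function names `overlap`, `weight`, `p_unambiguous`, and `p_min_error` are illustrative choices.

```python
from fractions import Fraction
from math import comb, sqrt


def overlap(n: int, m: int, k: int) -> Fraction:
    """Overlap of the two coupling schemes in the block J = m/2 + k.

    Assumed closed form: binom(n, k) / binom(n + m, n - k).
    """
    return Fraction(comb(n, k), comb(n + m, n - k))


def weight(n: int, m: int, k: int) -> Fraction:
    """Probability of the block J = m/2 + k: (2J + 1) / (d_AB * d_C)."""
    return Fraction(m + 2 * k + 1, (n + 1) * (n + m + 1))


def p_unambiguous(n: int, m: int) -> Fraction:
    """Inconclusive probability: weighted sum of the pairwise overlaps."""
    return sum(weight(n, m, k) * overlap(n, m, k) for k in range(n + 1))


def p_min_error(n: int, m: int) -> float:
    """Minimum error probability: weighted sum of pairwise Helstrom terms."""
    total = 0.0
    for k in range(n + 1):
        c = float(overlap(n, m, k))
        total += float(weight(n, m, k)) * sqrt(max(0.0, 1.0 - c * c))
    return 0.5 * (1.0 - total)


if __name__ == "__main__":
    for n, m in [(1, 1), (2, 1), (5, 5), (50, 1)]:
        print(n, m, p_unambiguous(n, m), round(p_min_error(n, m), 6))
```

Exact rational arithmetic is used for the unambiguous case so that values such as 5/6 for $n = m = 1$ can be compared without rounding; the square roots in the minimum error case are evaluated in floating point.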

III. ASYMPTOTIC LIMITS FOR PURE STATES
Let us start by considering the case of an asymptotically large number of copies at the program ports ($n\to\infty$) while keeping finite the number of copies $m$ at the data port. For the unambiguous discrimination one obtains from Eq. (8)
$$\lim_{n\to\infty}P^{UA} = \frac{2}{m+2}. \tag{15}$$
We wish to show that in this limit the programmable machine has a performance that is equivalent to a protocol consisting in first estimating the states and then doing a discrimination of known states. The average of the inconclusive probability of this protocol over all input states should coincide with Eq. (15).
Recall that for known states $|\psi_1\rangle$ and $|\psi_2\rangle$, when a number $m$ of copies of the unknown state is given, this probability reads
$$P^{UA}(\psi_1,\psi_2) = |\langle\psi_1|\psi_2\rangle|^m. \tag{16}$$
One can do an explicit calculation of the average, $\langle P^{UA}(\psi_1,\psi_2)\rangle = \frac12\int_0^\pi d\theta\,\sin\theta\,\cos^m(\theta/2)$, but it is amusing to obtain it in a very simple way from the Schur lemma,
$$\langle|\langle\psi_1|\psi_2\rangle|^m\rangle = \mathrm{tr}\left([\psi_1^{\otimes m/2}]\int d\psi_2\,[\psi_2^{\otimes m/2}]\right) = \frac{1}{d_{m/2}} = \frac{2}{m+2}, \tag{17}$$
where $d_{m/2} = m/2+1$ is the dimension of the symmetric space of $m/2$ qubits (notice that stricto sensu this procedure is only valid for $m$ even). Plugging this average into Eq. (16) one immediately recovers Eq. (15).

Now we turn our attention to the minimum error probability. Taking $n\to\infty$ and using the Stirling approximation $z! \approx z^z e^{-z}\sqrt{2\pi z}$ in Eq. (11), one obtains
$$\lim_{n\to\infty}P^{ME} = \frac12\left(1 - 2\int_0^1 dx\,x\sqrt{1 - x^{2m}}\right), \tag{18}$$
where we have defined $x = k/n$ and used the Euler-Maclaurin summation formula at leading order. This result could be easily anticipated from the minimum error probability with classical knowledge of the pure states. Recall that the minimum error probability given $m$ identical copies is
$$P^{ME}(\psi_1,\psi_2) = \frac12\left(1 - \sqrt{1 - |\langle\psi_1|\psi_2\rangle|^{2m}}\right), \tag{19}$$
and we just have to compute the average for all pairs of the above expression. Using $|\langle\psi_1|\psi_2\rangle|^2 = \cos^2(\theta/2)$, where $\theta$ is the relative angle between the Bloch vectors of the two states, one has
$$\langle P^{ME}(\psi_1,\psi_2)\rangle = \frac12\left(1 - \frac12\int_0^\pi d\theta\,\sin\theta\sqrt{1 - \cos^{2m}(\theta/2)}\right), \tag{20}$$
and performing the change of variables $x = \cos(\theta/2)$ this equation is cast exactly in the form of Eq. (18). What cannot be anticipated is the next order $O(1/n)$, which gives very relevant information on how fast the protocol reaches the asymptotic value (18). A lengthy, but rather straightforward, calculation yields the remarkable result that this term has a coefficient which coincides with the value of the integral At this order we therefore can write

We now analyze the complementary case, that is, when the number of copies at the data port is infinitely large, $m\to\infty$, while the number $n$ of copies at the program ports is kept finite. In this limit we have perfect knowledge of the data state $|\psi\rangle$, but we do not know to which program port it should be associated. Observe that this situation is very much the same as state comparison [19].
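As a worked check of the two averages used in the limit $n\to\infty$, both integrals over the relative angle can be reduced with a single substitution. This is a sketch only: the variable $u = \cos(\theta/2)$ is introduced here for illustration, and the isotropic measure $\frac12\sin\theta\,d\theta$ is the one assumed throughout the section.

```latex
% Substitution u = cos(theta/2), so that sin(theta) d(theta) = -4 u du.
\begin{align}
\frac12\int_0^\pi d\theta\,\sin\theta\,\cos^m(\theta/2)
  &= 2\int_0^1 du\,u^{m+1} = \frac{2}{m+2},\\
\frac12\int_0^\pi d\theta\,\sin\theta\,\sqrt{1-\cos^{2m}(\theta/2)}
  &= 2\int_0^1 du\,u\,\sqrt{1-u^{2m}} .
\end{align}
% For m = 1 the second integral equals 2/3, so the averaged
% minimum error probability is (1/2)(1 - 2/3) = 1/6.
```

The first line reproduces the average inconclusive probability $2/(m+2)$; the second shows how the averaged Helstrom expression takes the same form as the leading-order sum over $x = k/n$.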
In this scenario the inconclusive probability in the unambiguous approach reads, from Eq. (8),
$$\lim_{m\to\infty}P^{UA} = \frac{1}{n+1}. \tag{22}$$
Let us see that this agrees with the average performance of the standard state comparison. If the data state is the same as the program state in the upper or lower port, the effective states to be discriminated are
$$\sigma_1 = [\psi^{\otimes n}]\otimes\frac{\mathbb{1}_n}{d_n}, \qquad \sigma_2 = \frac{\mathbb{1}_n}{d_n}\otimes[\psi^{\otimes n}], \tag{23}$$
respectively, where $d_n = n+1$ is the dimension of the symmetric space of $n$ qubits and $\mathbb{1}_n$ is the projector onto this subspace. The minimal inconclusive probability for these two states can be obtained with a positive operator-valued measure (POVM) whose elements are , that is, with a POVM that checks whether the first state is the state $|\psi\rangle$ or not; notice that a POVM checking the second register will work equally well. Thus, we have an unambiguous answer whenever the second outcome is obtained and an inconclusive answer whenever the first outcome occurs, which happens with probability independently of the state $|\psi\rangle$.

The minimum error probability in this limit can be tackled in a similar fashion. The asymptotic expression of Eq. (11), though not as direct as in the unambiguous case, is rather straightforward to obtain. Notice that the dominant factor in the term containing factorials inside the square root is $m^{-2(n-k)}$. So, we can effectively replace the square root term by 1, for all $k < n$. Taking into account that for $k = n$ the square root vanishes, we have
$$\lim_{m\to\infty}P^{ME} = \frac12\left(1 - \sum_{k=0}^{n-1}\frac{1}{n+1}\right) = \frac{1}{2(n+1)}. \tag{25}$$
The minimum error probability of a strategy that first estimates perfectly the input state and then tries to associate the correct label to it is given by the Helstrom formula for $\sigma_1$ and $\sigma_2$ [2],
$$P^{ME} = \frac12\left(1 - \frac12\|\sigma_1 - \sigma_2\|_1\right), \tag{26}$$
where $\|A\|_1 = \mathrm{tr}\sqrt{A^\dagger A}$ is the trace norm of operator $A$.
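Substituting the expression of the states (23) we obtain
$$P^{ME} = \frac12\left(1 - \frac{1}{2d_n}\left\|[\psi^{\otimes n}]\otimes[\psi^{\otimes n}]^\perp - [\psi^{\otimes n}]^\perp\otimes[\psi^{\otimes n}]\right\|_1\right) = \frac12\left(1 - \frac{\mathrm{tr}\,[\psi^{\otimes n}]^\perp}{d_n}\right) = \frac{1}{2(n+1)}, \tag{27}$$
where in the first equality we have subtracted the common term $[\psi^{\otimes n}]\otimes[\psi^{\otimes n}]$ from both states, in the second we have used the orthogonality of the operators, and in the last equality we take into account that $\mathrm{tr}\,[\psi^{\otimes n}]^\perp = \mathrm{tr}(\mathbb{1}_n - [\psi^{\otimes n}]) = n$ (i.e., one unit less than the dimension of the corresponding symmetric space). As expected, the result is again independent of $|\psi\rangle$.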
To end this section we compute the asymptotic error probabilities for the symmetric case, that is, when all the ports are loaded with the same m = n (and large) number of copies.
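In the unambiguous approach when $n = m\to\infty$ the first nonvanishing order of Eq. (8) reads
$$P^{UA} \simeq \frac{3}{n}. \tag{28}$$
To compute the minimum error probability, it is convenient to write Eq. (11) for $n = m$ as
$$P^{ME} = \frac12\sum_{k=0}^{n}p_k\left(1 - \sqrt{1 - c_k^2}\right), \tag{29}$$
where
$$p_k = \frac{n+2k+1}{(n+1)(2n+1)} \tag{30}$$
and
$$c_k = \binom{n}{k}\binom{2n}{n-k}^{-1} = \binom{n+k}{n}\binom{2n}{n}^{-1}. \tag{31}$$
We first observe that $c_k$ is a monotonically increasing function of $k$ and hence it takes its maximum value at $k = n$. Second, we note that around this point
$$\binom{n+k}{n} \approx 2^{(n+k)H\left(\frac{n}{n+k}\right)} \approx 2^{n+k},$$
where $H(x) = -x\log_2 x - (1-x)\log_2(1-x)$ is the Shannon entropy of a binary random variable and we have used that $k\approx n$ and $H(1/2) = 1$. Similarly, one has $\binom{2n}{n}\approx 2^{2n}$ and hence $c_k \approx 2^{-(n-k)}$. With this, the probability of error in this limit reads
$$P^{ME} \simeq \frac12\sum_{k=0}^{n}p_k\left(1 - \sqrt{1 - 4^{-(n-k)}}\right).$$
Finally, we perform the change of variables $k\to n-k$ and use that in Eq. (30) $p_{n-k}\approx 3/(2n)$ for $k\approx 0$ to obtain where we have defined the function which converges very quickly to its exact value (the first four terms already give a value that differs by less than $10^{-3}$ from the exact value).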
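The symmetric limit lends itself to a quick numerical check. The sketch below rests on assumptions made here for illustration: the leading coefficient of $1/n$ is arranged as $\frac34\sum_{j\ge 0}\left(1-\sqrt{1-4^{-j}}\right)$, which follows from $p_{n-j}\approx 3/(2n)$ and $c_{n-j}\approx 2^{-j}$ and may be organized differently from the function defined in the original equation, and the exact finite-$n$ value uses $p_k = (n+2k+1)/[(n+1)(2n+1)]$ and $c_k = \binom{n}{k}/\binom{2n}{n-k}$. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb, sqrt


def series_coefficient(terms: int = 60) -> float:
    """Leading 1/n coefficient: (3/4) * sum_j [1 - sqrt(1 - 4**-j)]."""
    return 0.75 * sum(1.0 - sqrt(1.0 - 4.0 ** (-j)) for j in range(terms))


def n_times_p_min_error(n: int) -> float:
    """n * P_ME for n = m, summed exactly over the blocks k = 0..n."""
    total = 0.0
    for k in range(n + 1):
        p_k = (n + 2 * k + 1) / ((n + 1) * (2 * n + 1))
        c_k = float(Fraction(comb(n, k), comb(2 * n, n - k)))
        total += p_k * (1.0 - sqrt(max(0.0, 1.0 - c_k * c_k)))
    return n * 0.5 * total


if __name__ == "__main__":
    print("asymptotic coefficient:", round(series_coefficient(), 4))
    for n in (10, 50, 400):
        print(n, round(n_times_p_min_error(n), 4))
```

The series evaluates to roughly 0.88, the value quoted later for the high-purity regime, and the product $n\,P^{ME}$ computed from the finite sum approaches it as $n$ grows.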

IV. MIXED STATES
We now move to the case when the program and data ports are loaded with mixed states. This situation arises for instance when there are imperfections in the preparation or noise in the transmission of the states. It is reasonable to suppose that these imperfections have the same effect on all states (i.e., to consider that the states all have the same purity $r$). The input states are then tensor products of
$$\rho_i = \frac{\mathbb{1} + r\,\vec n_i\cdot\vec\sigma}{2}, \tag{37}$$
where $\vec n_i$ is a unit vector and $\vec\sigma = (\sigma_x,\sigma_y,\sigma_z)$ are the usual Pauli matrices. In what follows we assume that only the purity is known (i.e., one knows the characteristics of the noise affecting the states, but nothing else). This means that the averages will be performed over the isotropic Haar measure of the $S^2$ sphere, in the same manner as for the pure states. At the end of this section we also analyze the performance of a fully universal discrimination machine, that is, when not even the purity is considered to be known. Notice that mixed states can only be unambiguously discriminated if they have different supports [21], which is not the case when the ports are loaded with copies of the states (37) as they are full-rank matrices. Therefore, only the minimum error discrimination approach will be analyzed here. It is worth stressing that the computation of the optimal error probability in the multicopy case is very nontrivial, even for known qubit mixed states. Only recently have feasible methods for computing the minimum error probability for a rather large number of copies been developed and the asymptotic expression of this probability obtained [4,6]. The main difficulty can be traced back to the computation of the trace norm [see Eq. (26)] of large matrices. The dimension of the matrices grows exponentially with the total number of copies entering the machine, and for a relatively small number of them the problem becomes unmanageable. However, as will become clear, it is possible to exploit the permutation symmetry of the input states to write them in block-diagonal form [22,23], crucially reducing the complexity of the problem.
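The two effective states we have to discriminate are
$$\sigma_1 = \int dn_1\,dn_2\,\rho_{1A}^{\otimes n}\otimes\rho_{1B}^{\otimes m}\otimes\rho_{2C}^{\otimes n}, \qquad \sigma_2 = \int dn_1\,dn_2\,\rho_{1A}^{\otimes n}\otimes\rho_{2B}^{\otimes m}\otimes\rho_{2C}^{\otimes n}, \tag{38}$$
where $dn_i = d\Omega_i/(4\pi)$ is the invariant measure on the two-sphere. Any state having permutation invariance (as, e.g., $\rho^{\otimes n}$) can be written in a block-diagonal form using the irreducible representations of the symmetric group $S_n$. Each block is specified by the total angular momentum $j$ and a label $\alpha$ that distinguishes the different equivalent representations for a given $j$,
$$\rho^{\otimes n} = \bigoplus_{j,\alpha}\rho_{j\alpha}. \tag{39}$$
The angular momentum takes the values $j = n/2, n/2-1, \ldots, 1/2\,(0)$ for odd (even) $n$ and the number of equivalent representations for each $j$ is [23]
$$\nu_j^n = \binom{n}{n/2-j}\frac{2j+1}{n/2+j+1}. \tag{40}$$
That is, $\alpha = 1,\ldots,\nu_j^n$. For each block we have [23]
$$\mathrm{tr}\,\rho_{j\alpha} = \left(\frac{1-r^2}{4}\right)^{n/2-j}\sum_{k=-j}^{j}\left(\frac{1-r}{2}\right)^{j-k}\left(\frac{1+r}{2}\right)^{j+k} \equiv C_j^n, \tag{41}$$
which, of course, is the same for all equivalent irreducible representations (i.e., independent of the label $\alpha$). We sketch here the origin of the factors appearing in Eq. (41) (full details can be found in [23]). The first factor comes from the contribution from the $n/2-j$ singlets present in a representation $j$ made up of $n$ spin-1/2 states. The summation term is the trace of the projection of the remaining states in the symmetric subspace with total angular momentum $j$, where we can use the rotational invariance of the trace to write each state in the diagonal form $\rho = \mathrm{diag}[(1+r)/2,\,(1-r)/2]$. This term simply reads
$$\sum_{k=-j}^{j}\left(\frac{1-r}{2}\right)^{j-k}\left(\frac{1+r}{2}\right)^{j+k} = \frac1r\left[\left(\frac{1+r}{2}\right)^{2j+1} - \left(\frac{1-r}{2}\right)^{2j+1}\right], \tag{42}$$
and hence
$$C_j^n = \frac1r\left(\frac{1-r^2}{4}\right)^{n/2-j}\left[\left(\frac{1+r}{2}\right)^{2j+1} - \left(\frac{1-r}{2}\right)^{2j+1}\right]. \tag{43}$$
Very much in the same way as it happened in previous sections, the only difference between the diagonal basis of $\sigma_1$ and $\sigma_2$ is the ordering of the angular momenta couplings. In $\sigma_1$ we first couple subspaces $A$ and $B$ and obtain where is the projector onto the subspace with quantum numbers $\xi_{AB} = \{j_A,\alpha_A,j_B,\alpha_B,j_{AB}\}$ and $C^{n+m}_{j_{AB}}$ is defined in Eq. (41). Notice that $C^{n+m}_{j_{AB}}$ depends only on the purity of the state and on the total angular momentum $j_{AB}$. Notice also that the tensor product of a mixed state has projections in all subspaces and the blocks are not uniquely determined by the value of $j_{AB}$ (i.e., one has to keep track of the labels $j_A$ and $j_B$ as well).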
Of course, subspaces with different quantum numbers $\xi_{AB}$ are orthogonal (i.e., $\mathrm{tr}[\mathbb{1}_\xi\mathbb{1}_{\xi'}] = \delta_{\xi\xi'}\,\mathrm{tr}\,\mathbb{1}_\xi$). When coupling the third system one plainly adds the quantum numbers $\xi_C = \{j_C,\alpha_C\}$.
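The block structure described above can be sanity-checked numerically: the block dimensions must add up to $2^n$ and the block traces must add up to $\mathrm{tr}\,\rho^{\otimes n} = 1$. The sketch below assumes the multiplicity $\nu_j^n = \binom{n}{n/2-j}\frac{2j+1}{n/2+j+1}$ and the block trace $C_j^n$ as a singlet factor times a sum over the symmetric subspace; the integer index `s = n/2 - j` (the number of singlets) and the function names are conventions chosen here, not taken from the original.

```python
from fractions import Fraction
from math import comb


def multiplicity(n: int, s: int) -> int:
    """Number of equivalent representations nu_j^n, with j = n/2 - s."""
    return comb(n, s) * (n - 2 * s + 1) // (n - s + 1)


def block_trace(n: int, s: int, r: Fraction) -> Fraction:
    """Trace C_j^n of one block of rho^(tensor n), with j = n/2 - s."""
    a, b = (1 - r) / 2, (1 + r) / 2      # eigenvalues of a single qubit state
    d = n - 2 * s                        # d = 2j
    singlets = ((1 - r * r) / 4) ** s    # one factor det(rho) per singlet
    symmetric = sum(a ** (d - t) * b ** t for t in range(d + 1))
    return singlets * symmetric


def check(n: int, r: Fraction):
    """Return (total dimension, total trace) summed over all blocks."""
    blocks = range(n // 2 + 1)
    dims = sum(multiplicity(n, s) * (n - 2 * s + 1) for s in blocks)
    trace = sum(multiplicity(n, s) * block_trace(n, s, r) for s in blocks)
    return dims, trace


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, check(n, Fraction(3, 5)))
```

Using `Fraction` for the purity keeps both sums exact, so the expected pair $(2^n, 1)$ is reproduced without rounding error.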
In the notation we have developed so far, the diagonal bases of σ 1 and σ 2 are written as B 1 = {|ξ AB ξ C ; J M } and B 2 = {|ξ A ξ BC ; J M }, respectively. Obviously, each set contains 2 2n+m orthonormal states and Eq. (38) reads We just have to compute the minimum error from the Helstrom formula (26) for these two states. It is convenient to define the trace-norm term so that To compute T we need to know the unitary matrix that transforms B 2 into B 1 or vice versa. The elements of this unitary are given by the overlaps between the elements of both bases ξ AB ξ C ; J M|ξ A ξ BC ; J M . We observe that these overlaps are nonvanishing only if j X = j X , α X = α X (X = A,B,C) and J = J ,M = M . Furthermore, as mentioned previously, their value does not depend on M or α X , thus, sums over these quantum numbers simply amount to introduce the corresponding multiplicative factors. Therefore, it is useful to introduce a label containing the quantum numbers that determine the orthogonal blocks in B 1 and B 2 that may have nonvanishing overlaps, ξ = {j A ,j B ,j C ,J }, and the corresponding multiplicative factor where ν n j is given in Eq. (40). Equation (47) then reads where the explicit expressions of the matrix elements are Recall that the overlap (52) is independent of the quantum number labeling the equivalent representations (recall also that it is independent of M) and therefore is given by Eq. (6). The computation of the minimum error probability reduces to a sum of trace norms of small-size Helstrom matrices that have dimensions of the allowed values of j AB and j BC for given ξ = {j A ,j B ,j C ,J }. Hence and this computation can be done very efficiently. We would like to show the analytical results for the simplest case of having just one state at each port (i.e., when n = m = 1). In this situation we have fixed values j A = j B = j C = 1/2, the total angular momentum can be J = 3/2,1/2, and j AB = 1,0 (and similarly for j BC ). 
Here there is no degeneracy, the number of equivalent representations defined in Eq. (40) is 1, and therefore the multiplicative factor (49) simply reads $\gamma_\xi = 2J+1$. The only relevant quantum number in this case is $\xi = J$, as all the others are fixed, and we do not need to write them explicitly. The minimum error probability is then
$$P^{ME} = \frac12\left(1 - \frac12\sum_{J=1/2}^{3/2}(2J+1)\left\|\sigma_1^{(J)} - \sigma_2^{(J)}\right\|_1\right). \tag{54}$$
The term of the sum corresponding to $J = 3/2$ vanishes since it corresponds to the projection of $\sigma_{1,2}$ onto the completely symmetric subspace, which is identical for both states. Indeed, in this subspace σ and Plugging these expressions into Eq. (54) we obtain the minimum error probability of the one-copy state
$$P^{ME} = \frac12\left(1 - \frac{r^2}{2\sqrt3}\right). \tag{57}$$
As expected, when $r\to 1$ we recover the pure state value (14). Numerical results of the minimum error probability as a function of the purity of the input states for the symmetric case $n = m$ are depicted in Fig. 1. One sees that for low values of $n$ ($n\lesssim 3$) the dependence on the purity is not very marked; the curves are concave almost in the whole range of the purity. For larger $n$, however, there is an interval of purities where the behavior changes quite significantly. For instance, for $n = 29$, the inflection point occurs at $r\approx 0.3$. At very large values of $n$ one expects a step-like shape with an inflection point approaching $r = 0$ because the probability of error remains very small for $r\neq 0$ and is strictly 1/2 at $r = 0$. The shape of the curves is explained by the existence of two distinct regimes. For high purities the probability of error is well fitted by a linear function in the inverse of the number of copies. We get $P^{ME}\simeq 0.88/(nr^2)$, where the value 0.88 coincides with the analytical value computed for the pure states in Eq. (35). Of course, this approximation cannot be valid for low purities.
In this range of low purity the minimum error probability is very well approximated by the Gaussian function $P^{ME}\simeq\frac12\exp[-nr^2/(2\sqrt3)]$, where we have taken the argument of the exponential from the exponentiation of the exact $1\times1\times1$ case (57). This approximation works for purities in the interval of the width of the Gaussian (i.e., up to $\sim 1/\sqrt n$). Therefore, as $n$ increases the asymptotic approximation $P^{ME}\propto 1/(nr^2)$ extends its validity to almost the whole range of purities, and the expected jump discontinuity develops at $r = 0$ as $n\to\infty$. Similar information is depicted in Fig. 2, where the error probability is plotted as a function of the number of copies $n$ for different purities. We have superimposed the asymptotic result, which is seen to yield a very good approximation to the exact error probability already for $n\gtrsim 20$.
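The $r^2$ dependence of the exact $1\times1\times1$ result can also be seen without the angular momentum machinery. The following is an alternative route sketched here for illustration, not the derivation used in the text; it only assumes the isotropic average $\int dn\,n_i n_j = \delta_{ij}/3$ and the pure-state data of Sec. II (block probability 2/6 and overlap 1/2 in the $J = 1/2$ subspace).

```latex
\begin{align}
\int dn\,\rho &= \tfrac12\,\mathbb{1}, \qquad
\int dn\,\rho\otimes\rho
  = \tfrac14\Big(\mathbb{1}\otimes\mathbb{1}
    + \tfrac{r^2}{3}\sum_i \sigma_i\otimes\sigma_i\Big),\\
\sigma_1-\sigma_2 &= \frac{r^2}{24}
  \Big(\sum_i \sigma_i^{A}\sigma_i^{B}-\sum_i \sigma_i^{B}\sigma_i^{C}\Big)
  = r^2\,(\sigma_1-\sigma_2)\big|_{r=1},\\
\tfrac12\big\|\sigma_1-\sigma_2\big\|_1
  &= r^2\cdot\tfrac26\sqrt{1-\tfrac14} = \frac{r^2}{2\sqrt3},
\qquad
P^{ME} = \frac12\Big(1-\frac{r^2}{2\sqrt3}\Big).
\end{align}
```

Since the difference of the two effective states is the pure-state difference rescaled by $r^2$, its trace norm scales in the same way, which is why the purity enters the one-copy error probability only through $r^2$.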
V. ASYMPTOTIC n × 1 × n
As in previous sections, it is interesting to study the performance of the machine in the asymptotic regimes. A particularly important instance where it is possible to obtain closed expressions is the case when the number of copies at the program ports is asymptotically large and there is one state at the data port. We show how to compute the leading order and sketch the generalizations needed to obtain the subleading term.

Observe first that j AB can only take the values j AB = j A ± 1/2 and similarly for j BC . Therefore σ (ξ ) 1,2 are 2 × 2 matrices (except in the extremal case of J = j A + j C + 1/2 which is one dimensional). It is useful to write with With this definition one simply has [see Eq. (51)] We further notice that for large n ν n j C n j ≈ 1 n/2 + j + 1 1 + r 2r Defining y = 2j/n and using the Euler-Maclaurin summation formula, we have for a generic function f (j ) where we have extended the limits of integration from (0,1) to (−∞,∞), which is legitimate for large n, and defined a Gaussian distribution centered at y = r with variance σ 2 = (1 − r 2 )/n. Notice that at leading order n → ∞, G ∞ ≈ δ(y − r), and hence Notice also that at this order There only remains to compute the unitary matrix Eq. (52). Observe that the total angular momentum takes values J = |j A − j C | + 1/2 + k with k = 0,1, . . . ,2 min{j A ,j C }. The leading order is rather easy to write (the subleading term, although straightforward, is far more involved and we will not show it here). At this order we have J = 1/2 + k and k = 0,1, . . . ,nr and the matrix elements computed from Eq. (6) yield Plugging Eqs. (58)-(66) into Eq. (50) one gets T nr k=0 2k 2 n 3 r 2 (nr) 2 where the sum over j A and j C has been trivially performed by substituting their central value nr/2 in the summand and the only remaining multiplicative of γ ξ [cf. Eq. (49)] is 2J + 1 2k. Finally, defining x ≡ k/nr and using the Euler-Maclaurin approximation as in Eq. (18) we obtain T 4r and hence which obviously coincides with the pure state result Eq. (18) for m = 1 and r → 1.
As for the computation of the next-to-leading order, the integrals approximating the sums over j A and j C have to incorporate the fluctuations around the central value, that is, one defines j A = n 2 (r + η A ) and j C = n 2 (r + η C ), where the variables η X have effective dimension n −1/2 . Then one can expand the matrix elements of σ 1,2 , , and the terms of ν n j present in Eq. (62), taking into account the effective dimensionality of all the terms [notice that k → n(r + η)x, where the integration range of x is (0,1)]. One then performs the sum in k by means of the Euler-Maclaurin summation formula as before. Finally, one computes the integration in j A/B taking into account that the range of the variables η A/B can be taken to be (−∞,∞). After a somewhat lengthy calculation we obtain .
Notice that the limit $r = 0$ is singular and, not surprisingly, the expansion breaks down for purities of order $1/n$. As it should, the error probability (70) decreases monotonically with the purity. In Fig. 3 we plot the error probability as a function of the purity for $n = 20$ and $n = 79$. One sees that the asymptotic expression (70) approximates the minimum error probability very well even for a small number of copies. For larger $n$ (e.g., for $n = 79$) the approximation works extremely well down to values below $r = 0.3$.
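We finish this section by showing that the leading term (69) coincides with the average error of a device that first estimates the mixed states at the program ports and afterward does the usual minimum error discrimination of the data state. From the Helstrom formula (26) particularized for mixed qubit states one has
$$P^{ME} = \frac12\left(1 - \frac12\langle|\vec r_1 - \vec r_2|\rangle\right), \tag{71}$$
where the average is taken over all possible orientations of the Bloch vectors $\vec r_1$ and $\vec r_2$. For equal purity states it simply reads
$$P^{ME} = \frac12\left(1 - \frac r2\int_0^\pi d\theta\,\sin\theta\,\sin(\theta/2)\right) = \frac12 - \frac r3. \tag{72}$$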
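The estimate-then-discriminate average discussed above is easy to evaluate numerically. The sketch below assumes one data copy and perfectly known program states with Bloch vectors of lengths `r1` and `r2`, so that the trace norm of the difference of the two qubit states equals the Euclidean distance between the Bloch vectors; the midpoint quadrature and the function name are illustrative choices.

```python
from math import cos, pi, sin, sqrt


def average_helstrom_error(r1: float, r2: float, steps: int = 20000) -> float:
    """Helstrom error averaged over the relative angle of two Bloch vectors.

    The isotropic measure over the relative angle is (1/2) sin(theta) d(theta).
    """
    h = pi / steps
    avg_distance = 0.0
    for i in range(steps):
        theta = (i + 0.5) * h
        squared = r1 * r1 + r2 * r2 - 2.0 * r1 * r2 * cos(theta)
        avg_distance += 0.5 * sin(theta) * sqrt(max(0.0, squared)) * h
    return 0.5 * (1.0 - 0.5 * avg_distance)


if __name__ == "__main__":
    for r in (0.2, 0.5, 1.0):
        print(r, round(average_helstrom_error(r, r), 6), round(0.5 - r / 3, 6))
```

For equal purities the numerical average reproduces $1/2 - r/3$, and for $r = 1$ it gives 1/6, the pure-state value for a single data copy.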

VI. FULLY UNIVERSAL DISCRIMINATION MACHINE
Let us finally address the fully universal discrimination machine (i.e., a machine that distinguishes states about which nothing is assumed to be known, not even their purity). For this type of machine, we need to specify a prior distribution for the purity. While the isotropy of the angular variables yields a unique uniform distribution for them (the Haar measure on the two-sphere used in previous sections), the corresponding expression for a fully unbiased distribution of the purity, w(r), is not uniquely determined. This is a long-standing issue, and several priors have been suggested depending on the assumptions made [7,24]. Here we will not commit to a particular distribution; rather, we will show results for three reasonable distributions. The actual values of the probability of error may depend on the chosen prior, but the overall performance is seen to be very similar.
The most straightforward, but perhaps not very well grounded, choice is that of the distribution of a hard sphere, $w(r) \propto r^2$, that is, a normalized integration measure given by
$$d\rho_{\rm HS} = 3 r^2\, dr\, \frac{d\Omega}{4\pi}. \tag{73}$$
The Bures distribution is far better motivated. It corresponds to the volume element induced by the fidelity distance [26]. It is monotonically decreasing under coarse graining [24] and it has been argued that it corresponds to the maximal randomness of the signal states [27]. In this case one has $w(r) \propto r^2/\sqrt{1 - r^2}$. Notice that this distribution assigns larger weights to pure states, as their distinguishability in terms of the fidelity is larger than that of mixed states. The integration measure reads
$$d\rho_{\rm Bu} = \frac{4}{\pi}\, \frac{r^2}{\sqrt{1 - r^2}}\, dr\, \frac{d\Omega}{4\pi}. \tag{74}$$
Last, we also consider the recently proposed Chernoff distribution [4]. It is the prior induced by the Chernoff distance, which has a clear operational meaning in terms of the distinguishability between states. By construction it is monotonically decreasing under coarse graining. This measure assigns even larger weights to states of high purity and lower weights to the very mixed ones. This assignment is, again, based on the distinguishability properties, but in terms of the asymptotic behavior of the error probability. The measure can be written as [4]
The effective states we have to discriminate are where $d\rho_k$ takes the expressions of the measures (73) through (75). Note that the block structure of the states is the same as before, as it only depends on the permutation invariance of the input states, which remains untouched. Further, we can use rotational invariance in the same fashion as in Eqs. (44) and (46). Therefore, here it is only required to compute the average of the coefficients $C^n_j$ in Eq. (41) according to priors (73) through (75). To calculate the minimum error probability of this fully universal machine one simply uses Eq. (53) for the states (46) with the averaged coefficients $C^n_j$ computed in Appendix B.
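To make the comparison of the three priors concrete, the following sketch computes the mean purity under each radial density. The hard-sphere and Bures densities are taken from the text, $w(r) \propto r^2$ and $w(r) \propto r^2/\sqrt{1 - r^2}$. The Chernoff density used here, $w(r) \propto (\sqrt{1+r} - \sqrt{1-r})^2/\sqrt{1 - r^2}$, is an assumption recalled from the literature on the Chernoff metric and should be checked against Ref. [4]. The substitution $r = \sin t$, which removes the integrable singularity at $r = 1$, and all function names are choices made for this illustration.

```python
import math

# Radial densities w(r) dr rewritten in terms of t, with r = sin(t) and
# dr = cos(t) dt, so that the factor 1/sqrt(1 - r^2) cancels analytically.
PRIORS = {
    # w(r) ~ r^2 (from the text)
    "hard_sphere": lambda t: math.sin(t) ** 2 * math.cos(t),
    # w(r) ~ r^2 / sqrt(1 - r^2) (from the text)
    "bures": lambda t: math.sin(t) ** 2,
    # ASSUMED form: (sqrt(1+r) - sqrt(1-r))^2 / sqrt(1 - r^2)
    #             = (2 - 2 sqrt(1 - r^2)) / sqrt(1 - r^2)
    "chernoff": lambda t: 2.0 - 2.0 * math.cos(t),
}


def integrate(f, a, b, steps=20000):
    """Midpoint-rule quadrature of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


def mean_purity(name):
    """Mean value of r under the named prior, normalized numerically."""
    w = PRIORS[name]
    norm = integrate(w, 0.0, math.pi / 2.0)
    first_moment = integrate(lambda t: math.sin(t) * w(t), 0.0, math.pi / 2.0)
    return first_moment / norm


if __name__ == "__main__":
    for name in PRIORS:
        print(name, round(mean_purity(name), 4))
```

Under these assumptions the mean purities are roughly 0.75 (hard sphere), 0.85 (Bures), and 0.88 (Chernoff), which reproduces the ordering described above: each successive prior places more weight on states of high purity.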
In Fig. 4 we present the minimum error probability of the fully universal machine for the three priors discussed, for an equal number of program and data states up to n = m = 26. As anticipated, the smallest average error corresponds to the Chernoff prior, because states with higher purity are assigned a larger weight and these are easier to discriminate. The probability of error, as might be expected, is inversely proportional to the number of copies, and attains values very similar to those obtained for the discrimination of states with fixed known purity of the order of r ∼ 0.9.

VII. CONCLUSION
We have studied the problem of programmable discrimination of two unknown general qubit states when multiple copies of the states are provided. For pure states we have obtained the optimal unambiguous discrimination and minimum-error probabilities, Eqs. (8) and (11), respectively. Some results along these lines can be found in [17]; however, no closed expressions were given there. Knowing the error in the asymptotic regimes is very relevant information, as it allows us to assess and compare the performance of devices in a way that is independent of the number of copies. We have obtained analytical expressions for the leading and subleading terms in several cases of interest. As can be anticipated, when the number of copies at the program ports is asymptotically large, at leading order we recover the average of the usual discrimination problem of known states, Eqs. (17) and (20). When the data port is loaded with an asymptotically large number of copies, we recover the state comparison averaged error, Eqs. (24) and (27). These cases correspond to measure-and-discriminate protocols, where the measurement unveils the classical information about the states.
We have also addressed the programmable discrimination of copies of mixed states. We have obtained the minimum-error probability when the ports are loaded with copies of qubits of known purity, Eq. (53). We have assumed that all states have the same purity. This would correspond to a scenario where all the initially pure data and program states are subject to the same depolarizing noise before entering the machine. Closed analytical results can be obtained for a small number of copies, and efficiently computable expressions are given for a fairly large number of copies. The asymptotic analytical results show very good agreement with the numerics. They exhibit a characteristic 1/N dependence on the number N of available copies, in contrast to the usual exponential decay found in standard (nonuniversal) state discrimination, and provide a very good approximation already for a relatively low number of copies when the states have high purity. For very mixed states the error probability has a drastically different behavior. Naturally, in both cases the error probability monotonically decreases with increasing purity r, but in the low-purity regime the dependence is much less pronounced. The range of purities exhibiting this behavior shrinks as the number of copies increases, and the characteristic 1/N behavior of the asymptotic regime extends its validity over almost the whole range of purities.
Finally, we have studied the fully universal discrimination machine, a device that takes in states of which nothing is known (i.e., not even their purity). We have computed the minimum error probability for three reasonable prior distributions of the purity: the hard sphere, Bures, and Chernoff priors (see Fig. 4). The latter is seen to give the lowest error probability. This comes as no surprise, since the Chernoff distribution assigns larger weights to pure states (because they are better distinguished). Our results also indicate that the fully universal discrimination machine yields an error probability comparable to that of the discrimination of states of known purity, with that purity being remarkably large (r ∼ 0.9).
In this Appendix we present the unambiguous discrimination and minimum error probabilities when the numbers of copies $n_A$, $n_B$, $n_C$ loaded at the machine ports are completely arbitrary. Note that in this case the global states $\sigma_1$ and $\sigma_2$ [cf. Eq. (2)] may have different dimensions, since $d_1 = (n_A + n_B + 1)(n_C + 1)$ is, in general, not equal to $d_2 = (n_A + 1)(n_B + n_C + 1)$. One can easily convince oneself that the support of the state with the smallest dimension is always contained in the support of the other, and hence the problem can be solved in very much the same way as in the main text, simply by taking into account that the error probabilities now only contain contributions from the intersection of the supports. Without loss of generality we can assume from now on that $n_A \geq n_C$. As discussed in the main text, the error probabilities are computed by adding the pairwise contributions of the state bases in the common support, the main differences being that $\sigma_1$ and $\sigma_2$ do not have equal coefficients in front of the projectors and that the overlaps in Eq. (6) have a slightly more complicated expression. Here we have $j_A = n_A/2$, $j_B = n_B/2$, $j_C = n_C/2$, $j_{AB} = (n_A + n_B)/2$, and $j_{BC} = (n_B + n_C)/2$.
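The dimension counting can be illustrated with a short sketch. It assumes, as the quoted spin values imply, that the copies at each port live in the fully symmetric subspace (spin $n/2$ for $n$ qubits), so that each global state is supported on the product of two symmetric subspaces. The function names are hypothetical and serve only this illustration.

```python
def global_dimensions(n_a, n_b, n_c):
    """Dimensions d1, d2 of the supports of sigma_1 and sigma_2.

    sigma_1: sym(n_a + n_b) x sym(n_c); sigma_2: sym(n_a) x sym(n_b + n_c).
    """
    d1 = (n_a + n_b + 1) * (n_c + 1)
    d2 = (n_a + 1) * (n_b + n_c + 1)
    return d1, d2


def dimension_from_multiplets(n_first, n_second):
    """Same dimension obtained by summing 2J + 1 over the total spin J.

    Spins j1 = n_first/2 and j2 = n_second/2 couple to
    J = |j1 - j2|, ..., j1 + j2; doubled spins keep the arithmetic integer.
    """
    low = abs(n_first - n_second)
    high = n_first + n_second
    return sum(two_j + 1 for two_j in range(low, high + 1, 2))


if __name__ == "__main__":
    n_a, n_b, n_c = 2, 1, 1
    d1, d2 = global_dimensions(n_a, n_b, n_c)
    print(d1, d2)  # the two supports have different dimensions here
    print(dimension_from_multiplets(n_a + n_b, n_c),
          dimension_from_multiplets(n_a, n_b + n_c))
```

Expanding the products gives $d_1 - d_2 = n_B (n_C - n_A)$, so the two dimensions agree only when $n_A = n_C$ (or trivially when $n_B = 0$), and the sign of $n_C - n_A$ decides which state has the smaller support.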
Using Eq. (6) we obtain that the probability of an inconclusive result in the unambiguous approach is Note that when n A = n C the square root term simplifies and we recover the closed form given in the main text [cf. Eq. (8)]. The minimum error probability can be computed entirely along the same lines, (n A + n B − n C + 2k + 1) where the binomial factors inside the square root are the squared overlaps given in Eq. (6).