Programmable discrimination with an error margin

The problem of optimally discriminating between two completely unknown qubit states is generalized by allowing an error margin. It is visualized as a device---the programmable discriminator---with one data and two program ports, each fed with a number of identically prepared qubits---the data and the programs. The device aims at correctly identifying the data state with one of the two program states. This scheme has the unambiguous and the minimum-error schemes as extremal cases, when the error margin is set to zero or it is sufficiently large, respectively. Analytical results are given in the two situations where the margin is imposed on the average error probability---weak condition---or it is imposed separately on the two probabilities of assigning the state of the data to the wrong program---strong condition. It is a general feature of our scheme that the success probability rises sharply as soon as a small error margin is allowed, thus providing a significant gain over the unambiguous scheme while still having high confidence results.


I. INTRODUCTION
Quantum state discrimination is one of the most basic yet fundamental tasks in quantum information [1]. In its simplest form, it consists in a protocol that tells in which out of two given states a quantum system was prepared. This is a primitive of great practical interest that has been investigated from many perspectives and for which many key results have been obtained. Theoretical results also abound in the literature, e.g., state discrimination provides an operational distance between any two states [2] based on the degree of difficulty of telling one from the other. It has also been shown that for multiple copies of pure states there exist individual adaptive measurements on each copy that provide exactly the same discrimination power as the optimal (global) measurement strategy [3]. This is however not so for mixed states, and there is numerical evidence that even the corresponding asymptotic exponential error rates are different in this case [4,5].
Generically, a discrimination protocol, to which we will refer throughout the paper as device, machine or more explicitly as discriminator, is not universal but specifically designed for each given pair of possible states. A significant conceptual twist on discrimination was introduced in [6,7], where devices that work for arbitrary pairs of states were considered. These machines have two program ports through which multiple copies of the unknown quantum states are loaded ("the programs," for short). Multiple copies of a third state (guaranteed to coincide with one of the states loaded through the program ports) are fed into the data port of the machine. This so-called programmable discriminator is designed to report whether the state of the data is that of the first program, or whether it is that of the second program. The discrimination protocol exploits the difference between the permutation symmetry of the global state of the three ports in the two alternatives. These machines work for discrete [7,8] as well as for continuous variable systems [9]. Programmable discriminators can be regarded as machine-learning devices. It has recently been shown that, in some settings, optimal performance can be attained with a suitable measurement on the two programs followed by a measurement on the data, where only classical communication between the two separate measurements is required. Not only does this mean an important saving of resources, as conventional memory suffices to store the (classical) output of the first measurement, but also that programmable discriminators can be reused and still exhibit optimal performance without having to reload the program ports [10]. Interestingly, programmable discrimination is also formally equivalent to a change-point problem [11]. Let us assume that a source produces states of an unknown type and that either at time $t_1$ or at time $t_2$ the same source starts producing states of a different type. The change-point problem consists in identifying whether the time at which the change occurs is $t_1$ or $t_2$.
In most of the literature so far either the minimum-error or the unambiguous discrimination scheme is considered. In the former, the discriminator always produces a conclusive answer about the identity of the input state, but sometimes this answer is wrong. In the unambiguous scheme no error is allowed, that is, the input state must be correctly identified with certainty. This can only happen at the expense of producing some inconclusive answers or, in other words, the machine sometimes must abstain [12] from giving an answer. In both cases optimality means that the machine attains maximum success probability. It is clear that, if we relax the unambiguous scheme by tolerating some error rate, we can increase the success probability. Likewise, by allowing some rate of inconclusive answers in the minimum-error scheme, we can also increase the reliability of the answers. Hence by introducing an error margin we can unify minimum-error and unambiguous discrimination. Both become extremal points of the unified discrimination with error margin scheme [13-15]. Interpolating between these two extremal cases may have practical interest in some situations.
In this paper we combine the two concepts above and analyze the optimal performance of a qubit multiple-copy programmable machine when an error rate is allowed. We will show that by relaxing the zero error condition slightly the resulting scheme provides an important enhancement in performance over the widely used unambiguous scheme. We will first review the standard problem, when the states between which we wish to discriminate are known. For the sake of self-containedness, we will rederive the success probability for a given error margin in both the so-called weak and strong senses. We will then present our results for programmable devices and obtain the analytical expression of the success probability as a function of the error margins. We will discuss our results in a separate section and will end the paper by stating our conclusions.

II. DISCRIMINATION WITH ERROR MARGINS
Consider two pure nonorthogonal states $\rho_1 = |\psi_1\rangle\langle\psi_1|$, $\rho_2 = |\psi_2\rangle\langle\psi_2|$ as hypotheses of a standard two-state discrimination problem, where for simplicity we assign equal prior probabilities to each state. The discrimination with an error margin protocol can be thought of as a generalized measurement on the system, described mathematically by a positive operator-valued measure (POVM) with three elements $E = \{E_1, E_2, E_0\}$, where the operator $E_1$ ($E_2$) is associated to the statement "the measured state is $\rho_1$ ($\rho_2$)," whereas $E_0$ is associated to the inconclusive answer or abstention. The overall success, error, and inconclusive probabilities are $P_s = \frac{1}{2}[\mathrm{tr}(E_1\rho_1) + \mathrm{tr}(E_2\rho_2)]$, $P_e = \frac{1}{2}[\mathrm{tr}(E_2\rho_1) + \mathrm{tr}(E_1\rho_2)]$, and $Q = \frac{1}{2}[\mathrm{tr}(E_0\rho_1) + \mathrm{tr}(E_0\rho_2)]$, respectively. The relation $P_s + P_e + Q = 1$ is guaranteed by the POVM condition $E_0 + E_1 + E_2 = \mathbb{1}$. The optimal discrimination with an error margin protocol is obtained by maximizing the success probability $P_s$ over any possible POVM $E$ that satisfies that certain errors occur with a probability not exceeding the given margin. Generically, these conditions imply a nonvanishing value of the inconclusive probability $Q$.
In this paper, we consider two error margin conditions: weak and strong. The weak condition states that the average error probability cannot exceed a margin, i.e.,
$$P_e \le r. \tag{1}$$
The strong condition imposes a margin on the probabilities of misidentifying each possible state, i.e.,
$$p(\rho_2|E_1) = \frac{\mathrm{tr}(E_1\rho_2)}{\mathrm{tr}(E_1\rho_1) + \mathrm{tr}(E_1\rho_2)} \le r, \tag{2}$$
$$p(\rho_1|E_2) = \frac{\mathrm{tr}(E_2\rho_1)}{\mathrm{tr}(E_2\rho_1) + \mathrm{tr}(E_2\rho_2)} \le r, \tag{3}$$
where $p(\rho_2|E_1)$ and $p(\rho_1|E_2)$ are the probabilities that the state identified as $\rho_1$ is actually $\rho_2$ and the other way around, respectively. The strong condition is obviously more restrictive, as it sets a margin on both types of errors separately. However, as we will see, the two conditions are directly related: the strong one just corresponds to the weak one with a tighter error margin [14]. Note that both error margin schemes have the unambiguous (when $r = 0$) and the minimum-error schemes (when $r$ is large enough) as extremal cases. We will denote by $r_c$ the critical margin above which the success probability does not increase and thus coincides with that of (the unrestricted) minimum-error discrimination.
For the weak condition, it is straightforward to obtain the maximum success probability by taking into account that the corresponding error probability must saturate the margin condition (1) for $r \le r_c$, namely, $P_e = r$. Furthermore, the symmetry of the problem dictates that $\mathrm{tr}(E_1\rho_1) = \mathrm{tr}(E_2\rho_2) = P_s$ and $\mathrm{tr}(E_1\rho_2) = \mathrm{tr}(E_2\rho_1) = P_e$. Without loss of generality (see Fig. 1), we can write the input states as
$$|\psi_1\rangle = \cos\frac{\theta}{2}\,|0\rangle + \sin\frac{\theta}{2}\,|1\rangle, \qquad |\psi_2\rangle = \cos\frac{\theta}{2}\,|0\rangle - \sin\frac{\theta}{2}\,|1\rangle, \tag{4}$$
where $0 \le \theta \le \pi/2$, and the POVM elements as $E_i = \mu\,|\varphi_i\rangle\langle\varphi_i|$ for $i = 1, 2$, with
$$|\varphi_1\rangle = \cos\frac{\phi}{2}\,|0\rangle + \sin\frac{\phi}{2}\,|1\rangle, \qquad |\varphi_2\rangle = \cos\frac{\phi}{2}\,|0\rangle - \sin\frac{\phi}{2}\,|1\rangle. \tag{5}$$
The POVM condition implies $E_0 = \mathbb{1} - E_1 - E_2$, and the optimal value of $\mu$ is fixed by the extremal value of the inequality $E_0 \ge 0$. One obtains $\mu = 1/(1 - \cos\phi) \le 1$ and finally the symmetry conditions fix $\phi$ through
$$\cot\frac{\phi}{2} = \begin{cases} \dfrac{\sqrt{1-c} + 2\sqrt{r}}{\sqrt{1+c}}, & 0 \le r \le r_c, \\[2mm] 1, & r \ge r_c, \end{cases} \tag{6}$$
where $c = |\langle\psi_1|\psi_2\rangle| = \cos\theta$ is the overlap of the states $|\psi_1\rangle$ and $|\psi_2\rangle$. Notice that in the unambiguous limit, $r = 0$, one has $\phi = \pi - \theta$ and the POVM elements $E_1$ and $E_2$ are orthogonal to the states $|\psi_2\rangle$ and $|\psi_1\rangle$, respectively. In the other extreme case, when the error margin coincides with, or is larger than, the minimum error, $r \ge r_c$, one has $E_0 = 0$ (no abstention) and $E_1$ becomes orthogonal to $E_2$, i.e., $\phi = \pi/2$. In this range the measurement becomes of von Neumann type and the first case in (6) implies
$$r_c = \frac{1}{2}\left(1 - \sqrt{1 - c^2}\right). \tag{7}$$
Taking into account Eq. (6), the optimal success probability reads
$$P^W_s(r) = \begin{cases} \left(\sqrt{r} + \sqrt{1-c}\right)^2, & 0 \le r \le r_c, \\[1mm] \dfrac{1}{2}\left(1 + \sqrt{1 - c^2}\right), & r \ge r_c. \end{cases} \tag{8}$$
This result was derived in [13] and its generalization to arbitrary prior probabilities in [14] (also in [15], by fixing an inconclusive rate $Q$ instead of an error margin). Note that the POVM $E$ is fully determined by the angle $\phi$, which in turn is fully determined by the margin $r$ through Eq. (6).
The optimal success probability under the strong condition can be obtained along the same lines of the weak case, but it will prove more convenient to use the connection between both conditions to derive it directly from (8). Let us denote by $r_S$ ($r_W$) the error margin of the strong (weak) condition. From the symmetry of the problem, Eqs. (2) and (3) can be written in the form of a weak condition with a margin $r_W$ as
$$P_e \le r_S\,(P_s + P_e) \equiv r_W. \tag{9}$$
Hence, if $E$ is the optimal POVM for a strong margin $r_S$, it is also optimal for the weak margin $r_W$, where $P_e = r_W$ and $P_s = P^W_s(r_W)$ is given by Eq. (8). In terms of the success probability, the relation between $r_W$ and $r_S$ reads
$$r_S = \frac{r_W}{P^W_s(r_W) + r_W}. \tag{10}$$
By solving for $r_W$ and substituting into Eq. (8) one derives the success probability for a given $r_S$, which we denote by $P^S_s(r_S)$. For the function $P^S_s$ one readily obtains
$$P^S_s(r_S) = \begin{cases} \dfrac{(1-c)(1-r_S)}{\left(\sqrt{1-r_S} - \sqrt{r_S}\right)^2}, & 0 \le r_S \le r_c, \\[2mm] \dfrac{1}{2}\left(1 + \sqrt{1 - c^2}\right), & r_S \ge r_c, \end{cases} \tag{11}$$
in agreement with [13]. Note that the critical margin is the same for both the weak and the strong conditions, i.e., $r^W_c = r^S_c = r_c$. Indeed, beyond the critical point inconclusive results are excluded by optimality ($Q = 0$ and $P_s + P_e = 1$) and thus there is no difference between the two types of conditions.
As in the weak case, there is a correspondence between the angle $\phi$ and $r_S$; thus $E$ can also be parametrized in terms of the strong margin:
$$\cot\frac{\phi}{2} = \begin{cases} \sqrt{\dfrac{1-c}{1+c}}\;\dfrac{\sqrt{1-r_S} + \sqrt{r_S}}{\sqrt{1-r_S} - \sqrt{r_S}}, & 0 \le r_S \le r_c, \\[2mm] 1, & r_S \ge r_c. \end{cases} \tag{12}$$
Note that an ambiguity arises for $c = 1$, as $\phi = \pi$ and then $E_1$ and $E_2$ become proportional to one another, independently of the value of $r_S$. Note also that for $r_S = 0$ and $r_S = r_c$ the values of $\phi$ for both weak and strong conditions coincide.
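The single-pair results of this section can be checked numerically. The sketch below is only illustrative: the function names are ours, and the closed forms it uses (critical margin $(1-\sqrt{1-c^2})/2$, weak success probability $(\sqrt{r}+\sqrt{1-c})^2$ below the critical margin, and the strong one obtained from it through $r_S = r_W/(P_s + r_W)$) are stated as assumptions in the comments.

```python
import math


def critical_margin(c):
    """Critical margin r_c for overlap c (assumed form: (1 - sqrt(1 - c^2)) / 2)."""
    return 0.5 * (1.0 - math.sqrt(1.0 - c * c))


def ps_weak(r, c):
    """Success probability under the weak condition P_e <= r (assumed closed form)."""
    rc = critical_margin(c)
    if r >= rc:
        return 1.0 - rc  # minimum-error value, no abstention
    return (math.sqrt(r) + math.sqrt(1.0 - c)) ** 2


def ps_strong(r, c):
    """Success probability under the strong condition with margin r.

    Obtained by eliminating the weak margin via r_S = r_W / (P_s + r_W).
    """
    rc = critical_margin(c)
    if r >= rc:
        return 1.0 - rc
    return (1.0 - c) * (1.0 - r) / (math.sqrt(1.0 - r) - math.sqrt(r)) ** 2


def strong_from_weak(r_w, c):
    """Strong margin equivalent to a given weak margin r_w."""
    return r_w / (r_w + ps_weak(r_w, c))
```

For instance, with overlap `c = 0.5` and a weak margin `r_w = 0.02`, `ps_strong(strong_from_weak(r_w, c), c)` returns the same value as `ps_weak(r_w, c)` up to rounding, which is the statement that the strong condition is a weak condition with a tighter margin.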

III. PROGRAMMABLE DISCRIMINATION
Let us elaborate on the definition of a programmable discriminator given in the Introduction. It is a device capable of identifying the state of a system (a qubit in our case) that is guaranteed to be prepared in one of two possible unknown pure states, say $\{|\psi_1\rangle, |\psi_2\rangle\}$. By unknown we mean that we lack all the information about their preparation. Instead, we assume that we are supplied with $n$ copies of each of them, which can be fed into the device through two program ports labeled A and C for $|\psi_1\rangle$ and $|\psi_2\rangle$, respectively. In addition, a third port B is loaded with $n'$ copies of the state to be identified. A programmable discriminator is assumed to be a universal device and it should thus work for any pair of states $\{|\psi_1\rangle, |\psi_2\rangle\}$. To make this paper self-contained, in this section we review the state of the art of this discrimination problem. A more general and detailed analysis can be found in [8].
A programmable discriminator is defined by a universal POVM with three elements $E = \{E_1, E_2, E_0\}$. The operator $E_1$ ($E_2$) corresponds to the machine assigning the label 1 (2) to the copies in B, meaning that their state is identical to that of the copies in A (C). Once again, the third operator, $E_0$, is associated to an inconclusive result. The optimal $E$ is that which maximizes the averaged probability of success, $P_s = \int d\psi_1\, d\psi_2\, P_s(\psi_1, \psi_2)$, where $P_s(\psi_1, \psi_2)$ is the success probability for a given pair of states $\{|\psi_1\rangle, |\psi_2\rangle\}$ and the average is taken over all possible pairs. Since $E$ is state independent, $P_s$ can be recast as the success probability of discrimination between the two effective global states (of the three-partite port system ABC) when the state in B is either $|\psi_1\rangle$ or $|\psi_2\rangle$. These effective states are given by the averages
$$\sigma_1 = \int d\psi_1\, d\psi_2\, [\psi_1^{\otimes n}]_A\, [\psi_1^{\otimes n'}]_B\, [\psi_2^{\otimes n}]_C, \qquad \sigma_2 = \int d\psi_1\, d\psi_2\, [\psi_1^{\otimes n}]_A\, [\psi_2^{\otimes n'}]_B\, [\psi_2^{\otimes n}]_C, \tag{13}$$
respectively, where the notation $[\,\cdot\,]$ stands for $|\cdot\rangle\langle\cdot|$. The integrals can be easily computed using the Schur lemma (see [8]) and one obtains
$$\sigma_1 = \frac{1}{d_{AB}\, d_C}\, \mathbb{1}_{AB} \otimes \mathbb{1}_C, \tag{14}$$
and the analogous expression for $\sigma_2$ where the labels A and C are exchanged. Here $\mathbb{1}_X$ ($\mathbb{1}_{XY}$) is the projector onto the completely symmetric subspace of $\mathcal{H}_X$ ($\mathcal{H}_X \otimes \mathcal{H}_Y$) and $d_X = \mathrm{tr}\,\mathbb{1}_X$ ($d_{XY} = \mathrm{tr}\,\mathbb{1}_{XY}$) is its dimension. In our case we have $d_A = d_C = n + 1$ and $d_{AB} = d_{BC} = n + n' + 1$. The states $\sigma_1$ and $\sigma_2$ are diagonal in the angular momentum basis $\{|j\,m\rangle\}$, but extra labels are needed to specify how the various subsystems A, B, and C are coupled to each other. In particular, we use the basis $|(j_A j_B) j_{AB}\, j_C; j\,m\rangle$ to diagonalize $\sigma_1$ and $|j_A (j_B j_C) j_{BC}; j\,m\rangle$ to diagonalize $\sigma_2$, where $j_A = j_C = n/2$, $j_B = n'/2$ and $j_{AB} = j_{BC} = (n + n')/2$. The diagonal form of $\sigma_1$ is
$$\sigma_1 = \frac{1}{d_{AB}\, d_C} \sum_{j = n'/2}^{n + n'/2}\; \sum_{m = -j}^{j} [(j_A j_B) j_{AB}\, j_C; j\,m], \tag{15}$$
and the analogous form of $\sigma_2$ is obtained by coupling $j_B$ and $j_C$ instead of $j_A$ and $j_B$. The key property of the angular momentum basis is that it satisfies the orthogonality relation
$$\langle (j_A j_B) j_{AB}\, j_C; j\,m \,|\, j_A (j_B j_C) j_{BC}; j'\,m' \rangle = c_j\, \delta_{jj'}\, \delta_{mm'}, \tag{16}$$
where the overlaps $c_j$ can be obtained from the Wigner 6j symbols [16] [see Eq. (19) below].
Bases obeying an orthogonality relation of the form (16) exist for any two subspaces and are known as Jordan bases [17]. Since a state of the first basis has nonzero overlap with only one element of the second basis, the problem of discriminating $\sigma_1$ from $\sigma_2$ can be cast as pure state discrimination in each Jordan subspace, which we label by $j$ (note that the overlaps $c_j$ do not depend on the magnetic number $m$). Hence, the optimal POVM can be chosen to be of the form $E = \bigoplus_j E_j$, where each $E_j$ is itself a POVM acting on the subspace $\mathcal{H}_j$ of total angular momentum $j$, and the total success probability is simply the sum of all the contributions. The success probabilities for the unambiguous ($P_e = 0$) and the minimum-error ($Q = 0$) schemes are given, respectively, by [8]
$$P^{\mathrm{UA}}_s = \frac{n\, n'}{(n + 1)(n' + 2)}, \tag{17}$$
$$P^{\mathrm{ME}}_s = \frac{1}{2} + \frac{1}{2} \sum_{k=0}^{n} \frac{n' + 2k + 1}{(n + 1)(n + n' + 1)}\, \sqrt{1 - c_j^2}\,\Big|_{j = k + n'/2}, \tag{18}$$
where equal prior probabilities are assumed.

IV. ERROR MARGINS IN PROGRAMMABLE DISCRIMINATION
In this section, we generalize programmable discrimination by allowing an error margin. To ease the notation, rather than labeling the various subspaces $\mathcal{H}_j$ by their total angular momentum $j$, we will simply enumerate them by natural numbers, $\alpha = 1, 2, \ldots, n + 1$, and sort them by increasing value of $j$. Hence $j = \alpha + n'/2 - 1$. With a slight abuse of notation, we will accordingly write $\mathcal{H}_\alpha$ and enumerate the corresponding POVMs and overlaps as $E_\alpha$ and $c_\alpha$, respectively, where one has [8]
$$c_\alpha = \binom{n}{\alpha - 1} \binom{n + n'}{n' + \alpha - 1}^{-1}. \tag{19}$$
A direct consequence of the block structure of the averaged states and $E$ is that the overall success probability of a programmable discriminator can be expressed as
$$P_s = \sum_{\alpha = 1}^{n+1} p_\alpha\, P_{s,\alpha}, \tag{20}$$
$$p_\alpha = \mathrm{tr}(\sigma_i\, \mathbb{1}_\alpha) = \frac{2\alpha + n' - 1}{(n + 1)(n + n' + 1)}, \qquad i = 1, 2, \tag{21}$$
where $P_{s,\alpha}$ is the success probability of discrimination in the subspace $\mathcal{H}_\alpha$ and $p_\alpha$ is the probability of $\sigma_1$ and $\sigma_2$ projecting onto that subspace upon performing the measurement $\{\mathbb{1}_\alpha\}$. Likewise, $P_e$ and $Q$ can be expressed as a convex combination of the form (20).
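The block decomposition can be tabulated exactly with rational arithmetic. In the sketch below, `n_prime` stands for the number of data copies, the function names are ours, and the binomial expression used for the overlaps $c_\alpha$ is an assumption of this example that should be checked against [8]. As a consistency check, the weighted sum $\sum_\alpha p_\alpha (1 - c_\alpha)$ (unambiguous discrimination in every subspace) is compared with the closed form $n n'/[(n+1)(n'+2)]$.

```python
import math
from fractions import Fraction
from math import comb


def subspace_weights(n, n_prime):
    """p_alpha = (2*alpha + n' - 1) / ((n + 1)(n + n' + 1)) for alpha = 1..n+1."""
    norm = (n + 1) * (n + n_prime + 1)
    return [Fraction(2 * a + n_prime - 1, norm) for a in range(1, n + 2)]


def subspace_overlaps(n, n_prime):
    """Overlaps c_alpha; the binomial ratio is an assumption of this sketch."""
    return [Fraction(comb(n, a - 1), comb(n + n_prime, n_prime + a - 1))
            for a in range(1, n + 2)]


def unambiguous_success(n, n_prime):
    """Sum over subspaces of p_alpha * (1 - c_alpha), as an exact fraction."""
    p = subspace_weights(n, n_prime)
    c = subspace_overlaps(n, n_prime)
    return sum(pa * (1 - ca) for pa, ca in zip(p, c))


def minimum_error_success(n, n_prime):
    """Sum over subspaces of p_alpha * (1 + sqrt(1 - c_alpha^2)) / 2."""
    p = subspace_weights(n, n_prime)
    c = subspace_overlaps(n, n_prime)
    return sum(float(pa) * 0.5 * (1.0 + math.sqrt(1.0 - float(ca) ** 2))
               for pa, ca in zip(p, c))
```

With one program copy and one data copy, `unambiguous_success(1, 1)` evaluates to `Fraction(1, 6)`; for nine program and two data copies it evaluates to `Fraction(9, 20)`.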

A. Weak error margin
Let us start by considering the weak condition. If we denote the error margin by $R$, the weak condition reads $P_e \le R$. According to the previous paragraph, the optimal strategy and the corresponding success probability $P_s$ are defined through the maximization problem
$$P_s = \max_{E \,:\, P_e \le R}\; \sum_{\alpha = 1}^{n+1} p_\alpha\, P_{s,\alpha}. \tag{22}$$
Recall now that the POVMs $E_\alpha$ are independent and each of them is parametrized through Eq. (6) by a margin $r = r_\alpha$ which, moreover, satisfies the constraint $P_{e,\alpha} \le r_\alpha$. Therefore, Eq. (22) can be cast as
$$P_s = \max_{\{r_\alpha\} \,:\, \sum_\alpha p_\alpha r_\alpha = R}\; \sum_{\alpha = 1}^{n+1} p_\alpha\, P^W_{s,\alpha}(r_\alpha), \tag{23}$$
where the functions $P^W_{s,\alpha}$ are defined as in Eq. (8) with $c = c_\alpha$. In other words, these functions give the success probability of discrimination in the subspaces $\mathcal{H}_\alpha$ with weak error margins $r_\alpha$. The maximization of the success probability translates into finding the optimal set of weak margins $\{r_\alpha\}_{\alpha=1}^{n+1}$ whose average, $\sum_{\alpha=1}^{n+1} p_\alpha r_\alpha$, equals a (global) margin $R$.
Let us start by discussing the extreme cases of this scheme. On the unambiguous side, $R = 0$, the only possible choice is $r_\alpha = 0$ for all values of $\alpha$, and the success probability is given by (17). At the other end point, if $R \ge R_c = \sum_{\alpha=1}^{n+1} p_\alpha r_{c,\alpha}$, where $r_{c,\alpha}$ is the critical margin in the subspace $\mathcal{H}_\alpha$, given by (7) with $c = c_\alpha$, we immediately recover the minimum-error result (18). We will refer to $R_c$ as the global critical margin.
An explicit expression for $P_s$ if $0 < R < R_c$ is most easily derived by starting at the unambiguous end and progressively increasing the margin $R$. For a very small error margin, the Lagrange multiplier method provides the maximum. It occurs at $r_\alpha = r^{(1)}_\alpha$, with
$$r^{(1)}_\alpha = \frac{(1 - c_\alpha)\, R}{\sum_{\beta=1}^{n+1} p_\beta\, (1 - c_\beta)}. \tag{24}$$
This solution is valid only when all (partial) error margins are below their critical values, $r^{(1)}_\alpha \le r_{c,\alpha}$. If this inequality holds, the maximum success probability is $P_s = \sum_\alpha p_\alpha P^W_{s,\alpha}(r^{(1)}_\alpha)$. The use of the superscript "(1)" will become clear shortly.
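If we keep on increasing the global margin $R$, it will eventually reach a value $R = R_1$ at which the error margin of the first subspace $\mathcal{H}_1$ is saturated, namely, $r^{(1)}_1 = r_{c,1}$. The first subspace is the first to saturate because $r^{(1)}_1 > r^{(1)}_2 > \cdots > r^{(1)}_{n+1}$ and $r_{c,1} < r_{c,2} < \cdots < r_{c,n+1}$, according to (24) and (7), respectively. The expression for $R_1$ can be read off from Eq. (24):
$$R_1 = \frac{r_{c,1}}{1 - c_1} \sum_{\alpha=1}^{n+1} p_\alpha\, (1 - c_\alpha).$$
For $R > R_1$, the optimal value of the margin of subspace $\mathcal{H}_1$ is then frozen at the value $r_1 = r_{c,1}$, and the remaining margins are obtained by excluding the fixed contribution of the subspace $\mathcal{H}_1$, i.e., by computing the maximum on the right-hand side of
$$P_s = p_1 P^W_{s,1}(r_{c,1}) + \max_{\{r_\alpha\}_{\alpha \ge 2} \,:\, \sum_{\alpha \ge 2} p_\alpha r_\alpha = R - p_1 r_{c,1}}\; \sum_{\alpha=2}^{n+1} p_\alpha\, P^W_{s,\alpha}(r_\alpha).$$
The location of this maximum, which we denote by $\{r^{(2)}_\alpha\}_{\alpha=2}^{n+1}$, is formally given by (24) with $R$ replaced by $R - p_1 r_{c,1}$ and the sum in the denominator running from $\alpha = 2$ to $n + 1$. In this case, we have
$$P_s = p_1 P^W_{s,1}(r_{c,1}) + \sum_{\alpha=2}^{n+1} p_\alpha\, P^W_{s,\alpha}(r^{(2)}_\alpha). \tag{27}$$
Again, this is valid only until $R$ reaches a second saturation point $R_2$, i.e., provided $R_1 < R < R_2$, and so on. Clearly, the margins $r_\alpha$ saturate in an orderly fashion as we increase $R$.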
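Iterating the procedure described above, the optimal error margins in the interval $R_{\beta-1} \le R \le R_\beta$ (throughout the paper, Greek indices run from 1 to $n + 1$), where $R_0 \equiv 0$ and $R_{n+1} \equiv R_c$, are found to be
$$r^{(\beta)}_\alpha = \begin{cases} r_{c,\alpha}, & \alpha < \beta, \\[1mm] \dfrac{(1 - c_\alpha)\left(R - \sum_{\gamma < \beta} p_\gamma r_{c,\gamma}\right)}{\sum_{\gamma \ge \beta} p_\gamma (1 - c_\gamma)}, & \alpha \ge \beta, \end{cases} \tag{28}$$
where the saturation points are
$$R_\beta = \sum_{\gamma < \beta} p_\gamma r_{c,\gamma} + \frac{r_{c,\beta}}{1 - c_\beta} \sum_{\gamma \ge \beta} p_\gamma (1 - c_\gamma), \qquad \beta = 1, \ldots, n, \tag{29}$$
and, in the last interval ($\beta = n + 1$, where $c_{n+1} = 1$ makes the ratio in (28) undefined), the only free margin is
$$r^{(n+1)}_{n+1} = \frac{1}{p_{n+1}} \Big(R - \sum_{\gamma \le n} p_\gamma r_{c,\gamma}\Big).$$
The success probability in this interval [analogous to Eq. (27)] is
$$P_s = P^{\mathrm{sat}}_{s,\beta} + \sum_{\alpha \ge \beta} p_\alpha\, P^W_{s,\alpha}(r^{(\beta)}_\alpha),$$
where $P^{\mathrm{sat}}_{s,\beta} = \sum_{\alpha=1}^{\beta-1} p_\alpha P_{s,\alpha}(r_{c,\alpha})$ is the contribution to the success probability of the subspaces where the error margins are frozen at their critical values. After some algebra, we find that the success probability can be written in a quite compact form as
$$P_s = P^{\mathrm{sat}}_{s,\beta} + \Bigg(\sqrt{R - \sum_{\gamma < \beta} p_\gamma r_{c,\gamma}} + \sqrt{\sum_{\gamma \ge \beta} p_\gamma (1 - c_\gamma)}\Bigg)^{2}, \qquad R_{\beta-1} \le R \le R_\beta. \tag{33}$$
Eqs. (28) to (33) comprise our main result.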
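The sequential freezing of margins described above lends itself to a short numerical routine. The following Python sketch is one possible implementation, not the authors' code: all names are ours, `n_prime` denotes the number of data copies, the binomial form of $c_\alpha$ and the closed form $(\sqrt{r}+\sqrt{1-c})^2$ for the subspace success probability are assumptions, and the loop relies on the ordered saturation stated in the text (only the first unfrozen margin is tested against its critical value).

```python
import math
from math import comb


def weights_and_overlaps(n, n_prime):
    """Weights p_alpha and overlaps c_alpha (binomial form assumed), alpha = 1..n+1."""
    norm = (n + 1) * (n + n_prime + 1)
    p = [(2 * a + n_prime - 1) / norm for a in range(1, n + 2)]
    c = [comb(n, a - 1) / comb(n + n_prime, n_prime + a - 1)
         for a in range(1, n + 2)]
    return p, c


def critical_margin(c):
    return 0.5 * (1.0 - math.sqrt(1.0 - c * c))


def ps_weak(r, c):
    """Subspace success probability for weak margin r and overlap c."""
    rc = critical_margin(c)
    if r >= rc:
        return 1.0 - rc
    return (math.sqrt(r) + math.sqrt(1.0 - c)) ** 2


def optimal_weak(n, n_prime, R):
    """Return (P_s, margins) for a global weak margin R."""
    p, c = weights_and_overlaps(n, n_prime)
    rc = [critical_margin(x) for x in c]
    margins = [0.0] * len(p)
    budget = R  # part of R not yet assigned to frozen subspaces
    for beta in range(len(p)):
        s = sum(p[a] * (1.0 - c[a]) for a in range(beta, len(p)))
        if s > 0.0:
            scale = budget / s
            if (1.0 - c[beta]) * scale <= rc[beta]:
                # Lagrange solution: unfrozen margins proportional to 1 - c_alpha
                for a in range(beta, len(p)):
                    margins[a] = (1.0 - c[a]) * scale
                break
        else:
            # only the top subspace (c = 1) is left: it absorbs the remainder
            margins[beta] = min(budget / p[beta], rc[beta])
            break
        # freeze this subspace at its critical margin and move on
        margins[beta] = rc[beta]
        budget = max(budget - p[beta] * rc[beta], 0.0)
    ps = sum(pa * ps_weak(ra, ca) for pa, ra, ca in zip(p, margins, c))
    return ps, margins
```

Calling `optimal_weak(9, 2, 0.0)` reproduces the unambiguous value (all margins zero), while any `R` at or above the global critical margin returns the minimum-error value with every margin at its critical value.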

B. Strong error margin
The concept of a strong margin for programmable machines requires a more careful formulation than that of a weak margin since, in principle, there are different conditions one can impose on the various probabilities involved. For instance, one could require the strong conditions (2) and (3) for every possible pair of states fed into the machine, that is, for every given pair $\{|\psi_1\rangle, |\psi_2\rangle\}$. This approach is quickly seen to be trivial since the machine, whose performance is independent of the states, is required to satisfy the condition in a worst case scenario, in which $|\psi_1\rangle$ and $|\psi_2\rangle$ are arbitrarily close to each other. For any value of the error margin less than 1/2 the inconclusive probability must then approach unity, i.e., $Q \to 1$. This implies that both $P_s$ and $P_e$ vanish. A similar argument leads to the trivial solution $P_s = P_e = 1/2$ if the margin is larger than or equal to 1/2.
The task performed by a programmable discriminator can be most naturally viewed as state labeling: the machine attaches the label 1 (2) to the data if its state is identified, by a "clicking" of the operator $E_1$ ($E_2$), to be that of the qubits loaded through program port A (C); i.e., the state of the ports ABC has the pattern $[\psi_1][\psi_1][\psi_2]$ ($[\psi_1][\psi_2][\psi_2]$). For this task, the relevant error probabilities are $p(2|E_1)$ and $p(1|E_2)$, namely, the probability of wrongly assigning the labels 1 and 2, respectively. It seems, therefore, more suitable for programmable discrimination to impose the strong margin conditions $p(2|E_1) \le R$ and $p(1|E_2) \le R$. In terms of the average states $\sigma_1$ and $\sigma_2$ in (13) these conditions are
$$p(2|E_1) = \frac{\mathrm{tr}(E_1\sigma_2)}{\mathrm{tr}(E_1\sigma_1) + \mathrm{tr}(E_1\sigma_2)} \le R, \tag{34}$$
and likewise for $p(1|E_2)$. Note that in contrast to the weak case, here the conditional probabilities are nonlinear functions of the POVM elements, and thus the maximization of the success probability under these conditions is a priori more involved. To circumvent this problem, we can use the relation (10), which for programmable discrimination also holds, and reads
$$R_S = \frac{R_W}{P_s(R_W) + R_W}, \tag{35}$$
to express the (global) weak error margin $R_W$ in terms of the strong one $R_S$. Then, one simply uses Eqs. (28) to (33) to obtain the maximum success probability. The inversion of Eq. (35) is somewhat lengthy but straightforward. The difficulty arises from the fact that the success probability, Eq. (33), is a piecewise function whose expression depends specifically on how many margins $r_\alpha$ have reached their critical value $r_{c,\alpha}$ for a given $R_S$. Thus we need to compute the strong saturation points $R^S_\beta$, analogous to (29), through the relation (35).
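When the analytical inversion is not needed, the weak-to-strong relation can be inverted numerically. The sketch below does this by bisection, under the following assumptions (none of them taken verbatim from the text): the relation has the form $R_S = R_W/[R_W + P_s(R_W)]$, the weak success probability in each interval is $P^{\mathrm{sat}} + (\sqrt{R - R^{\mathrm{sat}}} + \sqrt{S})^2$ with $S$ the sum of $p_\alpha(1-c_\alpha)$ over unfrozen subspaces, and the overlaps have the binomial form used in the code. All function names are ours.

```python
import math
from math import comb


def spectrum(n, n_prime):
    """Weights p_alpha and overlaps c_alpha (binomial form assumed)."""
    norm = (n + 1) * (n + n_prime + 1)
    p = [(2 * a + n_prime - 1) / norm for a in range(1, n + 2)]
    c = [comb(n, a - 1) / comb(n + n_prime, n_prime + a - 1)
         for a in range(1, n + 2)]
    return p, c


def weak_success(p, c, R):
    """Maximum success probability for a global weak margin R."""
    rc = [0.5 * (1.0 - math.sqrt(1.0 - x * x)) for x in c]
    p_sat = r_sat = 0.0  # frozen contributions to P_s and to the margin
    for beta in range(len(p)):
        s = sum(p[a] * (1.0 - c[a]) for a in range(beta, len(p)))
        budget = max(R - r_sat, 0.0)
        if s > 0.0:
            saturated = budget * (1.0 - c[beta]) > rc[beta] * s
        else:
            saturated = budget > p[beta] * rc[beta]
        if not saturated:
            return p_sat + (math.sqrt(budget) + math.sqrt(s)) ** 2
        p_sat += p[beta] * (1.0 - rc[beta])
        r_sat += p[beta] * rc[beta]
    return p_sat  # every margin is critical: minimum-error value


def strong_success(n, n_prime, R_S, tol=1e-12):
    """Success probability for a global strong margin R_S, by bisection on R_W."""
    p, c = spectrum(n, n_prime)
    R_c = sum(pa * 0.5 * (1.0 - math.sqrt(1.0 - ca * ca)) for pa, ca in zip(p, c))
    if R_S <= 0.0:
        return weak_success(p, c, 0.0)
    if R_S >= R_c:
        return weak_success(p, c, R_c)
    lo, hi = 0.0, R_c
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # R_W / (R_W + P_s(R_W)) increases with R_W, so bisection applies
        if mid / (mid + weak_success(p, c, mid)) < R_S:
            lo = mid
        else:
            hi = mid
    return weak_success(p, c, 0.5 * (lo + hi))
```

Bisection is used here only because the map from weak to strong margin is monotonic; the analytical route through the strong saturation points gives the same curve without iteration.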

V. DISCUSSION OF THE RESULTS
In Fig. 2 we plot the maximum success probabilities for both the weak and the strong conditions as a function of a common (global) margin $R$, for nine program and two data copies. We also show in Fig. 2 the results of a numerical optimization with the strong condition (dots), which exhibit perfect agreement with our analytical solution. We observe that by allowing just a 5% error margin, the success probability increases by more than 50%. This is just an example of a general feature of programmable discrimination with an error margin: the success probability increases sharply for small values of the error margin.
A comment about the effect of the subspace $\mathcal{H}_{n+1}$ on the shape of the plots is in order. This subspace contains the completely symmetric states of the whole system ABC and, hence, it is impossible to tell if the state of the data (B) coincides with that of one program (A) or that of the other (C); more succinctly, $c_{n+1} = 1$. Therefore, half the number of conclusive answers will be correct and half of them will be wrong, and $P^W_{s,n+1} = r_{n+1}$, provided $r_{n+1} \le r_{c,n+1} = 1/2$. Increasing the error margin simply allows for an equal increase in the success probability. This is reflected in the linear stretch in the upper curve in Fig. 2, right before the (rightmost) flat plateau. For the strong condition, the same situation arises in the interval $R^S_n \le R \le R_c$, but the plot of the success probability is not a straight line due to the nonlinear relation (35) between the weak and the strong margin.
An alternative (though completely equivalent) way to compute the maximum success probability with a strong margin is based on the observation that the POVMs $E_\alpha$ are also fully determined by strong margins, $r^S_\alpha$, through Eq. (12), with the exception of $E_{n+1}$, for which $c = c_{n+1} = 1$ [giving rise to an ambiguity, as discussed after Eq. (12)]. In this approach, the success probability becomes a convex combination of $P^S_{s,\alpha}(r^S_\alpha)$, as in (20), where these functions are given in (11) with $c = c_\alpha$. The optimal set $\{r^{S(\beta)}_\alpha\}$ can be readily obtained from the weak margins in Eq. (28) using the relation (10). The strategy in the last subspace $\mathcal{H}_{n+1}$ can be easily seen to consist in abstention with a certain probability, and a random choice of the labels 1 and 2 otherwise.
The bar chart in Fig. 3 represents an optimal strategy in terms of the corresponding weak and strong error margins. For this example we have chosen 11 program and two data copies. For illustration purposes, the (global) margin is set to a low value of 0.0055. The wide vertical bars in the background depict the critical margins $r_{c,\alpha}$. There are 12 of them, displayed in increasing order of $\alpha$ (the first one is not visible because of the small value of $r_{c,1}$). On their left (right) halves, a narrow green (orange) bar depicts the optimal weak (strong) margin $r^W_\alpha$ ($r^S_\alpha$) (we attach the subscripts W and S through the rest of the paper to avoid confusion). We note that the first five margins ($\alpha \le 5$) have reached their critical value. For $\alpha > 5$, the weak margins decrease monotonically according to Eq. (28). For the last one, we have $r^W_{n+1} = r^W_{12} = 0$, which holds for any value of $R$, provided $R \le R_n$. This must be so, since we recall that the projections of $\sigma_1$ and $\sigma_2$ onto the subspace with maximum angular momentum are indistinguishable. Clearly, allowing for $r^W_{n+1} > 0$ while there is still room for the other margins to increase cannot be optimal.
Also noticeable in Fig. 3 is that the set of strong margins that have not reached their critical value $r_{c,\alpha}$ has a flat profile (this does not apply to $r^S_{n+1}$, which is always frozen to its critical value of 1/2). To provide an explanation for this, we write the equality in Eq. (34), which is attained if $R \le R_c$, as $R P_s - (1 - R) P_e = 0$, using once again the symmetry of the problem. We next write the success and error probabilities as a convex sum over $\alpha$ and use the equality in the strong conditions (2) and (3) for each subspace $\mathcal{H}_\alpha$ to express $P^S_{e,\alpha}$ in terms of $P^S_{s,\alpha}$, namely $P^S_{e,\alpha} = r^S_\alpha P^S_{s,\alpha}/(1 - r^S_\alpha)$. We obtain the strong condition
$$\sum_\alpha p_\alpha\, P^S_{s,\alpha}(r^S_\alpha) \left[\frac{R - r^S_\alpha}{1 - r^S_\alpha}\right] = 0.$$
The terms in square brackets can be positive or negative depending on $r^S_\alpha$ being smaller or larger than $R$, both of which are possible. So, at face value, this equation cannot explain the flat profile of $r^S_\alpha$ and more work is needed. Next, we use the Lagrange multiplier method to maximize $P_s = \sum_\alpha p_\alpha P^S_{s,\alpha}(r^S_\alpha)$ and note that the dependence of $P^S_{s,\alpha}$ on $\alpha$ (i.e., the term $1 - c_\alpha$) factorizes, as can be checked from Eq. (11). Without further calculation, we can anticipate that the optimal margins will be determined by $n + 1$ equations of the form $p_\alpha (1 - c_\alpha) f(r^S_\alpha) = 0$, where $f$ can be a function only of $R$, the Lagrange multiplier and the number of margins below their critical value. Hence, all the (unfrozen) margins will have the same optimal value. For $\beta = 1$ (no frozen margins) we have the simple solution $r^{S(1)}_\alpha = R$ for all $\alpha$, and the corresponding success probability is
$$P_s = \frac{1 - R}{\left(\sqrt{1 - R} - \sqrt{R}\right)^2} \sum_\alpha p_\alpha (1 - c_\alpha) = \frac{1 - R}{\left(\sqrt{1 - R} - \sqrt{R}\right)^2}\, P^{\mathrm{UA}}_s,$$
for a sufficiently small strong margin $R$.

VI. CONCLUSIONS
In this paper, we have provided two generalizations of programmable state discrimination that enable control on the rate with which errors inevitably arise because of the very principles of quantum mechanics. In the first, a margin is set on the average error probability of mislabeling the input data states (weak condition). In the second, a more stringent condition is required that, for each label, the probability of it being wrongly assigned is within a given margin (strong condition). Generically, in both cases, the discrimination protocol may result sometimes in an inconclusive outcome (i.e., in being unable to assign a label to the data). We have shown that there is a one-to-one correspondence between these two margins, so that weak and strong conditions turn out to be the same if their margins are related by a simple equation. These generalizations extend the range of applicability of programmable discriminators to scenarios where some rate of errors and some rate of inconclusive outcomes are both affordable; or more specifically, to situations where a trade-off between these two rates is acceptable, which departs from the standard unambiguous (zero error) and minimum-error (zero abstention) discrimination scenarios.
Our results include the analytical expression of the success probability for the optimal programmable device as a function of both weak and strong error margins, as well as the characterization of the POVM that specifies such optimal device. From the analysis of these results, we conclude that small error margins can significantly boost the success probability; i.e., a small departure from the unambiguous scheme can translate into an important increase of the success rate while still having very reliable results (very low error rate). We provide an example of this, where a mere error margin value of 5% adds about 50% to the success probability.
A future extension of this work is, e.g., the asymptotic analysis of programmable discrimination with an error margin, when the data and/or program ports are fed with an asymptotically large number of copies. Also relevant is the analysis of programmable discriminators when the measurement is restricted to those compatible with a machine learning scenario. These devices require only classical memory to store the information about the state of the programs, and use it in a later test stage to fix the measurement on the unknown data. They can be reused an arbitrary number of times without reloading the program ports [10].