Optimal parameter estimation with a fixed rate of abstention

The problems of optimally estimating a phase, a direction, and the orientation of a Cartesian frame (or trihedron) with general pure states are addressed. Special emphasis is put on estimation schemes that allow for inconclusive answers or abstention. It is shown that such schemes enable drastic improvements, up to the extent of attaining the Heisenberg limit in some cases, and the required amount of abstention is quantified. A general mathematical framework to deal with the asymptotic limit of many qubits or large angular momentum is introduced and used to obtain analytical results for all the relevant cases under consideration. Parameter estimation with abstention is also formulated as a semidefinite programming problem, for which very efficient numerical optimization techniques exist.


I. INTRODUCTION
State identification and state estimation are fundamental and highly non-trivial tasks in quantum information. The main difficulty lies in the fact that quantum measurements provide partial information about the state of a quantum system, and only when a large number N of identically prepared copies of such a system is available to an experimentalist can she attempt to accomplish a successful identification or a faithful estimation.
In standard protocols the experimentalist is expected to produce a conclusive answer (maybe not right or accurate enough), based on the outcomes of her measurements, at each run of the experiment. To assess the overall performance of the protocol an average cost function or figure of merit is computed, e.g., the minimum probability of misidentification, or the fidelity, F , between estimate and true state. In this context, many results have been obtained over the last years in a large variety of settings [1][2][3][4][5][6][7][8][9][10][11][12][13][14][15].
A new class of protocols has recently emerged as a viable alternative in situations where the approach discussed above fails to achieve the minimum standard of performance required for some specific task, but some number of inconclusive responses, or abstentions, is affordable. This can be seen as a particular instance of post-selection. Examples of such protocols can be found in state discrimination [16][17][18][19][20][21][22][23], where some fixed rate Q of inconclusive outcomes can raise the probability of success significantly (or, e.g., lower the probability of error even down to zero, as in unambiguous discrimination [16]), and also in state estimation [24], where abstention is shown to reduce the negative impact of noisy detectors [25].
In this paper we consider the natural extension of this approach to quantum parameter estimation with pure states of N qubits (or Rydberg atomic states of total angular momentum N/2). More precisely, we deal with an infinite covariant family of such states, parametrised by some continuous variables, and we aim to estimate the values of these variables for a given sample state by performing suitable measurements on it. We already presented in [26] the paradigmatic instance of phase estimation. Here, we will provide the details missing in [26] and will also address the problem of estimation of spatial directions, which we assume encoded in a given N-spin or angular momentum state. In particular, we focus on two problems, that of a single direction and that of three mutually orthogonal directions (trihedron or Cartesian frame); the latter will be referred to as frame estimation for brevity.
The use of abstention in the context of estimation was previously considered in [27], where the author dealt with qudit pure state estimation from a pair of conjugate qudits, and also with estimation of an equatorial qubit state from N independent and uncorrelated copies of the state (phase estimation). At variance with our approach, no lower limit on the acceptance rate (i.e., $\bar{Q} \equiv 1 - Q$) was imposed. In the phase estimation example considered there, the increase in fidelity was achieved only at the cost of imposing acceptance rates that vanish exponentially as N goes to infinity.
The approach of Ref. [25] to multiple-copy qubit state estimation with non-ideal measurements is another example of parameter estimation with abstention. In this case the covariant family of states is the set $\{\rho^{\otimes N}(r\mathbf{n})\}_{\mathbf{n}\in S^2}$, where $r\mathbf{n}$ is the Bloch vector of $\rho(r\mathbf{n})$ and the unit vector $\mathbf{n}$ parametrizes the family. The parameter space is thus the unit 2-sphere $S^2$; in this example, the purity r takes the same fixed value for the entire family of states. It is shown in [25] that abstention leads to a significant increase in the average fidelity for small samples, but for asymptotically large N the fidelity enhancement is modest and, moreover, it requires an exponentially vanishing acceptance rate. To summarize, in the cases previously considered in the literature, abstention has limited impact on parameter estimation with asymptotically large samples unless the experimentalist abstains from producing an estimate most of the time.
In [26] we showed that the situation changes dramatically for phase estimation if one allows for more general covariant families. Here we will show that this is also the case for the two new problems at hand, namely, for single direction and frame estimation with pure states of N qubits. For phase estimation the covariant family we are referring to is the set of states of the form $\{|\Psi(\theta)\rangle = U(\theta)|\Psi_0\rangle\}_{\theta\in[0,2\pi)}$, where $U(\theta)$ stands for the unitary transformation $U(\theta)|j\rangle = e^{i\theta j}|j\rangle$, $|\Psi_0\rangle$ is a fiducial state, which in the eigenbasis of $U(\theta)$ can be written as $|\Psi_0\rangle = \sum_{j=0}^{n} c_j|j\rangle \in (\mathbb{C}^2)^{\otimes n}$, and the number of qubits is N = n. The components $c_j$ are given arbitrary coefficients, subject to the normalization condition $\sum_{j=0}^{n}|c_j|^2 = 1$. For direction estimation we consider instead $\{|\Psi(\mathbf{n})\rangle = U_{\mathbf{n}}|\Psi_0\rangle\}_{\mathbf{n}\in S^2}$, where $U_{\mathbf{n}}$ stands for the unitary representation of the rotation that takes z (the unit vector in the z-axis; likewise, x and y stand for the other two unit vectors) into $\mathbf{n}$. The fiducial state is now given by $|\Psi_0\rangle = \sum_{j=0}^{n} c_j|j,0\rangle$, which may be thought of as pointing along the z-axis (in the sense that it is invariant under rotations about that axis), N = 2n, and we use the standard notation $|j,m\rangle$ for the total angular momentum eigenstates. The choice m = 0 is both for simplicity and also because the optimal state for direction encoding is known to have null total magnetic number [10]; however, the method can be extended to any m. More general states, i.e., those that are not eigenstates of $J_z$, do not fit into our pure state framework [28], since for the sake of direction estimation the subset For frame estimation, the relevant family of states is $\{|\Psi(g)\rangle = U(g)|\Psi_0\rangle\}_{g\in S^3}$, where g stands for the three Euler angles: $g = (\alpha, \beta, \gamma)$. They specify the rotation that takes the axes x, y and z into those of the Cartesian frame we wish to estimate, with unit vectors $(\mathbf{n}_1, \mathbf{n}_2, \mathbf{n}_3)$.
It can be shown that optimality requires a fiducial state of the form $\sum_{j=0}^{n} c_j \big(\sum_{m=-j}^{j}|j,m,\alpha_m\rangle\big)/\sqrt{2j+1}$, where N = 2n and the third quantum number in the ket, $\alpha_m$, labels the degeneracy of the representation of angular momentum j. Except for the representation of highest angular momentum, j = n, for each j < n we have (maximally) entangled the magnetic number m with the degeneracy number $\alpha_m$. This entanglement with 'ancillary' degrees of freedom that are invariant under the action of the group is responsible for an important enhancement in the estimation precision [29]. We note in passing that this degeneracy is known to be useless for single direction estimation, and thus we dropped the corresponding label there. Indeed, following the symmetry argument used at the end of the previous paragraph, any entanglement between magnetic number and degeneracy labels would in effect turn into an incoherent sum on subspaces of different m values, which is clearly suboptimal.
From a formal point of view, it will be seen that the optimization of the frame estimation protocol for this family of states is equivalent to that of phases for large N. Thus, we find it more interesting to ignore the degeneracy of the representations and consider instead the family generated by the fiducial state $|\Psi_0\rangle = \sum_{j=0}^{n} c_j|j,j\rangle$. States of this form could be produced if, e.g., a hydrogen atom in a Rydberg state of total angular momentum up to n is used instead of N spins [30]. In this scenario, the optimal encoding state for a Cartesian frame is known to belong to this family, but it does not lead to a Heisenberg scaling precision. Also in this case, the method we will introduce can be applied to more general pure states.
For all these estimation problems, a finite acceptance rate $\bar{Q}$ suffices to lower the coefficient of the leading order in the asymptotic expansion of the average error in inverse powers of N. If an exponentially vanishing acceptance rate is affordable, the leading order in this expansion becomes $1/N^2$, thus attaining the Heisenberg limit, except for frame estimation with Rydberg states. It will be shown that the effect of abstention can be understood in terms of a probabilistic map from the original family to a better one (closer to optimal), $\{|\tilde\Psi(\theta)\rangle\}_{\theta\in[0,2\pi)}$ (or $\{|\tilde\Psi(\mathbf{n})\rangle\}_{\mathbf{n}\in S^2}$, etc.), which fails with probability Q.
Last but not least, here we present a general technique to obtain the asymptotic form of pure state parameter estimation problems, with or without abstention, that is interesting on its own. The main idea is that the components of |Ψ 0 can be viewed as a discretization of some continuous function ϕ(t) on the unit interval [0, 1], and likewise, the problem of maximizing the fidelity over those components can be viewed (see below) as a discretization of a constrained variational problem for ϕ(t). The solution of the latter problem gives the asymptotic expression of the fidelity for the former one. This solution can be worked out analytically for many physically relevant settings. For finite N the estimation can be formulated as a semidefinite programming (SDP) problem, and hence solved numerically with very high efficiency [31].

II. GENERAL FRAMEWORK
The problems of phase, direction and frame estimation described above can be treated in a unified framework by writing $U(\theta) = U(g)$, $|\Psi(\theta)\rangle = |\Psi(g)\rangle$, where $g \in S^1$; and $U_{\mathbf{n}} = U(g)$, $|\Psi(\mathbf{n})\rangle = |\Psi(g)\rangle$, where $g \in S^2$. Since the magnetic number is fixed to zero (j) for direction (frame) estimation, we also drop this quantum number and write $|j,0\rangle \equiv |j\rangle$ ($|j,j\rangle \equiv |j\rangle$). Then, for the three problems we have a family of states $\{|\Psi(g)\rangle = U(g)|\Psi_0\rangle\}_{g\in S^d}$, where d = 1, 2, 3 for phase, direction, and frame estimation, respectively. As already mentioned above, in direction (frame) estimation the fiducial state $|\Psi_0\rangle$ can be thought of as encoding the unit vector z [the Cartesian frame (x, y, z)]. Similarly, in phase estimation, $|\Psi_0\rangle$ can be interpreted as encoding the reference unit vector x (to which we assign a zero phase), and $U(g)$ as a rotation of (Euler) angle α = θ around the z-axis [41]. Hence, in this unified framework, we can define a cost function in terms of the (quadratic) error per axis, i.e., $e_1(g, g_\chi) = |\mathbf{n} - \mathbf{n}_\chi|^2$ for phase and direction estimation, and the total error $e_3(g, g_\chi) = \sum_{a=1}^{3} |\mathbf{n}_a - \mathbf{n}_a^\chi|^2$ for frame estimation. In these expressions, the subscript χ specifies that the estimate is based on the outcome χ of a generalized measurement that will be introduced below. These errors are related to the 'relative rotation' $U^\dagger(g_\chi)U(g) = U(g_\chi^{-1} g)$ through where we recognize the sum in (2) as the character of $U(g_\chi^{-1} g)$ in the j = 1 representation. Note that $0 \le e_1(g, g_\chi) \le 4$, and $0 \le e_3(g, g_\chi) \le 8$ (we can at most get two axes completely wrong since we assume right-handed Cartesian frames). As a figure of merit, the fidelity $f(g, g_\chi) = (1 + \mathbf{n}\cdot\mathbf{n}_\chi)/2 = 1 - e_1(g, g_\chi)/4$ is most commonly used in phase and direction estimation. One has $0 \le f(g, g_\chi) \le 1$, where 1 corresponds to perfect estimation. For frame estimation one can also define a fidelity with the same range of values as $f(g, g_\chi) = 1 - e_3(g, g_\chi)/8$.
These fidelities are also trivial functions of the relative rotation U (g −1 χ g) in the j = 1 representation through Eqs. (1) and (2).
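As a concrete illustration of these figures of merit, the following minimal sketch evaluates $e_1$, $e_3$ and the corresponding fidelities directly from 3×3 rotation matrices. It assumes a z-y-z Euler convention and represents a frame by the columns of its rotation matrix; the helper names are illustrative and not taken from the paper. It also checks the elementary identity $e_3 = 6 - 2\,\mathrm{tr}(R_\chi^T R)$, in which the trace is the j = 1 character of the relative rotation.

```python
import math

def rot_z(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rot_y(b):
    c, s = math.cos(b), math.sin(b)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def euler_rotation(alpha, beta, gamma):
    # z-y-z Euler convention (an assumption made for this sketch)
    return matmul(rot_z(alpha), matmul(rot_y(beta), rot_z(gamma)))

def frame_error(R, R_est):
    # e_3: columns 0, 1, 2 of each matrix are the unit vectors n_1, n_2, n_3
    return sum((R[i][a] - R_est[i][a]) ** 2 for i in range(3) for a in range(3))

def direction_error(R, R_est):
    # e_1: only the image of z (third column) matters
    return sum((R[i][2] - R_est[i][2]) ** 2 for i in range(3))

def character_j1(R, R_est):
    # trace of the relative rotation R_est^T R
    return sum(R_est[i][a] * R[i][a] for i in range(3) for a in range(3))

R = euler_rotation(0.3, 1.1, -0.7)      # "true" frame
R_est = euler_rotation(1.0, 0.4, 2.0)   # estimated frame
e3 = frame_error(R, R_est)
print(e3, 6.0 - 2.0 * character_j1(R, R_est))                 # the two values agree
print("frame fidelity:", 1.0 - e3 / 8.0)
print("direction fidelity:", 1.0 - direction_error(R, R_est) / 4.0)
```

The worst case for frames is a relative rotation by π about one axis, which flips the other two axes and gives $e_3 = 8$, i.e. zero fidelity.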
Estimation with abstention can be reduced to a standard estimation problem (without abstention) by simply introducing the new POVM $\tilde\Pi$, with elements given by and the new family of (normalized) states With these two definitions we can write the fidelity as where we emphasize that this expression depends on the choice of $\Pi_0$. This expression also brings forward an interpretation of the role of abstention in this optimization problem that we will use throughout the paper: each initial state $|\Psi(g)\rangle$ is transformed into a new $|\tilde\Psi(g)\rangle$ that encodes the unknown parameter(s) g in a more efficient way. This map improves the estimation precision by effectively increasing the distinguishability between the signal states; therefore it can only be implemented in a probabilistic fashion (it succeeds with probability $\bar{Q}$). This stochastic map is fully specified by the optimal choice of $\Pi_0$: Although this may seem a difficult optimization problem, a huge simplification arises because of the covariance of the family of states. Already from Eqs. (3) and (4)  Thus, Schur's lemma can be applied to all the cases, which results in $\Pi_0$ being proportional to the identity on each irreducible block: $\Pi_0 = \sum_j f_j |j\rangle\langle j|$ (phase estimation), $\Pi_0 = \sum_j f_j \sum_m |j,m\rangle\langle j,m| \equiv \sum_j f_j \mathbb{1}_j$ (direction and frame estimation). Hence, the maximization in Eq. (8) is over $\{f_j : 0 \le f_j \le 1\}_{j=0}^{n}$. Note that the transformed set of states $\{|\tilde\Psi(g)\rangle\}_{g\in S^d}$ is also a covariant family, just as the original one. The corresponding reference state is where $\bar{f}_j \equiv 1 - f_j$. From Eq. (3), and using Schur's lemma, we find Thus, $|\tilde\Psi_0\rangle = |\xi\rangle$ is a normalized state, as it should be, i.e., $\sum_j |\xi_j|^2 = 1$.
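The action of the abstention operator on the fiducial state can be sketched in a few lines. The display equations for the transformed state are not reproduced above, so the explicit forms used here, $\bar{Q} = \sum_j \bar{f}_j c_j^2$ and $\xi_j = \sqrt{\bar{f}_j}\, c_j/\sqrt{\bar{Q}}$, are an assumption reconstructed from the statements that $\Pi_0$ is diagonal in j, that the map succeeds with probability $\bar{Q}$, and that $|\xi\rangle$ is normalized. The function name is hypothetical.

```python
import math

def apply_abstention(c, f):
    """Filter the fiducial amplitudes c_j with abstention weights f_j.

    Assumed forms (reconstructed, not quoted from the paper):
      acceptance rate  Q_bar = sum_j (1 - f_j) c_j^2
      filtered state   xi_j  = sqrt(1 - f_j) c_j / sqrt(Q_bar)
    Returns (Q, xi) with Q = 1 - Q_bar the abstention rate.
    """
    if len(c) != len(f):
        raise ValueError("c and f must have the same length")
    if any(fj < 0.0 or fj > 1.0 for fj in f):
        raise ValueError("each f_j must lie in [0, 1]")
    q_bar = sum((1.0 - fj) * cj * cj for cj, fj in zip(c, f))
    xi = [math.sqrt(1.0 - fj) * cj / math.sqrt(q_bar) for cj, fj in zip(c, f)]
    return 1.0 - q_bar, xi

# Example: uniform amplitudes on n + 1 = 5 levels, abstaining more often at the edges.
c = [1.0 / math.sqrt(5)] * 5
f = [0.5, 0.2, 0.0, 0.2, 0.5]
Q, xi = apply_abstention(c, f)
lam = 1.0 / math.sqrt(1.0 - Q)
print("abstention rate Q =", Q)
print("xi_j <= lambda * c_j for all j:",
      all(x <= lam * cj + 1e-12 for x, cj in zip(xi, c)))
```

Because $\bar{f}_j \le 1$, every filtered amplitude obeys $\xi_j \le c_j/\sqrt{\bar{Q}}$, which is the origin of the inequality constraints with $\lambda = \bar{Q}^{-1/2}$ used in the optimization below.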
Since the transformed states are still covariant, we can choose the POVM $\tilde\Pi$ to be the well known continuous and covariant POVM for each of the problems at hand [1,10]: where the unnormalized state $|\Phi_d\rangle$ is given by Note that g plays the role of χ, i.e., g specifies the different outcomes of the measurement. Hereafter in this paper, it is assumed that the states have non-negative coefficients $c_j \ge 0$ (and hence $\xi_j \ge 0$). This is a valid assumption since any phases present in the coefficients $c_j$ (or $\xi_j$) can be absorbed by the above POVMs. This result makes the calculation of the fidelity $F(\Pi_0)$ straightforward: where in the canonical basis, $\{|j\rangle\}_{j=0}^{n}$, M is a real matrix of tridiagonal form with where we recall that the superscripts 1, 2 and 3 refer to phase, direction and frame estimation, respectively. At this point one can easily check our statement in the introduction that frame estimation with the family generated by the fiducial state $\sum_{j=0}^{n} c_j \big(\sum_{m=-j}^{j}|j,m,\alpha_m\rangle\big)/\sqrt{2j+1}$ is formally equivalent to phase estimation for large n. For this family, the diagonal entries of the matrix M are zero with the exception of $h_0 = -1/2$ and $h_n = -1/(2n+2)$, whereas the off-diagonal ones are $a_j = 1/2$ for $0 \le j \le n-1$ and $a_n = 1/(2\sqrt{2n+1})$. Thus, except for four entries, M is the same for phase and frame estimation. For very large n, these finite differences have no effect at leading order and the asymptotic result we will obtain for phases also holds for frames when the degeneracy of the representations is used in the encoding.
Here, we have given the explicit form of M for the particular fiducial states under study. However, it is worth noting that for general states the matrix M will always have a tridiagonal structure and hence the methods that we use readily apply. As shown in [10,35], this structure is a generic feature that stems from the fact that the fidelity $f(g, g_\chi)$ is a linear function of $\langle 1,m|U(g_\chi^{-1} g)|1,m'\rangle$ (j = 1 representation). Its appearance in the integrand of (7) enforces selection rules that prevent the presence of other off-diagonal elements in M.
The maximization over $\{f_j\}$ of (12) can be turned into a maximization over the transformed states $|\xi\rangle$, namely: $\Delta = \max_{|\xi\rangle} \langle\xi|M|\xi\rangle$ (15), subject to the constraints $\langle\xi|\xi\rangle = 1$ (16) and $\xi_j \le \lambda c_j$, with $\lambda \equiv \bar{Q}^{-1/2}$ (17). Then, the maximum fidelity for a given rate of abstention For large enough abstention rates (i.e., large enough values of λ) the constraint (17) has no effect (provided all components $c_j$ are different from zero) and ∆ becomes the maximum eigenvalue of the matrix M. In this case, $F(Q \to 1) = F^*$ is the maximum fidelity that can be achieved by optimizing the components of the fiducial state; these are given by the corresponding eigenvector $|\xi^*\rangle$ of M. The resulting fiducial state thus generates the optimal signal states $|\tilde\Psi(g)\rangle$. From (17) it is straightforward to obtain the critical acceptance rate That is, for abstention rates such that $Q \ge Q^* = 1 - \bar{Q}^*$ the fidelity attains its absolute maximum value $F^*$ (and higher rates cannot improve the estimation quality). At the other extreme, when no abstention is allowed (Q = 0), the solution is determined by the constraints $\xi_j = c_j$ (no maximization is possible), and $\Delta = \langle c|M|c\rangle$.
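The two extreme cases just described can be checked numerically for phase estimation. The sketch below assumes the phase-estimation form $\langle\xi|M|\xi\rangle = \sum_{j=0}^{n-1}\xi_j\xi_{j+1}$ (the form implied by the expressions of Sec. III), for which the top eigenvector of M is known in closed form, and it assumes that the critical acceptance rate follows from requiring $\xi^*_j \le \lambda c_j$ for every j, i.e. $\bar{Q}^* = \min_j (c_j/\xi^*_j)^2$. The function names are illustrative.

```python
import math

def delta_phase(xi):
    # <xi|M|xi> for the assumed phase-estimation matrix (1/2 on both off-diagonals)
    return sum(xi[j] * xi[j + 1] for j in range(len(xi) - 1))

def optimal_phase_state(n):
    # Top eigenvector of the (n+1)x(n+1) tridiagonal matrix with 1/2 off the diagonal;
    # the eigenvalue is cos(pi / (n + 2)).
    theta = math.pi / (n + 2)
    norm = math.sqrt(2.0 / (n + 2))
    return [norm * math.sin((j + 1) * theta) for j in range(n + 1)]

def critical_acceptance(c, xi_star):
    # Smallest acceptance rate for which xi_star_j <= c_j / sqrt(Q_bar) holds for all j
    return min((cj / xj) ** 2 for cj, xj in zip(c, xi_star))

n = 100
c = [1.0 / math.sqrt(n + 1)] * (n + 1)   # equal superposition, used as an example
xi_star = optimal_phase_state(n)
print("Delta without abstention:", delta_phase(c))
print("Delta at full optimization:", delta_phase(xi_star), math.cos(math.pi / (n + 2)))
print("critical acceptance rate:", critical_acceptance(c, xi_star))
```

For the equal-superposition state the critical acceptance rate approaches 1/2 as n grows, in line with the value $Q^* = 1/2$ quoted in Sec. III B.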
For intermediate values of Q ∈ (0, Q * ) the problem becomes more tricky. For moderate values of n one can use standard non-linear optimization packages to solve the above constrained convex optimization problem, Eqs. (15)- (17). This can also be easily cast as a semidefinite programming (SDP) problem. The SDP approach is efficient and, furthermore, provides rigorous bounds on the precision of the solution. One simply linearizes these equations by introducing a SDP (positive operator) variable B to play the role of |ξ ξ|. The SDP form of Eqs. (15) and (17) is then subject to the constraints One can easily prove that the optimal B for this problem must necessarily have rank one: since all the entries of M are non-negative, tr(MB) increases with increasing values of the off-diagonal entries B i,i+1 . Their maximum value consistent with positivity is given by rank one matrices. Therefore, the optimal B is of the form |ξ ξ| and the SDP solution provides in turn a solution of Eqs. (15)- (17).
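For readers without an SDP solver at hand, the constrained problem can also be explored with a simple fixed-point iteration. This is not the SDP formulation described above but a substitute heuristic: at each step the quadratic form $\xi^T(M + \mathbb{1})\xi$ is linearized and maximized over the feasible set $\{\|\xi\| = 1,\ 0 \le \xi_j \le \lambda c_j\}$, which gives $\xi_j = \min(\lambda c_j, \mu g_j)$ with μ fixed by normalization. Since $M + \mathbb{1}$ is positive semidefinite, each step cannot decrease ∆, although convergence to the global optimum is not guaranteed by this argument. The phase-estimation form of M and all function names are assumptions of the sketch.

```python
import math

def apply_M(xi):
    # Assumed phase-estimation M: 1/2 on the two off-diagonals, zero elsewhere
    m = len(xi)
    return [0.5 * ((xi[j - 1] if j > 0 else 0.0) + (xi[j + 1] if j < m - 1 else 0.0))
            for j in range(m)]

def delta(xi):
    return sum(a * b for a, b in zip(xi, apply_M(xi)))

def clipped_unit_vector(g, u):
    # Maximizer of g.x over {|x| = 1, 0 <= x_j <= u_j} for g > 0:
    # x_j = min(u_j, mu * g_j), with mu found by bisection on the norm.
    lo, hi = 0.0, max(uj / gj for uj, gj in zip(u, g))
    for _ in range(60):
        mu = 0.5 * (lo + hi)
        if sum(min(uj, mu * gj) ** 2 for uj, gj in zip(u, g)) < 1.0:
            lo = mu
        else:
            hi = mu
    return [min(uj, hi * gj) for uj, gj in zip(u, g)]

def optimize(c, lam, iters=1000):
    # Requires c_j > 0 and lam >= 1; starts from the feasible point xi = c
    u = [lam * cj for cj in c]
    xi = list(c)
    for _ in range(iters):
        g = [m + x for m, x in zip(apply_M(xi), xi)]   # direction of (M + 1) xi
        xi = clipped_unit_vector(g, u)
    return xi

n = 16
c = [1.0 / math.sqrt(n + 1)] * (n + 1)
lam = 1.1                                  # abstention rate Q = 1 - 1/lam^2, about 17%
xi = optimize(c, lam, iters=500)
print("Q =", 1.0 - 1.0 / lam ** 2)
print("Delta before:", delta(c), "after:", delta(xi))
```

When λ is large enough for the box constraints to be inactive, the iteration reduces to ordinary power iteration on $M + \mathbb{1}$ and returns the top eigenvector of M.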
However, as advertised earlier, the main focus of this work is on the regime of asymptotically large n and, in particular, on presenting an approach that enables obtaining analytical expressions in this regime, thus complementing the SDP analysis. We will first introduce and discuss in some detail the approach for phase estimation. The generalization to direction and frame estimation will be discussed afterwards.

III. ASYMPTOTIC REGIME: PHASE ESTIMATION
Here we consider the problem of phase estimation, for which ξ|M|ξ can be cast as This expression can be easily rewritten as where the first term (unity) results from using the normalization condition (16). Instead of maximizing this expression, we will equivalently minimize S ≡ 1 − ξ|M|ξ . A slight difficulty arises here because of the inequality constraints in (17). To deal with them we need to use the so called Karush-Kuhn-Tucker (KKT) conditions (see e.g., [32]), which are a generalization of the Lagrange method. We first have to introduce a multiplier for each constraint: b 2 /2; s j , j = 0, . . . , n; much in the same way as the Lagrange method requires. Hence, we will find the local minima of Besides the constraints specified in (17), which are referred to as primal feasibility conditions, we also need to impose the so called dual feasibility conditions, and, finally, known as complementary slackness conditions. Rather than attempting to solve this system of conditions for arbitrary n, which appears to be a difficult task, we will take n to be asymptotically large and reframe the minimization above as a variational problem for a continuous function ϕ(t) in the unit interval [0, 1]. To do so, we proceed as follows: we first note that as n goes to infinity j/n approaches a continuous real variable t. So, we define and assume {ξ j } and {c j } are a discretization of some continuous functions, ϕ(t) and ψ(t) respectively, so that [note in passing that ϕ(t) ≥ 0 and ψ(t) ≥ 0 ]. The normalization condition for {ξ j } and {c j } holds if we impose From (27), and Eq. (23) can be viewed as a discretized version of the functional S[ϕ], defined by where ω is a positive constant (the properly scaled Lagrange multiplier: ω = nb) and σ(t) is a function that interpolates the set of multipliers {s j }, i.e., With this, Eq. (24) becomes σ(t) ≥ 0. 
Similarly, the primal feasibility conditions in (17) and the slackness condition (25) become $\varphi(t) \le \lambda\psi(t)$ and $\sigma(t)[\lambda\psi(t) - \varphi(t)] = 0$. Note that by imposing the boundary conditions ϕ(0) = 0 and ϕ(1) = 0, the functional S[ϕ] becomes $O(n^{-2})$. More interestingly, the minimization of S[ϕ] defines a mechanical problem, of which the second line in Eq. (29) is the 'action' and the corresponding integrand the 'Lagrangian': It describes a driven harmonic oscillator with angular frequency ω, whose 'equation of motion' is $\ddot\varphi(t) + \omega^2\varphi(t) = \sigma(t)$. To solve this problem, we first note that the slackness conditions imply that either ϕ(t) = λψ(t), in which case t is in the so called coincidence set C, or σ(t) = 0. In the second case, $t \in C^c$ ($C^c$ stands for the complement of C), the primal feasibility condition is ϕ(t) < λψ(t), and Eq. (34) becomes homogeneous (the equation of motion of a free harmonic oscillator). It has the familiar solution $\varphi(t) = A\sin\omega t + B\cos\omega t$, where A, B and ω are constants to be determined. In the coincidence set C, σ is determined by (34), where we make the substitution ϕ(t) = λψ(t) (recall that ψ is a given function, as the components $c_j$ are themselves given). If we restrict ourselves to fiducial states $|\Psi_0\rangle$ whose components $c_j$ are such that ψ(t), defined through Eq. (27), is continuous in the whole unit interval, one can show that the solution ϕ(t) and its first derivative must also be continuous there [except at points of C where ψ(t) itself is not differentiable]. Most of the physically relevant cases are of this type; some of them are considered in the examples below. By taking into account the boundary conditions, as well as the continuity of ϕ(t) and its derivative at the boundaries of C, one can determine the arbitrary constants that arise in solving the equation of motion.
Before presenting examples of this approach, we note that the minimum value of S can be expressed in terms of the Lagrange multiplier (function) ω (σ), and the given function ψ, as To prove this, we just have to integrate by parts (29) and use the equation of motion (34) and the boundary conditions ϕ(0) = ϕ(1) = 0. Note that the integral is effectively over the coincidence set C, where the expression for σ(t) is given by: σ = λ(d 2 ψ/dt 2 + ω 2 ψ), as discussed above.
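The rewriting of $\langle\xi|M|\xi\rangle$ that underlies the continuum action rests on an elementary identity, $\sum_{j=0}^{n-1}\xi_j\xi_{j+1} = \sum_j \xi_j^2 - (\xi_0^2 + \xi_n^2)/2 - \frac{1}{2}\sum_{j=0}^{n-1}(\xi_{j+1} - \xi_j)^2$, in which the first term equals unity for a normalized state. The short check below, with illustrative function names, verifies it on a random vector; the boundary term is the source of the order 1/n contribution and the squared differences become the kinetic term of order $1/n^2$.

```python
import random

def quadratic_form(xi):
    # sum_j xi_j xi_{j+1}: the phase-estimation form of <xi|M|xi> assumed in this sketch
    return sum(xi[j] * xi[j + 1] for j in range(len(xi) - 1))

def rewritten_form(xi):
    norm2 = sum(x * x for x in xi)
    boundary = 0.5 * (xi[0] ** 2 + xi[-1] ** 2)
    kinetic = 0.5 * sum((xi[j + 1] - xi[j]) ** 2 for j in range(len(xi) - 1))
    return norm2 - boundary - kinetic

rng = random.Random(0)
xi = [rng.random() for _ in range(51)]
print(quadratic_form(xi), rewritten_form(xi))   # identical up to rounding
```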
A. Large abstention (λ ≫ 1)
For values of the abstention rate very close to one (large λ), and provided $c_j > 0$ for all j, the quantities $\lambda c_j$ are also very large and C = ∅. In this case σ ≡ 0 in [0, 1], Eq. (34) becomes homogeneous and we are dealing with a regular Sturm-Liouville eigenvalue problem. The solution is $\varphi(t) = A\sin\omega t$, with $\omega = m\pi$, m = 1, 2, . . ., where the boundary conditions ϕ(0) = ϕ(1) = 0 have been taken into account to discard the independent cos ωt solution. Since we must have ϕ(t) ≥ 0 in the whole unit interval, we find that m = 1 (which gives the minimum eigenvalue of $d^2/dt^2$ for the given boundary conditions). The constant A is fixed by normalization and takes the value $A = \sqrt{2}$, thus $\varphi(t) = \sqrt{2}\sin\pi t$, namely $\xi_j \simeq \sqrt{2/n}\,\sin(\pi j/n)$. The minimum value of S is $S^* = \pi^2/(2n^2)$. This leads to an asymptotic maximum fidelity of $F^* = 1 - \pi^2/(4N^2)$, which coincides with the known fidelity results for optimal phase encoding [27,33].
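The large-abstention limit is easy to verify at finite n. The sketch below samples the continuum profile $\sqrt{2}\sin\pi t$ at t = j/n, evaluates $S = 1 - \sum_j \xi_j\xi_{j+1}$ (the phase-estimation form assumed throughout these sketches), and compares it with the exact finite-n optimum $1 - \cos[\pi/(n+2)]$ of the tridiagonal matrix and with the asymptotic value $\pi^2/(2n^2)$. The function names are illustrative.

```python
import math

def action(xi):
    # S = 1 - <xi|M|xi> for a normalized state, phase-estimation form
    return 1.0 - sum(xi[j] * xi[j + 1] for j in range(len(xi) - 1))

def sampled_continuum_state(n):
    # xi_j = sqrt(2/n) sin(pi j / n), exactly normalized for integer n >= 2
    return [math.sqrt(2.0 / n) * math.sin(math.pi * j / n) for j in range(n + 1)]

n = 1000
s_continuum = action(sampled_continuum_state(n))
s_exact = 1.0 - math.cos(math.pi / (n + 2))       # exact optimum at finite n
s_asymptotic = math.pi ** 2 / (2.0 * n ** 2)
print(s_continuum, s_exact, s_asymptotic)
print("1 - F* ~", 0.5 * s_exact, "vs pi^2/(4 N^2) =", math.pi ** 2 / (4.0 * n ** 2))
```

The three values agree to better than one percent at n = 1000, and the sampled continuum profile is only marginally worse than the exact eigenvector.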
B. $|\Psi_0\rangle$ proportional to the POVM seed state $|\Phi_1\rangle$
The example we consider here is very simple from a computational point of view and yet illustrates that even a tiny rate of abstention can drastically improve the asymptotic fidelity F of parameter estimation. More precisely, we will show that any finite amount of abstention enables changing the shot noise limit scaling $N^{-1}$ of 1 − F for large N into the Heisenberg limit scaling $N^{-2}$. The elements of the family are equal superpositions of all 'Fock' states $|j\rangle$, i.e., $c_j = 1/\sqrt{n+1}$. Despite having such a large support, in the standard approach, Q = 0 (λ = 1), the phase estimation fidelity these states provide does not exceed the shot noise limit, $1 - F \sim N^{-1}$. This can be exactly computed for any N with ease from (21). Of course it also agrees with the analytic asymptotic results: using Eq. (27) we obtain ϕ(t) = ψ(t) = 1, for t ∈ [0, 1], and the 1/n (= 1/N) boundary term in the action (36) is dominant.
Let us now address the more interesting case of Q > 0 (λ > 1). Here we can freely impose ϕ(0) = ϕ(1) = 0 and get rid of the shot-noise type term 1/n. In a sufficiently small neighbourhood of t = 0, i.e., for 0 ≤ t < α, where α is likewise small, we have ϕ(t) − λ < 0, and the complementary slackness condition (32) implies σ(t) = 0 there. If α is the maximum value of t less than 1/2 for which this condition holds, it must be a boundary point of the coincidence set C. Then, for t ≥ α the solution is given by the rescaled input state ϕ(t) = λψ(t) = λ. Thus, where the constants α, ω and A are to be determined. Continuity of ϕ(t) and its derivative at t = α yields A sin ωα = λ, Aω cos ωα = 0.
We are left with the following possibilities for ω and A: The positivity condition ϕ(t) ≥ 0 requires m = 0, and normalization, Eq. (28), Note that since α ≤ 1/2 we have $Q^* = 1/2$. Combining these results we obtain Extending the solution to the entire unit interval by applying the obvious symmetry of the problem, namely Note that $C = [Q, \bar{Q}]$ and $\sigma(t) = \omega^2\lambda$ for t ∈ C [σ(t) = 0 for $t \in C^c$]. Therefore, Eq. (36) gives from which For 1/2 < Q ≤ 1 the solution is (38) and the fidelity in (40). Note that even the slightest abstention rate unlocks the encoding power of the phase states and drastically changes the estimation precision from the original $N^{-1}$ to $N^{-2}$.
The above results are illustrated in Fig. 1, where we represent the optimal solution for a 17% abstention rate. Notice how the slackness conditions apply in the different regions: the straight part of ϕ (corresponding to t ∈ C) is just λψ = λ, while the sinusoidal curves in the extremes (corresponding to the unconstrained region C c ) smoothly match the straight line at the boundary. The agreement between the numerical points and the analytic continuum limit is also quite evident.
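The change of scaling can be reproduced with a short numerical sketch. The piecewise profile used here, sinusoidal for t < Q and 1 − t < Q and equal to λ in between, with ω = π/(2Q) and $\lambda = \bar{Q}^{-1/2}$, is what the matching conditions A sin ωα = λ, cos ωα = 0 and the normalization give when worked out for ψ = 1; the closed form $S_{\min} = \pi^2/(8n^2 Q\bar{Q})$ is likewise derived here from $S = (2n^2)^{-1}\int \dot\varphi^2\,dt$ and should be read as an assumption of the sketch, not as a quotation of the paper's equations. Function names are illustrative.

```python
import math

def profile(t, Q):
    # Assumed continuum solution for psi(t) = 1 and abstention rate 0 < Q <= 1/2
    lam = 1.0 / math.sqrt(1.0 - Q)
    s = min(t, 1.0 - t)                      # symmetry t -> 1 - t
    if s < Q:
        return lam * math.sin(math.pi * s / (2.0 * Q))
    return lam                               # coincidence set: phi = lambda * psi

def action_with_abstention(n, Q):
    xi = [profile(j / n, Q) for j in range(n + 1)]
    norm = math.sqrt(sum(x * x for x in xi))
    xi = [x / norm for x in xi]
    return 1.0 - sum(xi[j] * xi[j + 1] for j in range(n))

def action_without_abstention(n):
    # c_j = 1/sqrt(n+1): sum_j c_j c_{j+1} = n/(n+1)
    return 1.0 - n / (n + 1.0)

n, Q = 400, 0.17
s_q = action_with_abstention(n, Q)
print("Q = 0   : 1 - F =", 0.5 * action_without_abstention(n))   # shot noise, 1/(2n+2)
print("Q = 0.17: 1 - F =", 0.5 * s_q)                              # Heisenberg-like
print("closed form      :", math.pi ** 2 / (16.0 * n ** 2 * Q * (1.0 - Q)))
```

With n = 400 and a 17% abstention rate the error drops by more than an order of magnitude with respect to the Q = 0 value.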

C. Multiple copies on the equator
Let us now focus on phase estimation with a signal of the form that is, with N = n copies of states lying on the equator of the Bloch sphere. For these the coefficients $c_j$ read $c_j = 2^{-n/2}\binom{n}{j}^{1/2}$. The maximum fidelity that can be attained with this signal without abstention is well known to be 1 − F = 1/(4N) = 1/(4n) for large n [27,33]. To compute the effect of abstention we proceed along the lines of the previous section. In the asymptotic limit Eq. (27) leads to where $H(t) = -t\log t - (1-t)\log(1-t)$ is the Shannon entropy, and we have used Stirling's approximation. Note that log 2 − H(t) is the (binary) relative entropy $H(t\|1/2)$ between a Bernoulli distribution with success probability p = t and the flat one (p = 1/2). As in the previous case, the problem is invariant under t → 1 − t, which suggests using the variable τ = t − 1/2, τ ∈ [−1/2, 1/2], instead of t. Hence, the solution must be an even function of τ. In the region $|\tau| \lesssim n^{-1/2}$ [i.e., around the peak of the distribution (51)], we can use the Gaussian approximation where we slightly abuse notation here and in the rest of the section and use ψ(τ) to denote ψ(t(τ)). At the tails ($|\tau| > n^{-1/2}$), ψ(τ) falls off with an exponential rate given by $H(1/2 + \tau\|1/2)$.
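The statements about the multiple-copy signal can be checked directly at finite n. The sketch assumes binomial amplitudes $c_j = 2^{-n/2}\binom{n}{j}^{1/2}$ (n copies of an equatorial qubit state expanded in the symmetric basis) and a Gaussian profile $\psi(\tau) \approx (2n/\pi)^{1/4} e^{-n\tau^2}$, which is what Stirling's formula gives near the peak for $\psi = \sqrt{n}\,c_j$; both are stated here as assumptions since the corresponding display equations are not reproduced above. It also evaluates the no-abstention action, for which nS should approach 1/2.

```python
import math

def equatorial_coeffs(n):
    # c_j = sqrt(C(n, j) / 2^n); exact integer arithmetic before the division
    return [math.sqrt(math.comb(n, j) / 2 ** n) for j in range(n + 1)]

def gaussian_profile(tau, n):
    # Assumed Gaussian approximation of psi = sqrt(n) * c_j near tau = 0
    return (2.0 * n / math.pi) ** 0.25 * math.exp(-n * tau * tau)

n = 400
c = equatorial_coeffs(n)
for j in (200, 210):
    tau = j / n - 0.5
    print(j, math.sqrt(n) * c[j], gaussian_profile(tau, n))

# No abstention (Q = 0): S = 1 - sum_j c_j c_{j+1}, expected n * S -> 1/2
S0 = 1.0 - sum(c[j] * c[j + 1] for j in range(n))
print("n * S at Q = 0:", n * S0)
```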
Since the solution of the minimization must be an even function of τ , it must have the form The continuity of both ϕ(τ ) and ϕ ′ (τ ) at the boundary of C, i.e., at the point τ = α read: A cos(Ω) = λψ(α), where we have defined Ω ≡ ωα. Combining these equations we obtain The normalization condition (28) turns out to be Eqs. (54) through (58) cannot be solved analytically, but we can find asymptotic solutions by focusing on some specific regimes. The first we will consider arises when the boundary points ± α scale as n −1/2 , so that C stretches to the region around the peak of ψ(τ ). In this case, ϕ(τ ) = λψ(τ ) gives the dominant contribution to S min and, as one intuitively expects, S min ∼ n −1 . The two pieces of ϕ in Eq. (53) can be matched for arbitrary values of λ and the abstention rate can be finite (is not required to scale with n). The second regime arises when α is fixed. In this situation, for sufficiently large n, the coincidence set C lies on the tails of ψ(τ ). Matching the two pieces of ϕ requires that λ scales exponentially with n, which means that the acceptance rateQ must vanish also exponentially. In return, the piece of ϕ in the first line of Eq. (53) has a wide (non vanishing) domain, [−α, α], and S min ∼ n −2 (1 − F ∼ N −2 ), thus attaining the Heisenberg limit. Let us now consider the two regimes in more detail.
where Erf(x) is the error function. Eq. (61) is correct up to exponentially vanishing contributions, which can be neglected here. In deriving this equation we also used that Erf( n/2) → 1 for large n. Substituting Eq. (60) in Eq. (61) we obtain where Erfc is the complementary error function, defined as Erfc(x) = 1 − Erf(x). Finally, with the help of the Gaussian approximation (52), we compute the minimum action from Eq. (36) and obtain Eqs. (59) and (62), along with ω = Ω √ n/a and Q = 1−1/λ 2 , enable writing all variables in terms of the single parameter Ω. By further substituting in Eq. (63) we obtain the curve (Q, S min ) in parametric form: Note that, as announced above, 1 − F goes as 1/N .
In Fig. 2 we plot nS min = 2N (1 − F ) as a function of Q, using Eqs. (64) and (65). The plot shows a strong dependence on Q. Hence, e.g., allowing about 90% of abstention, has the same effect as doubling the number of copies in the standard approach (without abstention). Note also that for Q → 0 we recover the well known result 2N (1 − F ) = 1/2. The profile of the transformed fiducial state |Ψ 0 is shown in Fig. 3, where ϕ(τ ) and λψ(τ ) are plotted as a function of t = j/n for two different values of n (recall that τ = t − 1/2).

1/n 2 regime
Here we assume that α is fixed (does not scale with n). As n goes to infinity, the boundaries of the coincidence set, τ = ±α, lie on the tails of ψ(τ), where the Gaussian approximation is not valid, and Eq. (51) must be used instead. Eqs. (56) and (57) now become The first equation can be solved for Ω as an asymptotic series in powers of 1/n: which implies ω ≃ π/(2α). To evaluate the integral in Eq. (58), we expand the exponent −n[log 2 − H(1/2 + τ)] around τ = α, so that We note that, although this contribution falls off exponentially exactly as $A^2$, it can be neglected in evaluating Eq. (58) since its prefactor is $O(n^{-1/2})$, as compared to that of $A^2$, which is $O(n^{5/2})$. Taking this into account and substituting Ω ≈ π/2 and Eq. (67) into Eq. (58), we have and the critical acceptance rate is $\bar{Q}^* = 2^{-n}$ (corresponding to α → 1/2). The minimum action can be computed from Eq. (36) using the same approximation as in Eq. (69). We obtain Note that the exponential factor in the second line of this equation is cancelled by $\lambda^2$, given in Eq. (71), and only the product of the pre-factors, of order $n^{-1}$, remains. Thus the second line can be safely neglected in the asymptotic limit and we have with an abstention rate given by Eq. (71). The maximum fidelity is attained by the largest value of α = 1/2, for which $F = F^*$ as it should be. In summary, a high abstention rate (exponentially small acceptance rate) enables a drastic change in the scaling of the estimation precision with the number of copies. With such rates, one can attain $1 - F \sim 1/N^2$, i.e., achieve the Heisenberg limit.
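The trade-off in this regime can be illustrated with a feasible (not optimal) trial state. The sketch takes $\xi_j \propto \cos[\pi\tau_j/(2\alpha)]$ for $|\tau_j| < \alpha$ and zero elsewhere, which corresponds to ω = π/(2α) with the coincidence set contribution dropped, and computes the smallest λ for which $\xi_j \le \lambda c_j$ holds with binomial $c_j$. The resulting acceptance rate $1/\lambda^2$ is a lower bound on what the optimal protocol needs for the same support, and $n^2 S$ should be close to $\pi^2/(8\alpha^2)$. All names, and the binomial form of $c_j$, are assumptions of the sketch.

```python
import math

def log_c2(n, j):
    # log of c_j^2 = C(n, j) / 2^n, computed with lgamma to avoid overflow
    return (math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
            - n * math.log(2.0))

def trial_state(n, alpha):
    xi = []
    for j in range(n + 1):
        tau = j / n - 0.5
        xi.append(math.cos(math.pi * tau / (2.0 * alpha)) if abs(tau) < alpha else 0.0)
    norm = math.sqrt(sum(x * x for x in xi))
    return [x / norm for x in xi]

def log_acceptance_rate(n, xi):
    # log(1 / lambda^2) with lambda = max_j xi_j / c_j over the support of xi
    return min(log_c2(n, j) - 2.0 * math.log(x) for j, x in enumerate(xi) if x > 0.0)

n, alpha = 200, 0.25
xi = trial_state(n, alpha)
S = 1.0 - sum(xi[j] * xi[j + 1] for j in range(n))
print("n^2 S =", n * n * S, "vs pi^2/(8 alpha^2) =", math.pi ** 2 / (8.0 * alpha ** 2))
print("acceptance rate ~", math.exp(log_acceptance_rate(n, xi)))
```

For n = 200 and α = 1/4 the action is several times smaller than the no-abstention value 1/(2n), at the price of an acceptance rate many orders of magnitude below one.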

IV. DIRECTION ESTIMATION
Proceeding along the same lines as in Sec. III, we can write ξ|M|ξ [recall Eq. (12)] as and S = 1 − ξ|M|ξ becomes now where we have used the normalisation constraint in Eq. (16). Introducing Lagrange multipliers according to KKT, and assuming N = 2n asymptotically large, we obtain the equivalent variational problem of minimizing the action S = ϕ 2 (1) 2n (76) where the primal feasibility condition (31) and the slackness condition (32) still apply. For λ = 1 no transformation of the state is possible, therefore the first, order n −1 , term in (76) is fixed by the boundary value of the initial state ψ(1). For λ > 1 we can impose ϕ(1) = 0, hence opening the door to order n −2 scaling (i.e., to attaining the Heisenberg limit). The evolution equation corresponding to the second line in Eq. (76) is more conveniently expressed in terms ofφ(t) = ϕ(t)/ √ t. It reads The minimum value of the action can be written as in Eq. (36), where we recall that σ(t) can be only different from zero in the coincidence set C. Now, σ(t) is given by Eq. (77) withφ(t) = λψ(t)/ √ t.

A. Large abstention (λ ≫ 1)
For abstention rates close to unity, and provided $c_j > 0$ for all $j$, one has $C = \emptyset$, so $\sigma(t) \equiv 0$. Eq. (77) becomes homogeneous and its solution is $\varphi(t) = \sqrt{t}\,[A J_0(\omega t) + B Y_0(\omega t)]$, where $J_0$ and $Y_0$ are Bessel functions of the first and second kind, respectively, and $A$, $B$ and $\omega$ are constants that we fix by requiring $\varphi(1) = 0$ (otherwise $S$ is of order $1/n$) and the convergence of the integral in Eq. (76). The latter implies $B = 0$. The former condition and the positivity of $\varphi(t)$ fix $\omega$ to be the first zero of $J_0$, which we call $\gamma_1$. Hence, $\omega = \gamma_1 \approx 2.405$. Imposing normalization we finally fix $A$, and the solution is $\varphi(t) = \sqrt{2t}\,J_0(\gamma_1 t)/J_1(\gamma_1)$. Using Eq. (36), we obtain $S^* = \gamma_1^2/(2n^2)$, and the maximum fidelity is in agreement with [10]. The abstention rate required to achieve the Heisenberg limit strongly depends on the initial family of states, as will be shown in the following two examples.
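The numbers in this subsection are easy to check. The following sketch locates the first zero $\gamma_1$ of $J_0$ by bisection, verifies that a profile of the form $\sqrt{2t}\,J_0(\gamma_1 t)/J_1(\gamma_1)$ is normalized on $[0, 1]$ and vanishes at $t = 1$, and evaluates $S^* = \gamma_1^2/(2n^2)$. The Bessel functions are summed from their power series so that only the standard library is needed; the helper names are hypothetical, and the normalization convention $\int_0^1 \varphi^2\,dt = 1$ is assumed.

```python
import math

def bessel_j(order, x, terms=60):
    """Power series for J_order(x), integer order >= 0; adequate for moderate |x|."""
    term = (x / 2.0) ** order / math.factorial(order)
    total = term
    for k in range(1, terms):
        term *= -(x / 2.0) ** 2 / (k * (k + order))
        total += term
    return total

def bisect(f, lo, hi, tol=1e-12):
    """Plain bisection; assumes f(lo) and f(hi) have opposite signs."""
    flo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        fmid = f(mid)
        if (fmid > 0) == (flo > 0):
            lo, flo = mid, fmid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def simpson(f, a, b, steps=2000):
    """Composite Simpson rule (steps must be even)."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

# First zero of J_0, bracketed between 2 and 3.
gamma_1 = bisect(lambda x: bessel_j(0, x), 2.0, 3.0)

def phi(t):
    """Profile sqrt(2t) J_0(gamma_1 t) / J_1(gamma_1) for the large-abstention regime."""
    return math.sqrt(2.0 * t) * bessel_j(0, gamma_1 * t) / bessel_j(1, gamma_1)

def s_star(n):
    """Minimum action S* = gamma_1^2 / (2 n^2) quoted in the text."""
    return gamma_1 ** 2 / (2.0 * n ** 2)

norm = simpson(lambda t: phi(t) ** 2, 0.0, 1.0)

if __name__ == "__main__":
    print(gamma_1, norm, phi(1.0), s_star(50))
```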
B. $|\Psi_0\rangle$ proportional to the POVM seed state $|\Phi_2\rangle$
In analogy with Sec. III B, in this example we choose the fiducial state $|\Psi_0\rangle$ to be proportional to the POVM seed $|\Phi_2\rangle$ in Eq. (11). This leads to $\psi(t) = \sqrt{2t}$, and the solution has the form Then, $\sigma(t) = \lambda\omega^2\sqrt{2t}$ if $t \in C = [0, \alpha]$ (and it vanishes otherwise). Substituting in Eq. (36), the minimum action can be written as Continuity of $\varphi(t)$ and its first derivative at $t = \alpha$ imply and the boundary condition $\varphi(1) = 0$ requires We will not attempt to find the exact analytical solution of this transcendental equation, but rather consider two particular regions of $\alpha$ (the boundary of the coincidence set $C$) where approximate solutions can be easily derived. They are given by $\alpha \gtrsim 0$ and $\alpha \lesssim 1$. That will suffice to capture the main features of $S_{\min}$ (see Figure 4). Note that small $\alpha$ corresponds to large $\lambda$, since the coincidence set $C = [0, \alpha]$ is a small region and thus $\varphi(t)$ cannot differ much from the unconstrained solution that leads to $F^*$. On the other hand, $\alpha \lesssim 1$ must correspond to small abstention.
If $\alpha \gtrsim 0$, we substitute the ansatz $\omega = \gamma_1 + a\alpha + b\alpha^2 + \dots$ in (84). After some algebra, we obtain where we have made use of the relation in particular, $Y_0(\gamma_1) = 2/[\pi\gamma_1 J_1(\gamma_1)]$. If $\alpha \lesssim 1$, Eq. (84) can only hold for very large $\omega$ and $\alpha\omega \approx \omega$, as is apparent from Eq. (86), and we can replace the Bessel functions by their well-known asymptotic approximations With this, Eq. (84) becomes from which We next impose the normalisation condition to find the relationship between $\lambda$ and $\alpha$. For $\alpha \gtrsim 0$, we find Taking the limit $\alpha \to 0$ we find the critical value of $\lambda$: $\lambda^* = 1/J_1(\gamma_1)$; and the critical rate of abstention: Substituting Eqs. (85) and (90) in Eq. (82) we readily see that the various contributions to order $\alpha^2$ cancel, and One can check that, as expected, $S_{\min}$ (and thus the fidelity) is flat in the region $\alpha \gtrsim 0$ ($Q \lesssim Q^*$); i.e., $S_{\min}$ is a smooth function of $Q$ at $Q = Q^*$. Indeed, Eq. (90) implies The correction can be computed explicitly with some effort. We find that $S_{\min}$ increases up to 3.5% for $Q \approx 0.6$, at which point the approximation breaks down.
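As a numerical illustration of the two closed-form statements above, the sketch below checks the identity $Y_0(\gamma_1) = 2/[\pi\gamma_1 J_1(\gamma_1)]$ and evaluates the critical multiplier $\lambda^* = 1/J_1(\gamma_1)$. The conversion to a critical abstention rate uses $Q = 1 - 1/\lambda^2$, which is an assumption of this sketch (the relation between $Q$ and $\lambda$ is defined outside this section). The series helpers are hypothetical names, written with the standard library only.

```python
import math

EULER_GAMMA = 0.5772156649015329

def bessel_j(order, x, terms=60):
    """Power series for J_order(x), integer order >= 0."""
    term = (x / 2.0) ** order / math.factorial(order)
    total = term
    for k in range(1, terms):
        term *= -(x / 2.0) ** 2 / (k * (k + order))
        total += term
    return total

def bessel_y0(x, terms=60):
    """Series for Y_0(x), x > 0: logarithmic term plus harmonic-number sum."""
    total = (math.log(x / 2.0) + EULER_GAMMA) * bessel_j(0, x, terms)
    term = 1.0       # (x^2/4)^k / (k!)^2, starting from k = 0
    harmonic = 0.0   # H_k = 1 + 1/2 + ... + 1/k
    for k in range(1, terms):
        term *= (x / 2.0) ** 2 / (k * k)
        harmonic += 1.0 / k
        total += (-1) ** (k + 1) * harmonic * term
    return 2.0 / math.pi * total

def bisect(f, lo, hi, tol=1e-12):
    """Plain bisection; assumes a sign change on [lo, hi]."""
    flo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        fmid = f(mid)
        if (fmid > 0) == (flo > 0):
            lo, flo = mid, fmid
        else:
            hi = mid
    return 0.5 * (lo + hi)

gamma_1 = bisect(lambda x: bessel_j(0, x), 2.0, 3.0)
j1_at_zero = bessel_j(1, gamma_1)

lambda_star = 1.0 / j1_at_zero             # critical multiplier from the text
q_star = 1.0 - 1.0 / lambda_star ** 2      # ASSUMPTION: Q = 1 - 1/lambda^2

# Both sides of the identity quoted in the text.
y0_direct = bessel_y0(gamma_1)
y0_identity = 2.0 / (math.pi * gamma_1 * j1_at_zero)

if __name__ == "__main__":
    print(lambda_star, q_star, y0_direct, y0_identity)
```

Under the stated assumption the critical rate comes out close to 0.73, which is compatible with the remark that the small-$\alpha$ approximation is followed down to $Q \approx 0.6$.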
For $\alpha \lesssim 1$, we find Combining all these results, we find where we stress that this expression is a very good approximation down to relatively small values of $Q$, as can be seen in Fig. 4. In this figure we plot Eq. (94) for each regime (lines), along with some numerical results (points). The plot shows very good agreement for most values of the abstention rate $Q$. One can see that the flat region extends to values of $Q$ considerably smaller than $Q^*$. Note again that any nonzero amount of abstention enables the estimation accuracy to change behaviour from $1/N$ to $1/N^2$, thus attaining the Heisenberg limit.
FIG. 4. The lines show Eq. (94), whereas the circles are numerical results. In order to approach the asymptotic limit, higher values of $n$ are needed for smaller $Q$. Accordingly, two different values of $n$ have been used: $n = 50$ (filled circles) and $n = 120$ (empty circles).

C. Antiparallel spins
As for the case of phase estimation, here we focus on signals consisting of product states of $N = 2n$ spins. The simplest possibility is, of course, identical copies. However, this case is of no relevance to direction estimation with abstention, since the seed state $|\Psi_0\rangle$ has only a single component in the symmetric subspace of $j = n$, i.e., $c_j = 0$ if $0 \le j < n$, and abstention can only change the components by a multiplicative factor, as shown in Eq. (9). Thus $\xi_j = 0$ if $0 \le j < n$, and $\xi_n = c_n$. Instead, we consider a seed state consisting of $2n$ antiparallel spins: $n$ of them pointing along the positive $z$-axis and the other $n$ pointing along the opposite direction, Such a state has zero magnetic quantum number, $m = 0$, and nonvanishing components $c_j$ given by where $\langle j, m; j', m' | J, M\rangle$ are the standard Clebsch-Gordan coefficients. The 'continuous version' of these components is given by ($t = j/n$) which has a peak at $t = 0$. The solution to the minimisation problem in Eq. (76) has the form Following the same lines as in Sec. III C, we consider two scalings of the boundary point $t = \alpha$: one where it goes to zero as $1/\sqrt{n}$, and a second one where $\alpha$ is fixed. These will lead to two regimes, where $1 - F$ vanishes as $N^{-1}$ and $N^{-2}$, respectively.

1/n regime
In this regime we set $\alpha = a/\sqrt{n}$. As in the phase case, we can use the 'Gaussian approximation' for (97): Note that $\int_0^1 \psi^2(t)\,dt = 1$, up to contributions that vanish exponentially with $n$. The following expressions follow from the conditions of continuity of the solution and its derivative as well as normalisation: where we have defined $\Omega \equiv \omega\alpha = \omega a/\sqrt{n}$. The minimum action $S_{\min}$ is given by where we have neglected exponentially vanishing terms. This expression, together with Eqs. (100) and (102), defines the curve $(Q, S_{\min})$ in terms of the free parameter $\Omega \in [0, \gamma_1)$. The corresponding plot is shown in Fig. 5. We see that for moderate values of the abstention rate one can substantially improve the estimation precision. E.g., a rate of abstention of 95% has the same effect as doubling the number of spins in the standard approach (without abstention). Note, however, that with a finite acceptance rate we cannot beat the shot noise limit.

$1/n^2$ regime
Here we take $\alpha$ to be fixed. From Eq. (98), continuity of $\varphi(t)$ and $\varphi'(t)$ at $t = \alpha$ yields where, as before, $\Omega = \alpha\omega$. It follows that where Eq. (97) has been used. We can solve this equation for $\Omega$ as a series in inverse powers of $n$, obtaining where we recall that $\gamma_1$ stands for the first zero of the function $J_0(z)$. Substituting this result into Eq. (105) we obtain Neglecting the contribution from the coincidence set, by the same arguments as in the paragraph after Eq. (69), the normalization condition is Combining the last two equations, we find As for phase estimation, the acceptance rate $\bar{Q}$ falls off exponentially. The minimum action $S_{\min}$ can be computed from Eq. (36) along the same lines as in the analogous phase estimation example. This leads to As in Sec. III C, abstention enables one to exceed the shot noise limit. Note that for $\alpha = 1$, we have $F = F^*$, Eq. (80), as expected.

V. FRAME ESTIMATION
As anticipated in the introduction, if the encoding system consists of $N$ qubits one can make use of the multiplicities of the different irreducible representations (i.e., the degeneracy of the $j$ quantum number) to provide a very efficient encoding of the orientation of a Cartesian frame, or equivalently, of the rotation group parameters $g$. States of the form $|\Psi_0\rangle = \sum_{j=0}^{n} c_j \left(\sum_{m=-j}^{j} |j, m, \alpha_m\rangle\right)/\sqrt{2j+1}$ optimally exploit these ancillary degrees of freedom and lead to a matrix $M$ that is (almost) equal to that corresponding to phase estimation. Hence, most of the expressions and conclusions derived in Section III also hold in this case, but one must recall that $N = 2n$ for frames (whereas $N = n$ for phases), i.e., one must perform the change $N \to N/2$ in the formulae of that section to obtain the corresponding formulae for frames. In particular, Eq. (40) becomes $F^* = 1 - \pi^2/N^2$ for frame estimation, in agreement with [29]; Eq. (73) becomes $F = 1 - \pi^2/(4N^2\alpha^2) + \dots$, and so on. Note in particular that direction estimation does not provide an optimal strategy for frame estimation; namely, the optimal frame fidelity cannot be attained by splitting the $N$ qubits into three groups, encoding each orthogonal direction in one of them, and performing three independent direction estimations.
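The substitution $N \to N/2$ can be made concrete with a few lines of code. The sketch below encodes the two frame formulae quoted above and recovers the corresponding phase expressions by undoing the substitution (a frame formula evaluated at $2N$ qubits). The phase expressions obtained this way are an inference from the quoted rule, since Eqs. (40) and (73) are not reproduced in this section, and the function names are hypothetical.

```python
import math

def frame_fidelity_large_abstention(N):
    """F* = 1 - pi^2 / N^2, as quoted in the text for N-qubit frame estimation."""
    return 1.0 - math.pi ** 2 / N ** 2

def frame_fidelity_fixed_alpha(N, alpha):
    """Leading terms of F = 1 - pi^2 / (4 N^2 alpha^2), as quoted in the text."""
    return 1.0 - math.pi ** 2 / (4.0 * N ** 2 * alpha ** 2)

def phase_from_frame(frame_formula, N, *args):
    """Undo N -> N/2: the phase formula at N copies equals the frame formula at 2N qubits."""
    return frame_formula(2 * N, *args)

if __name__ == "__main__":
    N = 100
    # alpha = 1/2 reproduces the large-abstention value F*.
    print(frame_fidelity_fixed_alpha(N, 0.5), frame_fidelity_large_abstention(N))
    # Inferred phase-estimation counterpart: 1 - pi^2 / (4 N^2).
    print(phase_from_frame(frame_fidelity_large_abstention, N))
```

The first printed pair shows that the fixed-$\alpha$ expression reduces to $F^*$ at $\alpha = 1/2$, mirroring the consistency check made for phase estimation in Sec. III.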
In our final example we move away from the $N$-qubit encoding towards a scenario where the degeneracy of the angular momentum representations cannot be used to improve the frame estimation accuracy, as is the case for, e.g., an atom in a Rydberg state. In this scenario, we have In the asymptotic limit, the continuous version of this expression is cast as $\langle\xi|M|\xi\rangle = 1 - S$, with the action which includes the constraints (16) and (17) and the corresponding Lagrange multipliers $\omega$ and $\sigma$, and where we have set $\varphi(0) = \varphi(1) = 0$. This action and that for direction estimation, Eq. (76), look much the same but for the term proportional to $n$. This apparently minor difference leads, however, to very different asymptotic behaviors. The equation of motion that follows from (114) turns out to be Since $n$ is assumed to be asymptotically large, the term proportional to $n$ in (114) forces $\varphi(t)$ to peak at $t \approx 1$ in order to minimize the action. Therefore, the last term in (115) can be safely neglected. The minimum value of $S$ can be written in terms of the Lagrange multipliers and $\psi(t)$ as in Eq. (36), with $\sigma = \lambda[\psi'' + (\omega^2 - 2n/t)\psi]$ for $t \in C$, and $\sigma = 0$ otherwise.

A. Large abstention
Once again, for abstention rates close to one, and provided $c_j \neq 0$ for all $j$, Eq. (115) becomes homogeneous, i.e., $\sigma = 0$, and, along with the boundary conditions $\varphi(0) = \varphi(1) = 0$, defines an eigenvalue problem. Its solution can be given in terms of Whittaker functions, but unfortunately it is rather involved. It proved much simpler to formulate and solve a less demanding eigenvalue problem with the same large-$n$ asymptotic behavior, as we explain next.
Since $\varphi(t)$ is peaked at $t \approx 1$, we can Taylor expand the term $2n/t$ in Eq. (115) around this point. The leading and sub-leading contributions to $S_{\min}$ come from the first two terms in this expansion, that is, from the linear approximation $2n/t \approx 2n + 2n(1 - t)$. Within this approximation the equation of motion becomes and we relax the boundary condition $\varphi(0) = 0$ by requiring only that $\varphi(t)$ vanish as $t \to -\infty$. This may seem unnatural at first, but it will become immediately apparent that the solution to this well-posed Sturm-Liouville eigenvalue problem vanishes exponentially with $n$ if $t \le 0$ (in particular $\varphi(0) \to 0$ exponentially as $n \to \infty$), which is enough to ensure that the resulting asymptotic expansion of $S_{\min}$ in inverse powers of $n$ will be correct. The solution is: where Ai is the Airy function and the constant $C$ is fixed by normalization. Imposing the second boundary condition, $\varphi(1) = 0$, we have (for the smallest eigenvalue) $\omega^2 = 2n - \gamma_1 (2n)^{2/3}$, where in this section $\gamma_1$ stands for the first zero of $\mathrm{Ai}(x)$, whose value is $\gamma_1 \approx -2.33811$. Using (36), we obtain the minimum action from which (recall that here $N = 2n$) For the average of the error $e_3$ with which we estimate the three axes of the Cartesian frame (see Introduction), we obtain $e_3 = 8/N - 8\gamma_1/N^{4/3} + O(N^{-5/3})$. These results are in complete agreement with those in [35]. The asymptotic series we have obtained turns out to be in powers of $N^{-1/3}$. To obtain accurate values of $F^*$ for moderately large $N$, the next term in (119), of order $N^{-5/3}$, might be important. Using our approach the calculation of this term is straightforward. One simply needs to include in (116) the next term in the Taylor expansion of $2n/t$, i.e., $2n(1 - t)^2$, and use perturbation theory to obtain the correction The corresponding correction to $S^*$ can then be computed via Eq. (36). The result is $\delta S^* = 2^{7/3}\gamma_1^2/(15 n^{5/3})$. From this, the correction to the fidelity turns out to be $\delta F^* = 8\gamma_1^2/(15 N^{5/3})$.
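The constants in this subsection can be reproduced without special-function libraries. The sketch below sums the Maclaurin series of the Airy function, locates its first zero by bisection, and evaluates the two closed-form expressions quoted above, $\omega^2 = 2n - \gamma_1(2n)^{2/3}$ and $e_3 = 8/N - 8\gamma_1/N^{4/3}$ (truncated at the order shown). The function names are hypothetical; the values of $\mathrm{Ai}(0)$ and $\mathrm{Ai}'(0)$ are the standard tabulated constants.

```python
import math

AI_AT_0 = 0.355028053887817         # Ai(0)
AI_PRIME_AT_0 = -0.258819403792807  # Ai'(0)

def airy_ai(x, terms=80):
    """Maclaurin series of Ai(x); fine for the moderate |x| used here."""
    f_term, g_term = 1.0, x          # first terms of the even-like and odd-like series
    f_sum, g_sum = f_term, g_term
    for k in range(terms):
        # y'' = x y gives a_{m+3} = a_m / ((m + 2)(m + 3)).
        f_term *= x ** 3 / ((3 * k + 2) * (3 * k + 3))
        g_term *= x ** 3 / ((3 * k + 3) * (3 * k + 4))
        f_sum += f_term
        g_sum += g_term
    return AI_AT_0 * f_sum + AI_PRIME_AT_0 * g_sum

def bisect(f, lo, hi, tol=1e-12):
    """Plain bisection; assumes a sign change on [lo, hi]."""
    flo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        fmid = f(mid)
        if (fmid > 0) == (flo > 0):
            lo, flo = mid, fmid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# First (least negative) zero of Ai, bracketed between -3 and -2.
gamma_1 = bisect(airy_ai, -3.0, -2.0)

def omega_squared(n):
    """Eigenvalue omega^2 = 2n - gamma_1 (2n)^(2/3) quoted in the text."""
    return 2.0 * n - gamma_1 * (2.0 * n) ** (2.0 / 3.0)

def mean_error(N):
    """e_3 = 8/N - 8 gamma_1 / N^(4/3), dropping the O(N^(-5/3)) remainder."""
    return 8.0 / N - 8.0 * gamma_1 / N ** (4.0 / 3.0)

if __name__ == "__main__":
    print(gamma_1, omega_squared(50), mean_error(100))
```

Because $\gamma_1$ is negative, the $N^{-4/3}$ term increases the error above the leading $8/N$ value, and its relative weight decays only as $N^{-1/3}$.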

B. Limited Abstention
As in the previous examples, if the rate of abstention is fixed to a value strictly less than one, the resulting precision very much depends on the given signal state, namely, on the shape of $c_j$ (or $\psi$). In order to give a concrete expression for the fidelity, here we will assume that, perhaps because of energy limitations, the probability amplitudes $c_j$ of exciting a state (e.g., of a Rydberg atom) with angular momentum $j$ are a decreasing function of $j$. Let us further assume as a first approximation, and also for simplicity, that this decrease is linear: This simple example will allow us to illustrate the most characteristic features of frame estimation enhanced by abstention.
If no abstention is allowed (standard estimation), one can show that the averaged error [i.e., 8(1 − F )] vanishes as (1/N ) log N as N increases, much slower than using the optimal signal states. We will show that even a tiny amount of abstention is enough to turn this scaling into 1/N . Moreover, the coefficient in this scaling law can be reduced down to almost the minimum value in (119) with a finite amount of abstention.
For $0 < Q < 1$ ($\lambda > 1$) and large $n$, the very same argument we used for large abstention shows that $\varphi(t)$ will be peaked away from $t = 0$, at some value close to the boundary of the coincidence set. We can thus Taylor expand the term $2n/t$ in (115) around $t = \alpha$ to subleading order. The differential equation becomes whose solution in $C^c$ (where $\sigma = 0$) is Here, we have used the weaker boundary condition $\lim_{t \to -\infty} \varphi(t) = 0$, and $C$ is determined in terms of the remaining free parameters $\alpha$ and $\omega$ by imposing continuity at the boundary of the coincidence set: $\varphi(\alpha) = \lambda\psi(\alpha)$.
This, combined with continuity of the first derivative, implies $\varphi(\alpha)/\varphi'(\alpha) = \psi(\alpha)/\psi'(\alpha)$, thus By inspection, we see that in order for this expression to make sense for asymptotically large $n$, the Lagrange multiplier $\omega$ must be of the form with $\epsilon(n) = o(n^0)$ and $\gamma'_1$ being the first zero of the function $\mathrm{Ai}'$ ($\gamma'_1 \approx -1.0188$). To compute $\epsilon(n)$, we assume it has an asymptotic series expansion in inverse powers of $n^{1/3}$ and plug it into (122). We then obtain the coefficients of the resulting series recursively. At leading order There is, however, an additional order $n^{1/3}$ contribution to $\omega^2$ coming from the next (quadratic) order in the Taylor expansion of $2n/t$ in (115). It can be computed using perturbation theory. Namely, as [in this expression $\varphi$ is assumed to be normalized to one in $(-\infty, \alpha)$]. Combining the two order $n^{1/3}$ contributions one has where $\omega_0^2$ is defined in (123). This equation gives $\omega^2$ as an explicit function of $\alpha$.
The rate of abstention (equivalently, $\lambda$) can also be expressed as a function of $\alpha$ by imposing normalization on the solution of (120) in the whole interval $(-\infty, 1]$, i.e., $\int_{-\infty}^{1} dt\,\varphi^2(t) = 1$. One has Using (36) once again, we obtain Eqs. (125) and (126) define the curve $(Q, S_{\min})$ in terms of $\alpha$, which we view as a free parameter that takes values in the range $0 < \alpha < 1$. This curve, which is accurate up to order $n^{-4/3}$, is plotted in Fig. 6 (dashed line) for $n = 20$. In the same figure, we also plot the asymptotic (leading) contribution alone (solid line) and some numerical optimization results for $n = 20$ (empty blue circles) and $n = 90$ (filled red circles). We see that $n = 20$ is still not quite in the asymptotic regime, and that the sub-leading corrections play a significant role, improving the agreement to almost perfect for central values of $Q$. At leading order, $S_{\min}$ can be easily written as an explicit function of $Q$, since only the leading term in (125) contributes and we have $\alpha = 1 - \bar{Q}^{1/3}$. Substituting this in the first line of (126), we obtain Interestingly, the corrections to this result can be shown to be of order $n^{-5/3}$, whereas the implicit form given by (125) and (126) has non-zero contributions of order $n^{-4/3}$. In the limit $Q \to 1$, Eq. (127) yields the leading order in (118), but the slope of $S_{\min}(Q)$ becomes vertical at $Q = 1$ (see solid line in Fig. 6). At this point our asymptotic approximation breaks down (as can be seen by noticing that the higher order terms, e.g., Eq. (124), diverge as some negative power of $1 - \alpha \approx \bar{Q}^{1/3}$) and the numerical results approach the leading asymptotic curve very slowly. This is apparent from Fig. 6, where an extra point (empty diamond), corresponding to a numerical result for $n = 1000$, has been added to further emphasize this behavior. At the other end, for $Q \to 0$, Eq. (127) diverges. That should not come as a surprise, since, as mentioned above, for zero abstention the error scales as $(1/n) \log n$. This also explains why the agreement with the numerical results (circles) in Fig. 6 worsens as $Q$ becomes very small.
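The leading-order relation $\alpha = 1 - \bar{Q}^{1/3}$ can be illustrated with a short numerical check. The sketch below rests on two assumptions that are not spelled out in this section: the linearly decreasing profile is taken to be $\psi(t) = \sqrt{3}\,(1 - t)$, normalized on $[0, 1]$, and the acceptance rate is taken to be $\bar{Q} = 1/\lambda^2$, so that at leading order $\bar{Q}$ equals the weight of $\psi^2$ on the coincidence set $[\alpha, 1]$. With these choices the integral is $(1 - \alpha)^3$, which reproduces the quoted relation. The function names are hypothetical.

```python
import math

def psi(t):
    """ASSUMED normalized linear profile: psi(t) = sqrt(3) (1 - t) on [0, 1]."""
    return math.sqrt(3.0) * (1.0 - t)

def acceptance_rate(alpha, steps=1000):
    """Leading-order acceptance rate: weight of psi^2 on [alpha, 1] (Simpson rule)."""
    h = (1.0 - alpha) / steps
    total = psi(alpha) ** 2 + psi(1.0) ** 2
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * psi(alpha + i * h) ** 2
    return total * h / 3.0

def alpha_from_acceptance(q_bar):
    """Relation quoted in the text: alpha = 1 - q_bar^(1/3)."""
    return 1.0 - q_bar ** (1.0 / 3.0)

if __name__ == "__main__":
    for q_bar in (0.05, 0.3, 0.9):
        alpha = alpha_from_acceptance(q_bar)
        # The recovered acceptance rate should match the input value.
        print(q_bar, alpha, acceptance_rate(alpha))
```

The check also makes the breakdown near $Q \to 1$ plausible: as $\bar{Q} \to 0$ the coincidence set shrinks to the point $t = 1$, where inverse powers of $1 - \alpha$ become large.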

VI. CONCLUSIONS
We have studied the effect of abstention, or post-selection, in parameter estimation. In some cases, such as that of $N$ parallel spins encoding a spatial direction, abstention does not provide any enhancement of the estimation precision. However, post-selection generically does have a significant effect, even asymptotically.
The problem of finding the optimal protocol with abstention can be rephrased as that of optimizing the probabilistic map that transforms the family of input states into a new family that yields a higher estimation fidelity. The optimization is first formulated as a SDP problem, which immediately renders it numerically solvable. Most importantly, we have also presented a method for computing the fidelity and the form of the transformed states as a function of the abstention rate Q for asymptotically large samples. This method relies on mapping our optimization problem to a mechanical problem defined through an effective Lagrangian (action) where the input state plays the role of a moving constraint. Solving the corresponding equations of motion returns the optimal fidelity for a fixed abstention rate Q, and the corresponding optimal POVM. We have given the general form of this Lagrangian for the relevant problems of phase, direction and Cartesian frame estimation, and thereby cleared the road for finding analytical optimal solutions for arbitrary input states. We would like to emphasize that this is a significant development, since even in the standard approach to estimation, without abstention, analytical asymptotic expressions were only known in few cases.
For phase and direction estimation we have illustrated our method for two types of input states. We have first studied states proportional to the (rotated) seed vector of the respective optimal POVM in standard parameter estimation, and then moved on to product states of identically prepared qubits, polarized on the equatorial plane (phase), and to products of pairs of antiparallel spins (direction). The rate at which the fidelity approaches one establishes two distinct regimes. In the first regime the rate is proportional to $N^{-1}$ (the so-called shot noise limit) and abstention can change the proportionality constant by up to a factor of two. This means that in a given setup an experimentalist would attain the same gain in fidelity by cranking up the number of copies as she would by allowing for some degree of abstention. The second regime is much more dramatic: the fidelity approaches one as $N^{-2}$, thus attaining the Heisenberg limit. The abstention rate that separates the two regimes depends on the input states under consideration. For input states proportional to the rotated POVM seeds, which have a very broad distribution in the relevant quantum number but provide a shot-noise limited fidelity in standard estimation (without abstention), the slightest abstention rate, $Q > 0$, is enough to unlock the good encoding properties of these states and reach the Heisenberg, $N^{-2}$, regime. Product states can also reach this enhanced regime, but in this case the abstention rate needs to get exponentially close to one. In contrast to the previous case studied in [25], where the action of the POVM can be understood as a filtering of subspaces preceding the optimal canonical measurement, here the POVM plays a more active role and modifies the coherences in the states in a non-trivial way. The benefits of abstention are also more visible here than in Ref. [25], where an exponentially small acceptance rate was required to change the coefficient of the shot noise term $N^{-1}$, and the Heisenberg regime was not attainable at all.
Cartesian frame estimation has been shown to be formally equivalent to phase estimation in the asymptotic regime of many spins, provided one can entangle the magnetic number m with the quantum number that labels the degeneracy of the total angular momentum representations. In addition, we have studied frame estimation with systems where no such degeneracy exists (such as Rydberg atoms) or cannot be exploited. The method is illustrated for a simple input state where the amplitudes of the different angular momentum eigenstates are linearly decreasing with j. In this case, even a tiny amount of abstention triggers a change in the averaged error scaling, from (1/N ) log N to 1/N , which is the fastest decrease one can attain in this scenario. Increasing the abstention rate further reduces the scaling-law coefficient down to almost its minimum value.
Recently [38], there has been renewed interest in weak measurements [36], with particular emphasis on quantum metrology [37,39]. The protocol of state estimation with abstention presented here and weak measurements are both instances of post-selection. Our framework does not assume any specific realisation of the measurements; therefore the bounds derived here also apply to a weak measurement set-up. Note, however, that most of the work on weak-measurement metrology follows a point-wise approach to estimation, as opposed to the Bayesian approach followed here (see, however, [39]). The analysis of abstention in a point-wise approach, together with the important extension of our methods to mixed states, will be presented in [28]. Finally, we note that very recently a similar use of abstention has been applied to other quantum processing tasks, such as quantum cloning (or replication) [40], also achieving enhanced efficiency.

ACKNOWLEDGMENTS
We acknowledge financial support from ERDF: European Regional Development Fund. This research was supported by the Spanish MICINN, through contract FIS2008-01236 and the Generalitat de Catalunya CIRIT, contract 2009SGR-0985. We thank Giulio Chiribella for inspiring discussions at the early stages of this work and Madalin Guta for pointing out its relation to weak measurements.