Sharp norm inequalities for commutators of classical operators

We prove several sharp weighted norm inequalities for commutators of classical operators in harmonic analysis. We find sufficient $A_p$-bump conditions on pairs of weights $(u,v)$ such that $[b,T]$, $b\in BMO$ and $T$ a singular integral operator (such as the Hilbert or Riesz transforms), maps $L^p(v)$ into $L^p(u)$. Because of the added degree of singularity, the commutators require a "double log bump," whereas singular integrals require only single log bumps. For the fractional integral operator $I_\alpha$ we find the sharp one-weight bound on $[b,I_\alpha]$, $b\in BMO$, in terms of the $A_{p,q}$ constant of the weight. We also prove sharp two-weight bounds for $[b,I_\alpha]$ analogous to those of singular integrals. We prove two-weight weak-type inequalities for $[b,T]$ and $[b,I_\alpha]$ for pairs of factored weights. Finally, we construct several examples showing our bounds are sharp.


Introduction
Given a linear operator $T$ defined on the set of measurable functions and a function $b$, we define the commutator $[b,T]$ to be the operator
$$[b,T]f(x) = b(x)\,Tf(x) - T(bf)(x).$$
Commutators of singular integral operators were introduced by Coifman, Rochberg, and Weiss [11], who used them to extend the classical factorization theory of $H^p$ spaces. They proved that if $b\in BMO$, then $[b,T]$ is bounded on $L^p(\mathbb{R}^n)$, $1<p<\infty$. Janson [27] later showed the converse: if $[b,T]$ is bounded, then $b\in BMO$.
The first author is supported by the Stewart-Dorwart Faculty Development Fund at Trinity College and grant MTM2009-08934 from the Spanish Ministry of Science and Innovation.
Given $0<\alpha<n$, define the fractional integral operator $I_\alpha$ by
$$I_\alpha f(x) = \int_{\mathbb{R}^n} \frac{f(y)}{|x-y|^{n-\alpha}}\,dy.$$
The commutator $[b,I_\alpha]$ was first considered by Chanillo [7], who showed that if $b\in BMO$, $[b,I_\alpha]$ maps $L^p(\mathbb{R}^n)$ into $L^q(\mathbb{R}^n)$, where $1/p-1/q=\alpha/n$; a dyadic version of this result and further applications were given by Lacey [29].
While commutators share the same $L^p$ bounds as the underlying operators (e.g., singular integrals are bounded on $L^p$ and fractional integrals map $L^p$ into $L^q$), they are, nevertheless, more singular. This fact was first observed by considering their behavior at the endpoint. For instance, a singular integral operator $T$ is bounded from $L^1(\mathbb{R}^n)$ to $L^{1,\infty}(\mathbb{R}^n)$, but $[b,T]$, $b\in BMO$, is not. Instead, it satisfies a weaker modular inequality,
$$|\{x\in\mathbb{R}^n : |[b,T]f(x)|>\lambda\}| \le C\int_{\mathbb{R}^n} \Phi\Big(\|b\|_{BMO}\,\frac{|f(x)|}{\lambda}\Big)\,dx,$$
where $\Phi(t)=t\log(e+t)$. See Pérez [39]. A similar result holds for fractional integrals; see [13]. The greater degree of singularity of commutators is also reflected in the differences between the sharp weighted norm inequalities for a commutator and the underlying operator. This was first shown in a recent paper by Chung, Pereyra, and Pérez [10]. (See also Chung [8,9].) To state their result, recall that for $1<p<\infty$ we say that $w$ is an $A_p$ weight (or, more simply, $w\in A_p$) if
$$[w]_{A_p} = \sup_Q \Big(\frac{1}{|Q|}\int_Q w(x)\,dx\Big)\Big(\frac{1}{|Q|}\int_Q w(x)^{1-p'}\,dx\Big)^{p-1} < \infty,$$
where the supremum is taken over all cubes $Q$ with sides parallel to the coordinate axes. If $T$ is a singular integral operator, then
$$\|T\|_{L^p(w)\to L^p(w)} \le c\,[w]_{A_p}^{\max(1,\,p'/p)},$$
and this estimate is sharp in that $\max(1,p'/p)$ cannot be replaced by any smaller power. (This result has a long history and has only recently been proved in full generality. See [17,18,25,26] for details and further references.) However, Chung, Pereyra, and Pérez showed that if $b\in BMO$, then
$$\|[b,T]\|_{L^p(w)\to L^p(w)} \le c\,\|b\|_{BMO}\,[w]_{A_p}^{2\max(1,\,p'/p)},$$
and this exponent is again sharp.
In this paper we continue the study of weighted norm inequalities for commutators. We prove two-weight, strong type norm inequalities for commutators of singular integrals and one and two-weight strong type norm inequalities for commutators of fractional integrals. In both cases the results we get are sharp, and (like the result of Chung, Pereyra and Pérez) they demonstrate that commutators are more singular than the underlying operators. We also consider two-weight, weak type inequalities for both operators and prove results for a special class of weights, the so-called factored weights (which we will define below). These results are of interest because they strongly suggest what the sharp results should be, and we make two conjectures.
Singular integrals. We first consider singular integral operators. Because of our approach, our proofs are restricted to singular integral operators that can be approximated by "dyadic" singular integral operators that are generalizations of the Haar shifts. (Precise definitions will be given in Section 2 below.) Such operators include the classical singular integrals: the Hilbert transform, Riesz transforms, and the Ahlfors-Beurling operator. In one dimension it also includes any convolution type singular integral whose kernel is C 2 : see Vagharshakyan [48]. However, in light of recent results [25,26] we conjecture that Theorem 1.3 below is true for any Calderón-Zygmund singular integral.
Before stating our result for commutators, we provide some context. It has long been known that the two-weight $A_p$ condition is not sufficient for two-weight norm inequalities for singular integrals: see Muckenhoupt and Wheeden [36]. An important substitute is the so-called $A_p$-bump condition,
$$\sup_Q \|u^{1/p}\|_{A,Q}\,\|v^{-1/p}\|_{B,Q} < \infty,$$
where $A$, $B$ are Young functions and $\|\cdot\|_{A,Q}$, $\|\cdot\|_{B,Q}$ are normalized Luxemburg norms on the cubes $Q$. (Precise definitions are given below.) These conditions have been extensively studied: see [16,18,19,20,21,22]. Like the Muckenhoupt $A_p$ weights, these weight classes have two advantageous features. First, the $A_p$-bump condition is "universal": it applies simultaneously to large families of operators. Second, it is straightforward to check that a given pair satisfies the condition or to construct a pair of weights that does or does not satisfy it.
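For the class of singular integrals we are concerned with, the best result is the following.
Theorem 1.1 ([16, 17, 18]). Given $p$, $1<p<\infty$, suppose $(u,v)$ is a pair of weights such that
$$\sup_Q \|u^{1/p}\|_{A,Q}\,\|v^{-1/p}\|_{B,Q} < \infty,$$
where $A(t)=t^p\log(e+t)^{p-1+\delta}$ and $B(t)=t^{p'}\log(e+t)^{p'-1+\delta}$, $\delta>0$. Then for every singular integral $T$ in this class,
$$\|Tf\|_{L^p(u)} \le C\,\|f\|_{L^p(v)}.$$
Further, this result is sharp in the sense that if $\delta=0$, then it does not hold in general.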
Theorem 1.1 was proved in [16] for the Hilbert transform, and was proved in general in [17,18].
Remark 1.2. Here and in subsequent theorems, our hypotheses can be stated in greater generality, replacing the "log-bumps" (as Young functions like A and B are generally called) by more general Young functions determined by the so-called B p condition; see Definition 2.6. However, for commutators it is most natural to state our results in this form. For a brief description of a more general formulation, see Remark 2.13 below.
We can now state our main result for commutators of singular integrals.
If T is any singular integral that can be approximated by dyadic singular integrals (in particular, if T is the Hilbert transform, a Riesz transform, or the Ahlfors-Beurling operator) and b ∈ BMO, then  [46]. Let then by Theorem 1.3 we have for some positive function ϕ. We now exploit the fact that a two weight norm inequality has two degrees of freedom: for any s, t > 0, Hence, if we substitute (u, v) → (su, tv) in inequality (1.4), we get which is the desired linear bound.
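The scaling step in this argument can be written out as follows. This is only a sketch under stated assumptions: we write $K(u,v)$ for the supremum in the bump condition (1.2), and we assume that (1.4) has the form $\|[b,T]f\|_{L^p(u)} \le \varphi(K(u,v))\,\|f\|_{L^p(v)}$ for every admissible pair of weights. The symbol $K$ and the particular choice $s=1$, $t=K(u,v)^p$ are introduced here for illustration only.

```latex
% Assumed notation: K(u,v) := \sup_Q \|u^{1/p}\|_{A,Q}\,\|v^{-1/p}\|_{B,Q}.
% Homogeneity of the Luxemburg norm and of the weighted L^p norm, for s,t>0:
\begin{align*}
K(su,tv) &= \sup_Q \|(su)^{1/p}\|_{A,Q}\,\|(tv)^{-1/p}\|_{B,Q}
          = \Big(\frac{s}{t}\Big)^{1/p} K(u,v), \\
\|g\|_{L^p(su)} &= s^{1/p}\,\|g\|_{L^p(u)}, \qquad
\|f\|_{L^p(tv)} = t^{1/p}\,\|f\|_{L^p(v)} .
\end{align*}
% Apply the assumed inequality to the pair (su,tv):
\[
s^{1/p}\,\|[b,T]f\|_{L^p(u)}
  \le \varphi\Big(\big(\tfrac{s}{t}\big)^{1/p} K(u,v)\Big)\, t^{1/p}\,\|f\|_{L^p(v)} .
\]
% Choose s = 1 and t = K(u,v)^p, so that the argument of \varphi equals 1:
\[
\|[b,T]f\|_{L^p(u)} \le \varphi(1)\, K(u,v)\, \|f\|_{L^p(v)} .
\]
```

The point of the computation is that the unknown function $\varphi$ is only ever evaluated at $1$, so the dependence on the pair $(u,v)$ is linear in $K(u,v)$.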
The higher degree of singularity of the commutators is reflected in the power on the logarithms in the definition of $A$ and $B$: roughly twice as large as for a singular integral. (For this reason, we say that the commutator requires "double log bumps.") The phenomenon of having the degree of singularity reflected in the power of the logarithm was first conjectured in [19] for the dyadic square function and the vector-valued maximal operator, and confirmed in [18]. Theorem 1.3 generalizes a number of known results for commutators of singular integrals. Álvarez et al. [3] showed that if $\mathcal{W}$ is any class of weights that is stable (i.e., if $(u,v)\in\mathcal{W}$, there exists $r>1$ such that $(u^r,v^r)\in\mathcal{W}$), then given any pair $(u,v)\in\mathcal{W}$, $[b,T]: L^p(v)\to L^p(u)$. The main example of a class of stable weights consists of pairs $(u,v)$ that satisfy (1.2) when $A(t)=t^{rp}$ and $B(t)=t^{rp'}$, $r>1$. This class has the remarkable property that given any such pair $(u,v)$, there exists $w\in A_p$ such that $c_1u\le w\le c_2v$. See Neugebauer [37] (also see [19]). In [22], Theorem 1.3 was proved with $A(t)=t^{rp}$, $r>1$, $B(t)=t^{p'}\log(e+t)^{2p'-1+\delta}$; this was improved in [12] where it was proved with $A(t)\approx t^p\exp([\log(t^p)]^r)$, $0<r<1$. Finally, in [16] Theorem 1.3 was proved with $A(t)=t^p\log(e+t)^{3p-1+\delta}$.
In [16], the condition (1.2) was conjectured as being sufficient for the commutator of any Calderón-Zygmund singular integral operator, and Theorem 1.3 is substantial evidence for this conjecture. There were two motivations for this conjecture. First, it is a natural generalization of an old (and still outstanding) conjecture of Muckenhoupt and Wheeden. They conjectured that given a pair of weights $(u,v)$, a sufficient condition for a singular integral to map $L^p(v)$ into $L^p(u)$ is that the Hardy-Littlewood maximal operator satisfy
$$M : L^p(v)\to L^p(u) \quad\text{and}\quad M : L^{p'}(u^{1-p'})\to L^{p'}(v^{1-p'}). \tag{1.5}$$
The maximal operator naturally associated with commutators is not $M$, but the Orlicz maximal operator $M_{L\log L}$ (defined below); therefore, it seems natural to conjecture that if we replace $M$ by $M_{L\log L}$ in (1.5) then we get a sufficient condition for $[b,T]: L^p(v)\to L^p(u)$. The bump condition (1.2) is sufficient for $M_{L\log L}$ to satisfy these two estimates (this follows from Theorem 2.7 below). A second motivation for this conjecture is that for the special class of factored weights we could readily prove a result that was nearly optimal. We will consider this approach more carefully below.
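Fractional integrals. We can prove both one and two-weight results for commutators of fractional integrals. In the one-weight case the appropriate class of weights is $A_{p,q}$, a generalization of the $A_p$ weights introduced by Muckenhoupt and Wheeden [35]. More precisely, given $\alpha$, $0<\alpha<n$, and $p$, $1<p<n/\alpha$, fix $q$ so that $1/p-1/q=\alpha/n$. Then $w\in A_{p,q}$ if
$$[w]_{A_{p,q}} = \sup_Q \Big(\frac{1}{|Q|}\int_Q w(x)^q\,dx\Big)\Big(\frac{1}{|Q|}\int_Q w(x)^{-p'}\,dx\Big)^{q/p'} < \infty.$$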
There is a close connection between $A_{p,q}$ weights and $A_p$ weights: it is immediate from the definition that $[w]_{A_{p,q}}=[w^q]_{A_{1+q/p'}}$. If $w\in A_{p,q}$, then $I_\alpha: L^p(w^p)\to L^q(w^q)$, and in [30] the sharp constant in this inequality was given:
$$\|I_\alpha\|_{L^p(w^p)\to L^q(w^q)} \le c\,[w]_{A_{p,q}}^{(1-\frac{\alpha}{n})\max(1,\,\frac{p'}{q})}. \tag{1.6}$$
(A local version of this result was proved in [2].) Our next theorem is the corresponding result for commutators.
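The identity between the two weight constants is a short exponent computation, sketched below. It assumes the standard Muckenhoupt-Wheeden form of the $A_{p,q}$ constant, $[w]_{A_{p,q}}=\sup_Q\big(\frac{1}{|Q|}\int_Q w^q\big)\big(\frac{1}{|Q|}\int_Q w^{-p'}\big)^{q/p'}$; the auxiliary exponent $r$ is introduced only for this illustration.

```latex
% Set r := 1 + q/p'.  Then
\[
r - 1 = \frac{q}{p'}, \qquad
r' = \frac{r}{r-1} = 1 + \frac{p'}{q}, \qquad
1 - r' = -\frac{p'}{q},
\]
% so that (w^q)^{1-r'} = w^{-p'}.  Substituting into the A_r constant of w^q:
\begin{align*}
[w^q]_{A_r}
 &= \sup_Q \Big(\frac{1}{|Q|}\int_Q w^q\,dx\Big)
           \Big(\frac{1}{|Q|}\int_Q (w^q)^{1-r'}\,dx\Big)^{r-1} \\
 &= \sup_Q \Big(\frac{1}{|Q|}\int_Q w^q\,dx\Big)
           \Big(\frac{1}{|Q|}\int_Q w^{-p'}\,dx\Big)^{q/p'}
  = [w]_{A_{p,q}} .
\end{align*}
```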
The restriction $1/p-1/q=\alpha/n$ in the one-weight case follows from homogeneity: see [19, Section 5.6]. However, in the two-weight case, since the weights $u$ and $v$ may have different homogeneity, there is no corresponding restriction. Pérez [38] proved that if $1<p\le q<\infty$, and if the pair $(u,v)$ satisfies
$$\sup_Q |Q|^{\frac{\alpha}{n}+\frac{1}{q}-\frac{1}{p}}\,\|u^{1/q}\|_{A,Q}\,\|v^{-1/p}\|_{B,Q} < \infty,$$
where $A(t)=t^q\log(e+t)^{q-1+\delta}$ and $B(t)=t^{p'}\log(e+t)^{p'-1+\delta}$, then $I_\alpha: L^p(v)\to L^q(u)$. Given this estimate, our next result is the natural analog of Theorem 1.3 for commutators of fractional integrals.
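The homogeneity argument is easiest to see in the unweighted case, which is sketched below as an illustration; it is not the weighted argument of [19]. The dilation $f_\lambda(x)=f(\lambda x)$, $\lambda>0$, is notation introduced here.

```latex
% Change of variables z = \lambda y in the definition of I_\alpha:
\[
I_\alpha f_\lambda(x)
 = \int_{\mathbb{R}^n} \frac{f(\lambda y)}{|x-y|^{n-\alpha}}\,dy
 = \lambda^{-\alpha}\,(I_\alpha f)(\lambda x).
\]
% Norms of dilates:
\[
\|f_\lambda\|_{L^p} = \lambda^{-n/p}\,\|f\|_{L^p}, \qquad
\|I_\alpha f_\lambda\|_{L^q} = \lambda^{-\alpha - n/q}\,\|I_\alpha f\|_{L^q}.
\]
% If \|I_\alpha g\|_{L^q} \le C \|g\|_{L^p} for all g, then taking g = f_\lambda gives
\[
\|I_\alpha f\|_{L^q} \le C\,\lambda^{\alpha + \frac{n}{q} - \frac{n}{p}}\,\|f\|_{L^p}
\qquad \text{for every } \lambda > 0 .
\]
% Letting \lambda \to 0 or \lambda \to \infty forces I_\alpha f = 0 unless
\[
\alpha + \frac{n}{q} - \frac{n}{p} = 0,
\qquad\text{i.e.}\qquad \frac{1}{p} - \frac{1}{q} = \frac{\alpha}{n}.
\]
```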
Theorem 1.6. Given α, 0 < α < n, and p, q, 1 < p ≤ q < ∞, suppose the pair of weights (u, v) satisfies Further, this inequality is sharp since it does not hold in general if we take δ = 0 in the definition of A q .
As this paper was being completed, we discovered that the sufficiency of (1.8) in Theorem 1.6 was proved earlier by Li [33], who adapted the proof of the two-weight norm inequalities for I α . Here we give a somewhat more elementary proof along with an example to show that this condition is sharp.
Though not directly connected with our results on commutators, we digress to give a sharp constant result for the weighted Sobolev inequality. In [30] the authors used their results for fractional integrals to show that for p, q such that 1 ≤ p < n and 1/p − 1/q = 1/n, Here we show that this inequality is the best possible. Theorem 1.7. Suppose n > 1, 1 ≤ p < n and 1/p − 1/q = 1/n, then inequality (1.9) is sharp since the exponent 1/n ′ cannot be replaced by any smaller power.
To show that (1.9) is sharp we cannot use the standard examples of the form f (x) = |x| a χ B (x) where B is a unit ball or unit cube, since (1.9) requires f to be smooth. We instead introduce a new family which is smooth and decays exponentially at infinity.
Weak type inequalities. We begin with our two conjectures for weak type inequalities for commutators. Conjecture 1.8. Given a Calderón-Zygmund singular integral operator T , if for some p, 1 < p < ∞, the pair of weights (u, v) satisfies Conjecture 1.9. Given α, 0 < α < n, if for some p, 1 < p < ∞, the pair of weights (u, v) satisfies Conjecture 1.8 was proved in [21] when A(t) = t rp , r > 1; in [12] this was improved to A(t) ≈ t p exp([log(t p )] r ), 0 < r < 1. Conjecture 1.9 was proved by Liu and Lu [34], again when A(t) = t rp , r > 1; they did so by adapting the argument in [21] to the case of fractional integrals. By combining their proof with the ideas in [12], we get that this conjecture is also true with A(t) ≈ t p exp([log(t p )] r ), 0 < r < 1.
By comparison, a singular integral T satisfies T : L p (v) → L p,∞ (u) if the pair (u, v) satisfies (1.10) with A(t) = t p log(e + t) p−1+δ and B(t) = t p (see [20]), and it is conjectured that I α satisfies a weak (p, p) inequality if the pair (u, v) satisfies (1.12) with this same pair of Young functions. (See [19]. ) We cannot prove either conjecture; however, we can prove two results for a special class of weights that strongly suggests that these conjectures are true. We consider the so-called factored weights: pairs of the form , where M Φ and M Ψ are Orlicz maximal operators (which are defined in Section 2 below). Such pairs are a generalization of the pairs (u, Mu) that have appeared in many contexts. Their explicit structure can be combined with Calderón-Zygmund decomposition arguments to prove a variety of weighted norm inequalities. In addition, their factored form (which is in some sense a two-weight version of the Jones' factorization theorem) makes it straightforward to construct examples of pairs of weights that satisfy A p bump conditions. Factored weights were introduced and studied systematically in [19]. Theorem 1.10. Given a Calderón-Zygmund singular integral operator T and p, 1 < p < ∞, then for any pair of non-negative, locally integrable functions w 1 , w 2 , the pair of weights In the next result, M Φ,α and M Ψ,α are fractional Orlicz maximal operators; these will be defined in Section 2 below. Theorem 1.11. Given α, 0 < α < n, and p, 1 < p < ∞, then for any pair of non-negative, locally integrable functions w 1 , w 2 , the pair of weights 2p+δ , and B(t) = t p ′ log(e + t) p ′ , and for any b ∈ BMO the commutator [b, I α ] satisfies (1.13).
In both theorems the power of the logarithm on the function A is 2p + δ instead of the conjectured 2p − 1 + δ; we believe that this extra logarithm is not fundamental but rather is a consequence of the proof. The proof uses a two-weight inequality for the sharp maximal function M # which results in a loss of information. The proofs of Theorems 1.10 and 1.11 can be adapted to prove Theorems 1.3 and 1.6 for factored weights, but again in both cases we have to take A(t) = t p log(e+t) 2p+δ . (Details are left to the interested reader.) As we noted above, this result for factored weights was one motivation for initially conjecturing that Theorem 1.3 was true.
Organization. The remainder of this paper is organized as follows. In Section 2 we gather a number of definitions and results needed in our proofs. In Section 3 we estimate the local mean oscillation of the commutator of a dyadic singular integral, a key step in our proof of Theorem 1.3, which we give in Section 4. In Sections 5 and 6 we prove Theorems 1.5 and 1.6 for commutators of fractional integrals. In Section 7 we prove our weak type inequalities for factored weights. Finally, in Section 8 we construct the examples which show that our results are sharp.
Throughout this paper, all notation is standard or will be defined as needed. We will denote by c a constant that generally depends only on the dimension, the operator under consideration and the value of p; the value of this constant, however, will often vary from line to line.

Preliminaries
We start with some basic facts and notation. By a weight we will mean a measurable, non-negative function that is positive on a set of positive measure. A pair of weights $(u,v)$ will always consist of non-negative, measurable functions such that: $u>0$ on a set of positive measure, $u<\infty$ almost everywhere, $v>0$ almost everywhere, and $v<\infty$ on a set of positive measure. Given $p$, $1<p<\infty$, $p'$ will denote the dual exponent $p/(p-1)$. For $1<p<\infty$ and a weight $w$, $L^p(w)$ is the set of all measurable functions $f$ such that
$$\|f\|_{L^p(w)} = \Big(\int_{\mathbb{R}^n} |f(x)|^p\,w(x)\,dx\Big)^{1/p} < \infty.$$
When $w\equiv 1$, we write $L^p(\mathbb{R}^n)$.
Hereafter, $Q$ will denote a cube. Let $\mathcal{D}$ be the set of all dyadic cubes in $\mathbb{R}^n$: i.e., cubes of the form $2^k(m+[0,1)^n)$ where $k\in\mathbb{Z}$ and $m\in\mathbb{Z}^n$. For $Q\in\mathcal{D}$, $\mathcal{D}(Q)$ is the set of all dyadic subcubes of $Q$. Given a dyadic cube $Q\in\mathcal{D}$ and an integer $\tau\ge 0$, $Q^\tau$ will denote the unique dyadic cube containing $Q$ such that $|Q^\tau|=2^{\tau n}|Q|$.
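As a concrete instance of this notation, here is a one-dimensional example; the specific interval is chosen purely for illustration.

```latex
% n = 1, k = -1, m = 3:
\[
Q = 2^{-1}\big(3 + [0,1)\big) = \big[\tfrac{3}{2},\,2\big), \qquad |Q| = \tfrac{1}{2}.
\]
% Its dyadic ancestors:
\[
Q^{1} = 2^{0}\big(1 + [0,1)\big) = [1,2), \qquad
Q^{2} = 2^{1}\big(0 + [0,1)\big) = [0,2),
\]
% and indeed |Q^\tau| = 2^{\tau n}|Q| with n = 1:
\[
|Q^{1}| = 1 = 2^{1}\cdot\tfrac12, \qquad |Q^{2}| = 2 = 2^{2}\cdot\tfrac12 .
\]
```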
Given a set $E$, we will use two different notions of an "average" of a function $f$ on the set $E$. Let $a_f(E)$ denote the mean value of $f$ on the set $E$:
$$a_f(E) = \frac{1}{|E|}\int_E f(x)\,dx.$$
Let $m_f(E)$ denote the median value of $f$ on $E$: the (possibly nonunique) number such that
$$\max\big(|\{x\in E : f(x)>m_f(E)\}|,\ |\{x\in E : f(x)<m_f(E)\}|\big) \le \frac{|E|}{2}.$$

2.1. Dyadic operators.
Below we will actually prove Theorems 1.3 and 1.6 for dyadic singular and fractional integral operators. Here we define these operators and show how they can be used to approximate their non-dyadic counterparts.
where h Q and g Q are functions that satisfy: Dyadic singular integrals are bounded on L 2 (R n ) and of weak type (1, 1). The L 2 (R n ) bounds follow from the Cotlar-Stein lemma and the weak (1, 1) inequality follows from the usual Calderón-Zygmund decomposition and the properties (ii) and (iv) above. (See [31].) The corresponding maximal truncated dyadic singular integral is defined by These operators also satisfy strong (2, 2) and weak (1, 1) inequalities (see [24]).
For r > 0 and β ∈ R, let rD β be the collection of cubes of the form r2 k (m + [β, β + 1) n ), where m ∈ Z n . Define the dyadic singular integral operator of order τ adapted to rD β by where h Q and g Q satisfy properties (i), (ii), (iii), and (iv) for cubes in rD β . The classical singular integral operators lie in the convex hull of the dyadic singular integral operators adapted to rD β . As a consequence we have the following approximation theorem. 23,43,44]). Given p, 1 < p < ∞, suppose T is the Hilbert transform, a Riesz transform, or the Ahlfors-Beurling operator. Then there exists τ ≥ 1 (depending on T ) and dyadic singular integral operators {T r,β } of order τ such that for all weights ν and functions f .
For example, the Hilbert transform can be approximated by dyadic singular integrals of order 2, the so-called Haar shift operators. Hence, to obtain a bound on the norm of the Hilbert transform it suffices to bound the corresponding dyadic singular integrals $T_{r,\beta}$ with a constant independent of $r$ and $\beta$. Below we will prove estimates only for the standard dyadic grid; it will be immediate that the same proofs yield bounds for dyadic singular integral operators adapted to any grid $r\mathcal{D}^\beta$.
To apply our results to more general singular integral operators, we would need to derive bounds on the dyadic singular integrals that were polynomial in the order τ . However, the constants we get are exponential in τ ; this is one of the obstacles that prevents us from obtaining bounds for general singular integral operators as in [25]. We will indicate the precise places where this occurs in Remarks 3.3 and 4.2 below. We do not know if our methods can be modified to obtain a polynomial dependence on the order τ .
The fractional integral operator is easier to approximate because its kernel is positive and locally integrable. Sawyer and Wheeden [47] introduced the dyadic fractional integral operator and proved it could be used to approximate I α .
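Definition 2.3. Given $\alpha$, $0<\alpha<n$, define the dyadic fractional integral operator by
$$I^d_\alpha f(x) = \sum_{Q\in\mathcal{D}} \frac{|Q|^{\alpha/n}}{|Q|}\int_Q f(y)\,dy\cdot\chi_Q(x).$$
To estimate $I_\alpha$ we only need to average $I^d_\alpha$ over translations, $\tau_t f = f(\,\cdot-t)$.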

Young functions and Orlicz spaces.
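We follow the terminology and notation of [19]. A function $\Phi$ is a Young function if $\Phi:[0,\infty)\to[0,\infty)$ is continuous, convex and strictly increasing, $\Phi(0)=0$ and $\Phi(t)/t\to\infty$ as $t\to\infty$. We will use the letters $\Phi,\Psi,\ldots$ along with $A,B,\ldots$ to represent Young functions. The main examples we will be dealing with are $\Phi(t)=t^r[\log(e+t)]^s$ for some $r\ge 1$ and $s\in\mathbb{R}$. (Hereafter we will write this more simply as $t^r\log(e+t)^s$.) Given a Young function $\Phi$, the associate function $\bar\Phi$ is the Young function defined by
$$\bar\Phi(t) = \sup_{s>0}\{st-\Phi(s)\}.$$
The functions $\Phi$ and $\bar\Phi$ satisfy
$$t \le \Phi^{-1}(t)\,\bar\Phi^{-1}(t) \le 2t.$$
Given two Young functions $\Phi$, $\Psi$, we will use the notation $\Phi(t)\approx\Psi(t)$ if there are constants $c_1,c_2,t_0>0$ such that $c_1\Phi(t)\le\Psi(t)\le c_2\Phi(t)$ for all $t\ge t_0$. Given a cube $Q$, define the normalized Luxemburg norm of $f$ on $Q$ by
$$\|f\|_{\Phi,Q} = \inf\Big\{\lambda>0 : \frac{1}{|Q|}\int_Q \Phi\Big(\frac{|f(x)|}{\lambda}\Big)\,dx \le 1\Big\}.$$
When $\Phi(t)=t^r$ for some $r>1$, then
$$\|f\|_{\Phi,Q} = \Big(\frac{1}{|Q|}\int_Q |f(x)|^r\,dx\Big)^{1/r}.$$
There is a generalized Hölder inequality for the Luxemburg norm.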
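The reduction of the Luxemburg norm to an $L^r$ average for power functions can be checked in one line. The sketch below assumes the standard definition $\|f\|_{\Phi,Q}=\inf\{\lambda>0 : \frac{1}{|Q|}\int_Q\Phi(|f|/\lambda)\,dx\le 1\}$.

```latex
% For \Phi(t) = t^r and \lambda > 0:
\[
\frac{1}{|Q|}\int_Q \Phi\Big(\frac{|f(x)|}{\lambda}\Big)\,dx
 = \lambda^{-r}\,\frac{1}{|Q|}\int_Q |f(x)|^r\,dx \;\le\; 1
\quad\Longleftrightarrow\quad
\lambda \ge \Big(\frac{1}{|Q|}\int_Q |f(x)|^r\,dx\Big)^{1/r}.
\]
% Taking the infimum over all admissible \lambda:
\[
\|f\|_{\Phi,Q} = \Big(\frac{1}{|Q|}\int_Q |f(x)|^r\,dx\Big)^{1/r}.
\]
```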
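Lemma 2.5. If $\Phi$, $\Psi$, and $\Theta$ are Young functions such that for all $t\ge t_0>0$,
$$\Phi^{-1}(t)\,\Psi^{-1}(t) \le c\,\Theta^{-1}(t),$$
then there is a constant $C$ such that for all functions $f$, $g$ and any cube $Q$,
$$\|fg\|_{\Theta,Q} \le C\,\|f\|_{\Phi,Q}\,\|g\|_{\Psi,Q}.$$
In particular, for any Young function $\Phi$,
$$\frac{1}{|Q|}\int_Q |f(x)g(x)|\,dx \le 2\,\|f\|_{\Phi,Q}\,\|g\|_{\bar\Phi,Q}.$$
Given a Young function $\Phi$ define the associated maximal operator by
$$M_\Phi f(x) = \sup_{Q\ni x}\|f\|_{\Phi,Q}.$$
There is also a dyadic version:
$$M^d_\Phi f(x) = \sup_{Q\in\mathcal{D},\,Q\ni x}\|f\|_{\Phi,Q}.$$
For each $\alpha$, $0<\alpha<n$, define the associated fractional maximal operators by
$$M_{\Phi,\alpha} f(x) = \sup_{Q\ni x}|Q|^{\alpha/n}\|f\|_{\Phi,Q}, \qquad M^d_{\Phi,\alpha} f(x) = \sup_{Q\in\mathcal{D},\,Q\ni x}|Q|^{\alpha/n}\|f\|_{\Phi,Q}.$$
When $\Phi(t)=t\log(e+t)$ we will replace the subscript $\Phi$ with $L\log L$; when $\Phi(t)\approx e^t$ we will replace the subscript with $\exp L$.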
As we noted in the Introduction, Young functions play an important role in generalizing the A p condition to prove two-weight norm inequalities. Central to this are Young functions that satisfy the following growth condition.
Definition 2.6. For each $p$, $1<p<\infty$, a Young function $\Phi$ is said to belong to $B_p$ if for some $c>0$,
$$\int_c^\infty \frac{\Phi(t)}{t^p}\,\frac{dt}{t} < \infty.$$
The next three results depend on the $B_p$ condition and will be used in the proofs of our main results. We start with a characterization of $B_p$ in terms of the Orlicz maximal function due to Pérez [40].
We next give sufficient, A p bump conditions for two-weight inequalities for the operators M Φ , T d , and T d * . Theorem 2.8 ( [40]). Given p, 1 < p < ∞, let Φ, Ψ, and Θ be Young functions such that Ψ ∈ B p and which satisfy then for every f ∈ L p (v), Theorem 2.9 ( [18]). Let T d be a dyadic singular integral operator of order τ , and let T d * be the associated maximal dyadic singular integral operator. Given p, 1 < p < ∞, and Young functions Φ, Ψ such that then for any f ∈ L p (v), and . The next two norm inequalities will also be used below. The first is due to Yano; for a proof, see Zygmund [50]. Theorem 2.10. Given a sub-linear operator S that is bounded on L p (R n ) for 1 < p ≤ p 0 , suppose that given any set Ω and f such that supp(f ) ⊂ Ω, It follows immediately from Marcinkiewicz interpolation that we can take S to be any operator that is bounded on L 2 (R n ) and is weak (1, 1).
The next result is a weak (p, p) inequality for M L log L,α . It was proved in [19,Proposition 5.16] for α = 0; the proof for α > 0 is essentially the same. For completeness we sketch the details.
Theorem 2.11. Given α, 0 ≤ α < n, and p, 1 < p < n/α, if the pair (u, v) satisfies Proof. By a variant of the Calderón-Zygmund decomposition for Orlicz maximal operators (see Pérez [40] and [13]), for each λ > 0 there exists a family of disjoint dyadic cubes Q λ j and a constant γ > 0 such that If Φ(t) = t log(e + t), then B −1 (t)t 1/p ≤ cΦ −1 (t). Therefore, by the generalized Hölder's inequality, Finally, we give some special Young functions that will be used in our proofs. First, if Φ(t) = t log(e + t), then a simple calculation shows thatΦ(t) ≈ e t . We will use this to apply the generalized Hölder's inequality.
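The "simple calculation" for the associate of $\Phi(t)=t\log(e+t)$ can be sketched as follows, assuming the definition $\bar\Phi(t)=\sup_{s>0}\{st-\Phi(s)\}$. The symbol $s_t$ for the maximizer is introduced here for illustration, and the asymptotics are for large $t$ only.

```latex
% The map s \mapsto st - s\log(e+s) is concave, since
\[
\Phi'(s) = \log(e+s) + \frac{s}{e+s}, \qquad
\Phi''(s) = \frac{1}{e+s} + \frac{e}{(e+s)^2} > 0 .
\]
% For t > 1 the supremum is attained at the unique s_t > 0 with \Phi'(s_t) = t:
\[
t = \log(e+s_t) + \frac{s_t}{e+s_t}
\quad\Longrightarrow\quad
\bar\Phi(t) = s_t\big(t - \log(e+s_t)\big) = \frac{s_t^{\,2}}{e+s_t}.
\]
% As t \to \infty we have s_t \to \infty and s_t/(e+s_t) \to 1, hence
\[
e + s_t = \exp\Big(t - \frac{s_t}{e+s_t}\Big) \sim e^{\,t-1},
\qquad
\bar\Phi(t) = \frac{s_t^{\,2}}{e+s_t} \sim s_t \sim \frac{e^{t}}{e},
\]
% which gives \bar\Phi(t) \approx e^t for large t.
```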
In Theorems 1.3 and 1.6 our hypotheses are stated in terms of the Young functions where δ > 0. Closely related to these are the Young functions Lemma 2.12. Fix p, 1 < p < ∞, and let A, B, C, and D be as in (2.4),(2.5),(2.6) and (2.7). ThenB, D ∈ B p andĀ, C ∈ B p ′ , and so Furthermore, if we let Φ(t) = t log(e + t), then for t ≥ t 0 > 0, and so for all f, g,

Proof. Straightforward calculations show that
Similar calculations hold for B and D (just exchanging the roles of p and p ′ ). The desired conclusions now follow from Definition 2.6, Lemma 2.5, and Theorem 2.7.
Remark 2.13. Since they are the principal examples, we have stated our main results in terms of Young functions A and B which are log bumps (i.e., of the form (2.4), (2.5)). However, we can actually prove somewhat more general results. The key properties we need are those given in Lemma 2.12. Given a Young function A, we will say that C is its L log L associate if where Φ(t) = t log(e + t). Then we can restate the hypotheses of Theorem 1.3 as follows: Given a Young function A with L log L associate C ∈ B p ′ , and a Young function B with L log L associate D ∈ B p , if the pair (u, v) satisfies (1.2), then (1.3) holds. The hypotheses of Theorem 1.6 may be reformulated similarly. Details are left to the interested reader. Our proofs of the weak type results in Theorems 1.10 and 1.11, however, only work for log bumps.
Below we will need that BMO functions satisfy exponential integrability conditions; this is a consequence of the John-Nirenberg Theorem.
Theorem 2.14. Given b ∈ BMO, there exists a constant c n such that for every cube Q, In particular, A proof of inequality (2.8) is in Journé [28]. Inequality (2.9) is an immediate consequence of (2.8) and the definition of the Luxemburg norm.

Estimates on the local mean oscillation of [b, T d ]
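In this section we state a decomposition theorem due to Lerner [32] and make the estimate we need to apply it to commutators of dyadic singular integrals. We begin by recalling a few facts. Given a cube $Q$ and $\lambda$, $0<\lambda<1$, define the local mean oscillation of $f$ on $Q$ by
$$\omega_\lambda(f,Q) = \inf_{c\in\mathbb{R}}\big((f-c)\chi_Q\big)^*(\lambda|Q|).$$
Define the dyadic local sharp maximal function on a fixed dyadic cube $Q$ by
$$M^{\sharp,d}_{\lambda,Q}f(x) = \sup_{x\in Q'\in\mathcal{D}(Q)}\omega_\lambda(f,Q').$$
By the properties of rearrangements, for all $p>0$,
$$(f\chi_Q)^*(\lambda|Q|) \le \Big(\frac{1}{\lambda|Q|}\int_Q |f(x)|^p\,dx\Big)^{1/p}. \tag{3.2}$$
Given a dyadic cube $Q$, $\widehat{Q}$ will be its dyadic parent: the unique dyadic cube of twice the side length of $Q$ that contains $Q$.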
We make one observation which will be used heavily in what follows. In general the sets {Q k j } are only pairwise disjoint for a fixed k. However, if we define E k j = Q k j \Ω k+1 , then the sets {E k j } are pairwise disjoint for all j, k and satisfy |E k j | ≤ |Q k j | ≤ 2|E k j |. To apply Theorem 3.1 we need to estimate the local mean oscillation of [b, T d ].
Lemma 3.2. Suppose T d is a dyadic singular integral of order τ , Q is a dyadic cube and 0 < λ ≤ 1/2. Then there exists c = c(n, τ, λ) such that for any f and every x ∈ Q, . Proof. We will prove (3.3); (3.4) follows at once from the definition of M ♯,d λ,Q . Fix a dyadic cube Q and decompose T d as . Furthermore, it is a dyadic singular integral operator and so is bounded on L 2 (R n ) and weak (1, 1). The second term To estimate the commutator, we rewrite it as The last term is constant on Q, so let We estimate each piece in turn. By inequality (3.2), the weak (1,1) boundedness of T d in and the exponential integrability of BMO functions (Theorem 2.14), we obtain To estimate H 2 we use (3.2) with p = 1/2 and Hölder's inequality to get In the last inequality we used Yano's theorem (Theorem 2.10); this is possible since T d in is bounded on L 2 and weak (1, 1). Finally we estimate H 3 : by (3.2) and (3.5) we have that For the proof of Theorem 1.3 we will need the following estimate. A similar inequality was proved in [16].
where A and B are defined by (2.4) and (2.5). Then for f ∈ L p (v) and Proof. By a standard density argument we may assume f, h are nonnegative functions in L ∞ c . Set a = 4 n and let w = u 1/p h. For each j, k ∈ Z define For l, m ∈ Z let {P l r } r be the Calderón-Zygmund cubes of w at height a l and {Q m s } s be the Calderón-Zygmund cubes of f at height a m (see [19,38]); then a w (P l r ) ≈ a l , a f (Q m s ) ≈ a m , and We then have that We can now estimate as follows: We first estimate I 1 . Let the Young function C is as in equation (2.6) and we have used the generalized Hölder inequality (Lemma 2.5) and Yano's theorem (Theorem 2.10) in the second to last inequality. By Lemma 2.12, MB and M C are bounded on L p (R n ) and L p ′ (R n ) respectively. Hence, by (4.1) and Hölder's inequality with respect to the summation, The estimate for I 2 is similar. Since E r,s j,k ⊆ Q j s P k−j−1 r for (j, k, r, s) ∈ Γ 2 we have that D is as in (2.7) and we have once again used (4.1), Yano's theorem, and Lemma 2.12 for the boundedness of MĀ and M D on L p ′ (R n ) and L p (R n ).
Proof of Theorem 1.3. The first part of our argument is similar to one found in [18,Theorems 5.1,5.2]. Fix f ; by a standard approximation argument we may assume without loss of generality that f ∈ L ∞ c . Let R n j , 1 ≤ j ≤ 2 n , denote the n-dimensional quadrants in R n : i.e., the sets R ± × R ± × · · · × R ± where R + = [0, ∞) and R − = (−∞, 0). For each j, 1 ≤ j ≤ 2 n , and for each N > 0 let Q N,j be the dyadic cube adjacent to the origin of side length 2 N that is contained in R n j . Since T d is weak (1, 1) and strong (2,2), by interpolation and duality it is bounded on L p (R n ), 1 < p < ∞. Therefore, since |m f (Q)| ≤ (f χ Q ) * (|Q|/2) (see [32]), by inequality (3.2), m [b,T d ]f (Q N,j ) → 0 as N → ∞. Therefore, by Fatou's lemma and Minkowski's inequality, Hence, it will suffice to prove that each term in the sum on the right is bounded by c f L p (v) where c is independent of N. Further, by duality, it will suffice to show that for any h ∈ L p ′ , h p ′ = 1, Fix j and let Q N = Q N,j . By Theorem 3.1 and Lemma 3.2 we have the following pointwise estimate: We first note that J 1 and J 2 are bounded by f L p (v) , since the pair (u, v) satisfies the conditions for the two-weight norm inequalities for the operators M d L log L and T d * . More precisely, by Hölder's inequality and Theorem 2.8 we have that Similarly, by Theorem 2.9, Let E k j = Q k j \Ω k+1 so that the sets {E k j } are pairwise disjoint and satisfy |E k j | ≈ |Q k j | (see the comment following Theorem 3.1). We now estimate J 3 : where D is from (2.7). By Lemma 2.12 M D : L p (R n ) → L p (R n ) and MĀ : L p ′ (R n ) → L p ′ (R n ). Hence, by (1.2), Finally we estimate J 4 . We have To estimate the right-hand term, we apply a reduction argument very similar to the one given above to show that it will suffice to prove (For the details of this reduction for maximal dyadic singular integrals, see [18, Theorem 6.1].) To prove (4.2) we again use the Lerner decomposition argument.
As was shown in [18], we have that Therefore, by Theorem 3.1, (Note that the families of cubes {Q k j } and {P k j } = {(Q k j ) τ } are different from the families in the first part of the proof.) To estimate J 5 we use Lemma 4.1 to get To estimate J 6 , we argue as follows: For J 7 we argue as we did in the estimate for J 3 above to get To estimate J 8 , first note that Hence, where the last inequality follows from Lemma 4.1.
Remark 4.2. The second point at which we pick up exponential dependence on τ is in the estimates of J 3 and J 7 above. In order to use (1.1) to estimate we have to replace Q k j by P k j in the first term. Since |P k j | = 2 n(τ +1) |Q k j |, by the homogeneity of the norm we can do so at the cost of a constant 2 n(τ +1)/p (see [19,Section 5.2]).

Proof of Theorem 1.5
Our proof is similar to that for commutators of singular integrals in [10]. By the sharp, off-diagonal extrapolation theorem in [30], it suffices to prove (1.7) in the particular case It follows at once that in this case, q = p ′ . By a standard approximation argument, we may assume f ∈ C ∞ c (R n ). Given this assumption, we can represent the commutator using the Cauchy integral formula: for all ǫ > 0, (See [3,11].) Fix w ∈ A p,p ′ ; then by Minkowski's integral inequality we have that We now estimate Since q = p ′ , it follows from the definitions that since w ∈ A p,p ′ , then Therefore, both w p ′ and w −p ′ satisfy the reverse Hölder inequality. In particular, by the sharp reverse Hölder inequality in [42,Lemma 8.1] (see also [10,Lemma 2.3]), for every cube Q, If we first apply Hölder's inequality with this exponent to the two integrals in the definition of A p,p ′ and then apply the reverse Hölder inequality, we get that Then it follows from Theorem 2.14 (see [10,Lemma 2.2]) that e r ′ Re ζ b ∈ A p,p ′ and [e r ′ Re ζ b ] A p,p ′ ≤ c n , (where c n is the constant in Theorem 2.14). Hence, if we combine this estimate with the sharp inequality for the fractional integral operator (1.6), we get . This inequality together with (5.1) then yields This completes the proof.
6. Proof of Theorem 1.6
By duality, it will suffice to prove that for all f ∈ L p (v) and all h ∈ L q ′ (R n ), ‖h‖ q ′ = 1, By a standard approximation argument we may assume f, h ∈ L ∞ c . Further, since I d α is a positive operator, we may assume f and h are non-negative.
Fix f and h. Then We will estimate K 1 ; the estimate for K 2 is obtained in the same way, exchanging the roles of f and u 1/p h. By Hölder's inequality (Lemma 2.5) and the exponential integrability of BMO functions (Theorem 2.14), we have that By an argument in [15] (see also Pérez [38]) we may replace the sum over all dyadic cubes with the sum over the Calderón-Zygmund cubes of f . More precisely, for each k ∈ Z, let {Q k j } be the set of disjoint maximal dyadic cubes such that where a = 4 n . Let E k j = Q k j \Ω k+1 ; then the sets E k j are pairwise disjoint for all j and k, and |E k j | ≥ 1 2 |Q k j |. Then Define the Young function C q by By the same argument as in the proof of Lemma 2.12, we have that C q ∈ B q ′ and A −1 where Φ(t) = t log(e + t). Further, by the same lemma, B ∈ B p . Therefore, by the generalized Hölder's inequality (Lemma 2.5), (1.8), and Hölder's inequality with respect to the summation, Since p ≤ q, p ′ /q ′ ≥ 1. Therefore, by convexity and Theorem 2.7, This completes the proof.
Proof. By the definition of the Orlicz maximal operators and the Luxemburg norm, In exactly the same way we have that The desired conclusion follows at once.
• ([13]) For 0 < α < n, 0 < δ ≤ 1, and b ∈ BMO, .
If we combine inequalities (7.3) and (7.6), we get another sharp function inequality. Fix 0 < δ < ε < 1, and let σ = δ/ε < 1. Then
Proof of Theorem 1.10. Recall that Φ(t) = t log(e + t) p+δ . Let Φ 0 (t) = t log(e + t) p−1+δ/2 and Φ 1 (t) = t log(e + t) 2p−1+δ . Then by a result of Carozza and Passarelli di Napoli [6] (see also [19,Theorem 5.26]), we have that for any function h, and . Similarly, recall that Ψ(t) = t log(e + t) p ′ +1 . If we let Ψ 0 (t) = t log(e + t) p ′ and Ψ 1 (t) = t log(e + t) p ′ −1 , then By Lemma 7.1, the pair with A(t) = t p log(e + t) p−1+δ/2 and B(t) = t p ′ . Therefore, by Lemma 7.2 and (7.4), We estimate each of the final terms separately. By Lemma 7.1 the pair again satisfies (7.1) with A(t) = t p log(e + t) p−1+δ/2 and B(t) = t p ′ . Similarly, the pair In particular, this pair satisfies the two-weight A p condition. Therefore, by Lemma 7.2 and (7.7), and by the two-weight, weak (p, p) inequality for the maximal operator, The estimate for the second term is simpler. By Lemma 7.1, the pair satisfies (7.1) with A(t) = t p and B(t) = t p ′ log(e + t) p ′ . Therefore, by Lemma 2.11,
Proof of Theorem 1.11. The proof is nearly the same as the proof of Theorem 1.10, except that instead of having to introduce the supplementary maximal operators M Ψ 0 and M Ψ 1 , we use the fact that M Ψ,α w 2 ∈ A 1 (see [ . We show that we may not take δ = 0 in Theorem 1.6 when p = q = k for a positive integer k, 1 < k < n/α. In fact, we construct a pair of weights (u, v) satisfying (1.8), a function f , and a BMO function b, such that the weak type inequality does not hold for any constant C > 0. Our example is similar to the example for the Hilbert transform given above. Let Φ k (t) = t log(t + e) 2k−1 and consider the pair (u, v) = (u, M Φ k ,kα u).
The proof of Lemma 7.1 can be easily modified to show that (u, v) satisfies (1.8) with A(t) = t p log(e + t) 2p−1 , B(t) = t p ′ log(e + t) 2p ′ −1+δ , δ > 0. (In fact, we can take B to be any Young function.) To work with this pair, we express v = M Φ k ,kα u in a different way. By an inequality of Stein (see Wilson [49,Chapter 10]), where M j is the composition of M with itself j times. It follows that On the other hand we have (see [19,Example 5.42 Define the function f by f (x) = χ R n \B(0,e e e ) (x) |x| α (log |x|) 2 log log |x| .
Finally, let b(x) = log |x|; then for |x| > e e e we have  7) is sharp in the sense that the exponent (2 − α/n) max(1, p ′ /q) cannot be replaced by any smaller power. It will suffice to prove this assuming that p ′ /q ≥ 1; the case when p ′ /q < 1 follows at once by duality, using the fact that the commutator is essentially self-adjoint (i.e., [b, I α ] * = −[b, I α ]) and the fact that if w ∈ A p,q , then w −1 ∈ A q ′ ,p ′ and [w −1 ] A q ′ ,p ′ = [w] p ′ /q Ap,q . For each δ ∈ (0, 1), define the weight w δ (x) = |x| (n−δ)/p ′ and the power functions f δ (x) = |x| δ−n χ B(0,1) (x). A straightforward computation shows that f δ L p (w p δ ) ≈ δ −1/p . Further, we have that [w δ ] Ap,q ≈ δ −q/p ′ . Since w δ is a radial function, it suffices to check this for balls centered at the origin, again a straightforward computation.
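The norm computation for f δ can be checked directly. The sketch below uses only the definitions of w δ and f δ given in the text together with integration in polar coordinates; the symbol ω_{n−1} for the surface measure of the unit sphere is notation introduced here, not taken from the original.

```latex
% From the text: w_\delta(x) = |x|^{(n-\delta)/p'},
%                f_\delta(x) = |x|^{\delta-n}\chi_{B(0,1)}(x).
% Assumed notation: \omega_{n-1} = surface measure of the unit sphere in R^n.
\[
  \|f_\delta\|_{L^p(w_\delta^p)}^{p}
  = \int_{B(0,1)} |x|^{(\delta-n)p}\,|x|^{(n-\delta)p/p'}\,dx
  = \int_{B(0,1)} |x|^{(\delta-n)\,p\,(1-1/p')}\,dx
  = \int_{B(0,1)} |x|^{\delta-n}\,dx ,
\]
% because p(1 - 1/p') = p \cdot (1/p) = 1.  In polar coordinates,
\[
  \int_{B(0,1)} |x|^{\delta-n}\,dx
  = \omega_{n-1}\int_{0}^{1} r^{\delta-1}\,dr
  = \frac{\omega_{n-1}}{\delta},
  \qquad\text{so}\qquad
  \|f_\delta\|_{L^p(w_\delta^p)}
  = \Big(\frac{\omega_{n-1}}{\delta}\Big)^{1/p} \approx \delta^{-1/p}.
\]
```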
Let b be the BMO function b(x) = log |x|. We estimate the commutator as follows. For x ∈ R n , |x| ≥ 2, Integrating this inequality, and using the fact that 1/p − 1/q = α/n and p ′ /q ≥ 1, we get that |x| (n−δ)q/p ′ |x| (n−α)q dx Ap,q f δ L p (w p δ ) . Since this is true for every δ > 0, it follows that we cannot take any smaller exponent in (1.7).
8.4. Sharp weighted Sobolev inequality. We will show that the power 1/n ′ is sharp in (1.9). Unlike the previous example, since we are dealing with regular functions we have to replace the cut-off function χ B(0,1) with a smooth function that has exponential decay.