Weighted norm inequalities for the bilinear maximal operator on variable Lebesgue spaces

We extend the theory of weighted norm inequalities on variable Lebesgue spaces to the case of bilinear operators. We introduce a bilinear version of the variable $\A_\pp$ condition, and show that it is necessary and sufficient for the bilinear maximal operator to satisfy a weighted norm inequality. Our work generalizes the linear results of the first author, Fiorenza and Neugebauer~\cite{dcu-f-nPreprint2010} in the variable Lebesgue spaces and the bilinear results of Lerner {\em et al.}~\cite{MR2483720} in the classical Lebesgue spaces. As an application we prove weighted norm inequalities for bilinear singular integral operators in the variable Lebesgue spaces.


1. Introduction
In this paper we develop the theory of bilinear weighted norm inequalities in the variable Lebesgue spaces. To put our results in context we will first describe some previous results; for brevity, we will defer the majority of definitions until below. The Hardy-Littlewood maximal operator is defined by
$$Mf(x) = \sup_{Q \ni x} \fint_Q |f(y)|\,dy,$$
where the supremum is taken over all cubes in $\mathbb{R}^n$ with sides parallel to the coordinate axes. The now classical result of Muckenhoupt [22] is that a necessary and sufficient condition for $M$ to be bounded on the weighted Lebesgue space $L^p(w)$, $1 < p < \infty$, i.e., for the inequality
$$\int_{\mathbb{R}^n} (Mf)^p w\,dx \le C \int_{\mathbb{R}^n} |f|^p w\,dx$$
to hold, is that $w \in A_p$:
$$[w]_{A_p} = \sup_Q \left(\fint_Q w\,dx\right)\left(\fint_Q w^{1-p'}\,dx\right)^{p-1} < \infty,$$
where again the supremum is taken over all cubes in $\mathbb{R}^n$ with sides parallel to the coordinate axes.
This result has been generalized in two directions. First, Lerner et al. [21], as part of the theory of weighted norm inequalities for bilinear Calderón-Zygmund singular integrals, introduced the bilinear (more properly, "bisublinear") maximal operator
$$\mathcal{M}(f_1,f_2)(x) = \sup_{Q \ni x} \fint_Q |f_1(y)|\,dy \cdot \fint_Q |f_2(z)|\,dz.$$
It is immediate that $\mathcal{M}(f_1,f_2)(x) \le Mf_1(x)\,Mf_2(x)$, and so by Hölder's inequality,
$$\int_{\mathbb{R}^n} \mathcal{M}(f_1,f_2)^p\, w\,dx \lesssim \left(\int_{\mathbb{R}^n} |f_1|^{p_1} w_1\,dx\right)^{p/p_1} \left(\int_{\mathbb{R}^n} |f_2|^{p_2} w_2\,dx\right)^{p/p_2}, \tag{1.2}$$
where $1 < p_1, p_2 < \infty$, $\frac{1}{p} = \frac{1}{p_1} + \frac{1}{p_2}$, $w_j \in A_{p_j}$, $j = 1, 2$, and $w = w_1^{p/p_1} w_2^{p/p_2}$. However, while this condition is sufficient, it is not necessary. In [21] they introduced the class $A_{\mathbf{p}}$ of vector weights defined as follows. With the previous definitions, let $\mathbf{p} = (p, p_1, p_2)$ and let $\mathbf{w} = (w, w_1, w_2)$: $\mathbf{w} \in A_{\mathbf{p}}$ if
$$\sup_Q \left(\fint_Q w\,dx\right)^{1/p} \prod_{j=1}^{2} \left(\fint_Q w_j^{1-p_j'}\,dx\right)^{1/p_j'} < \infty.$$
They proved that a necessary and sufficient condition for inequality (1.2) to hold is that $\mathbf{w} \in A_{\mathbf{p}}$. If $w_j \in A_{p_j}$, then $\mathbf{w} \in A_{\mathbf{p}}$, but they gave examples to show that the class $A_{\mathbf{p}}$ is strictly larger than the collection of vector weights obtained from $A_{p_1} \times A_{p_2}$. A second generalization of Muckenhoupt's result is to the setting of the variable Lebesgue spaces. The first author, Fiorenza and Neugebauer [6] proved that given an exponent function $p(\cdot) : \mathbb{R}^n \to [1,\infty)$ such that $1 < p_- \le p_+ < \infty$ and $p(\cdot)$ is log-Hölder continuous, the maximal operator is bounded on $L^{p(\cdot)}$. In [7] (see also [4]) they proved the corresponding weighted norm inequality: a necessary and sufficient condition for the maximal operator to be bounded on $L^{p(\cdot)}(w)$, i.e., for
$$\|(Mf)w\|_{p(\cdot)} \lesssim \|fw\|_{p(\cdot)},$$
is that $w \in A_{p(\cdot)}$:
$$[w]_{A_{p(\cdot)}} = \sup_Q |Q|^{-1}\, \|w\chi_Q\|_{p(\cdot)}\, \|w^{-1}\chi_Q\|_{p'(\cdot)} < \infty.$$
When $p(\cdot) = p$ is a constant function, this reduces to the classical result of Muckenhoupt, since $L^{p(\cdot)}(w) = L^p(w^p)$ and $w \in A_{p(\cdot)}$ is equivalent to $w^p \in A_p$. The purpose of this paper is to extend both of these results and characterize the class of weights necessary and sufficient for the bilinear maximal operator to satisfy bilinear weighted norm inequalities on the variable Lebesgue spaces. The remainder of this paper is organized as follows. In Section 2 we make the definitions needed to state our two main results; in particular, we introduce the class of vector weights $A_{\mathbf{p}(\cdot)}$. Our first result, Theorem 2.4, is for the bilinear maximal operator. Our second, Theorem 2.8, shows that the weight condition $A_{\mathbf{p}(\cdot)}$ is sufficient for bilinear Calderón-Zygmund singular integral operators to satisfy weighted norm inequalities on the variable Lebesgue spaces. This generalizes the main result of [21].
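The constant-exponent reduction asserted here can be checked directly; the following computation is a sketch using the definitions above:

```latex
% With p(.) = p constant, the weighted variable norm is a classical weighted norm:
\|fw\|_{p(\cdot)} = \Big(\int_{\mathbb{R}^n} |f|^p w^p \,dx\Big)^{1/p} = \|f\|_{L^p(w^p)},
\quad\text{so } L^{p(\cdot)}(w) = L^p(w^p).
% Moreover \|w\chi_Q\|_p = (\int_Q w^p dx)^{1/p} and \|w^{-1}\chi_Q\|_{p'} = (\int_Q w^{-p'} dx)^{1/p'}, hence
\frac{\|w\chi_Q\|_{p}\,\|w^{-1}\chi_Q\|_{p'}}{|Q|}
= \Big(\fint_Q w^p\,dx\Big)^{\!1/p}\Big(\fint_Q w^{-p'}\,dx\Big)^{\!1/p'}
= \Big[\Big(\fint_Q w^p\,dx\Big)\Big(\fint_Q (w^p)^{1-p'}\,dx\Big)^{p-1}\Big]^{1/p},
% since p(1-p') = -p'. The left-hand side is uniformly bounded in Q exactly when w^p \in A_p.
```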
In Section 3 we gather some basic results about weights and the variable Lebesgue spaces that we need in our proofs, and in Section 4 we prove some properties of $A_{p(\cdot)}$ and $A_{\mathbf{p}(\cdot)}$ weights. In Section 5 we give a characterization of the vector weights $A_{\mathbf{p}(\cdot)}$ in terms of averaging operators. In Section 6 we prove Theorem 2.4. The proof is broadly similar to the proof in the linear case given in [7], but there are many additional technical obstacles. Finally, in Section 7 we prove Theorem 2.8. The proof relies on an extrapolation theorem in the scale of weighted variable Lebesgue spaces proved in [13].
Throughout this paper, $n$ will denote the dimension of the underlying space $\mathbb{R}^n$. A cube $Q \subset \mathbb{R}^n$ will always have its sides parallel to the coordinate axes. Given a cube $Q$ and a function $f$, we will denote averages as follows:
$$\langle f \rangle_Q = \fint_Q f\,dx = \frac{1}{|Q|}\int_Q f\,dx.$$
Similarly, if $\sigma$ is a non-negative measure, we denote weighted averages by
$$\langle f \rangle_{Q,\sigma} = \frac{1}{\sigma(Q)}\int_Q f\,d\sigma.$$
Constants will be denoted by $C$, $c$, etc., and their value may change from line to line, even in the same computation. If we need to emphasize the dependence of a constant on some parameter we will write, for instance, $C(n)$. Given two positive quantities $A$ and $B$, we will write $A \lesssim B$ if there is a constant $c$ such that $A \le cB$. If $A \lesssim B$ and $B \lesssim A$, then we write $A \approx B$.

2. Main results
We first recall the definition of the variable Lebesgue spaces. For more information, see [5]. Let $\mathcal{P}$ denote the collection of measurable functions $p(\cdot) : \mathbb{R}^n \to [1,\infty]$ and $\mathcal{P}_0$ the collection of $p(\cdot) : \mathbb{R}^n \to (0,\infty]$. Given $p(\cdot) \in \mathcal{P}_0$ and a set $E \subset \mathbb{R}^n$, define
$$p_-(E) = \operatorname*{ess\,inf}_{x \in E} p(x), \qquad p_+(E) = \operatorname*{ess\,sup}_{x \in E} p(x).$$
For simplicity we will write $p_- = p_-(\mathbb{R}^n)$ and $p_+ = p_+(\mathbb{R}^n)$. Given $p(\cdot) \in \mathcal{P}$ we define the dual exponent $p'(\cdot)$ pointwise a.e. by
$$\frac{1}{p(x)} + \frac{1}{p'(x)} = 1,$$
with the convention that $\frac{1}{\infty} = 0$. The space $L^{p(\cdot)}$ consists of all measurable functions $f$ such that for some $\lambda > 0$,
$$\rho(f/\lambda) = \int_{\mathbb{R}^n \setminus \Omega_\infty} \left(\frac{|f(x)|}{\lambda}\right)^{p(x)} dx + \lambda^{-1}\|f\|_{L^\infty(\Omega_\infty)} < \infty,$$
where $\Omega_\infty = \{x \in \mathbb{R}^n : p(x) = \infty\}$. This becomes a quasi-Banach function space when equipped with the norm
$$\|f\|_{p(\cdot)} = \inf\{\lambda > 0 : \rho(f/\lambda) \le 1\};$$
when $p_- \ge 1$ it is a Banach space. When $p(\cdot) = p$, $0 < p < \infty$, then $L^{p(\cdot)} = L^p$ with equality of norms. An exponent $p(\cdot) \in \mathcal{P}_0$ is said to be locally log-Hölder continuous, denoted by $p(\cdot) \in LH_0$, if there exists a constant $C_0$ such that
$$\left|\frac{1}{p(x)} - \frac{1}{p(y)}\right| \le \frac{C_0}{\log(e + 1/|x-y|)}, \qquad x, y \in \mathbb{R}^n;$$
$p(\cdot)$ is said to be log-Hölder continuous at infinity, denoted by $p(\cdot) \in LH_\infty$, if there exist constants $C_\infty$ and $p_\infty$ such that
$$\left|\frac{1}{p(x)} - \frac{1}{p_\infty}\right| \le \frac{C_\infty}{\log(e + |x|)}, \qquad x \in \mathbb{R}^n.$$
If $p(\cdot) \in LH_0 \cap LH_\infty$, we write $p(\cdot) \in LH$.
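As a concrete illustration (this example is ours, not taken from the paper), exponents with logarithmic decay at infinity satisfy both conditions:

```latex
% The exponent
p(x) = 2 + \frac{1}{\log(e + |x|)}
% is Lipschitz, and t \lesssim 1/\log(e + 1/t) for 0 < t \le 1, so p(.) \in LH_0;
% moreover |p(x) - p_\infty| = 1/\log(e + |x|) with p_\infty = 2, so p(.) \in LH_\infty.
% By contrast, an exponent with a jump discontinuity fails LH_0.
```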
Remark 2.1. For our main results we will assume $p_+ < \infty$. In this case $\Omega_\infty$ is empty and the definition of the norm is simpler. Moreover, in the definition of log-Hölder continuity, we can replace the left-hand sides by $|p(x) - p(y)|$ and $|p(x) - p_\infty|$, respectively, at the cost of new constants that depend on $p_+$.

Remark 2.3. If $p_1(\cdot)$ and $p_2(\cdot)$ are constant, then this condition reduces to the $A_{\mathbf{p}}$ condition for the triple $(w_1^{p_1}, w_2^{p_2}, (w_1w_2)^p)$.
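To see this reduction, suppose (generalizing the linear case) that the bilinear condition has the form $\sup_Q |Q|^{-2}\|w\chi_Q\|_{p(\cdot)}\|w_1^{-1}\chi_Q\|_{p_1'(\cdot)}\|w_2^{-1}\chi_Q\|_{p_2'(\cdot)} < \infty$; the display below, which works out the constant-exponent case, is a sketch under that assumption:

```latex
% With constant exponents, \|w\chi_Q\|_p = (\int_Q w^p)^{1/p} and \|w_j^{-1}\chi_Q\|_{p_j'} = (\int_Q w_j^{-p_j'})^{1/p_j'}.
% Since 1/p + 1/p_1' + 1/p_2' = 1/p + (1 - 1/p_1) + (1 - 1/p_2) = 2, the normalization |Q|^{-2}
% turns each integral into an average:
\frac{\|w\chi_Q\|_{p}\,\|w_1^{-1}\chi_Q\|_{p_1'}\,\|w_2^{-1}\chi_Q\|_{p_2'}}{|Q|^{2}}
= \Big(\fint_Q (w_1w_2)^{p}\,dx\Big)^{\!1/p}\prod_{j=1}^{2}\Big(\fint_Q \big(w_j^{p_j}\big)^{1-p_j'}\,dx\Big)^{\!1/p_j'},
% which is the classical bilinear condition for the triple (w_1^{p_1}, w_2^{p_2}, (w_1w_2)^p),
% since w_j^{-p_j'} = (w_j^{p_j})^{1-p_j'} because p_j(1 - p_j') = -p_j'.
```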
We can now state our first main result.
Remark 2.5. We do not believe that the assumption p(·) ∈ LH is necessary in Theorem 2.4. In the linear, unweighted case, while it is sufficient to assume that the exponent p(·) is log-Hölder continuous for the maximal operator to be bounded on L p(·) , it is not necessary: see [5] for examples. Diening and Hästö [16] conjectured that in the weighted case, a necessary and sufficient condition for M to be bounded on L p(·) (w) is that the maximal operator is bounded on L p(·) and w ∈ A p(·) . We conjecture that the analogous result holds in the bilinear case: M satisfies (2.2) if and only if M satisfies an unweighted bilinear estimate and w ∈ A p(·) .
Remark 2.6. In the linear case, the maximal operator is bounded on $L^{p(\cdot)}$ if $p_- > 1$ and $1/p(\cdot) \in LH$: we can allow $p_+ = \infty$. (See [5] for details and references.) In [7] it was conjectured that $M$ is bounded on $L^{p(\cdot)}(w)$ with the same hypotheses if $w \in A_{p(\cdot)}$. This condition is well defined even if $p_- = 1$ and $p_+ = \infty$. Moreover, this conjecture is true if $p(\cdot) = \infty$ a.e. This is equivalent to a classical but often overlooked result of Muckenhoupt [22]: if $w^{-1} \in A_1$ and $fw \in L^\infty$, then $(Mf)w \in L^\infty$. Here we conjecture that the hypothesis $p_+ < \infty$ can be removed from Theorem 2.4. However, as in the linear case we believe that this will require a very different argument, since the fact that $p_+, (p_j)_+ < \infty$ plays an important role in our proof.
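The $L^\infty$ endpoint result of Muckenhoupt mentioned here can be recovered in a few lines; the following derivation is a sketch:

```latex
% Suppose w^{-1} \in A_1, so that M(w^{-1})(x) \le C\, w(x)^{-1} a.e., and suppose \|fw\|_\infty < \infty.
% Then |f(y)| \le \|fw\|_\infty\, w(y)^{-1} for a.e. y, and since M is monotone and positively homogeneous,
Mf(x) \le \|fw\|_\infty \, M(w^{-1})(x) \le C\,\|fw\|_\infty\, w(x)^{-1} \quad\text{a.e.},
% and multiplying through by w(x) gives \|(Mf)w\|_\infty \le C\,\|fw\|_\infty.
```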

Remark 2.7.
Though we have only proved our result in the bilinear case, an m-linear version of Theorem 2.4, m ≥ 3, should be true with the obvious changes in the definition of A p(·) and the statement of the theorem. But even in the bilinear case the proof is quite technical, and so we decided to avoid making our proof even more obscure by trying to prove the general result.
Our second main result is for bilinear Calderón-Zygmund singular integrals. These operators have been considered by a number of authors, and we refer the reader to [21] for details and further references.
Let $K(x,y,z)$ be a complex-valued, locally integrable function defined on $\mathbb{R}^{3n}$ away from the diagonal $\{x = y = z\}$, satisfying the size estimate
$$|K(x,y,z)| \lesssim \frac{1}{(|x-y| + |x-z|)^{2n}}$$
and, for some $\varepsilon > 0$, the regularity estimate
$$|K(x,y,z) - K(x',y,z)| \lesssim \frac{|x-x'|^{\varepsilon}}{(|x-y| + |x-z|)^{2n+\varepsilon}} \quad \text{whenever } |x-x'| \le \tfrac{1}{2}\max(|x-y|, |x-z|).$$
We also assume that the two analogous difference estimates with respect to the variables $y$ and $z$ hold. An operator $T : \mathcal{S} \times \mathcal{S} \to \mathcal{S}'$ is a bilinear Calderón-Zygmund singular integral if: (1) there exists a bilinear Calderón-Zygmund kernel $K$ such that
$$T(f_1,f_2)(x) = \iint_{\mathbb{R}^{2n}} K(x,y,z)\, f_1(y)\, f_2(z)\,dy\,dz, \qquad x \notin \operatorname{supp} f_1 \cap \operatorname{supp} f_2;$$
(2) there exist $1 \le p, q < \infty$ and $r$ such that $\frac{1}{r} = \frac{1}{p} + \frac{1}{q}$ and $T$ can be extended to a bounded operator from $L^p \times L^q$ into $L^r$.

Theorem 2.8. Given $p_1(\cdot), p_2(\cdot) \in \mathcal{P}$, suppose $1 < (p_j)_- \le (p_j)_+ < \infty$ and $p_j(\cdot) \in LH$, $j = 1, 2$. Define $p(\cdot)$ by (2.1). Let $w_1, w_2$ be weights, define $w = w_1w_2$, and assume $\mathbf{w} \in A_{\mathbf{p}(\cdot)}$. If $T$ is a bilinear Calderón-Zygmund singular integral, then
$$\|T(f_1,f_2)\,w\|_{p(\cdot)} \lesssim \|f_1w_1\|_{p_1(\cdot)}\, \|f_2w_2\|_{p_2(\cdot)}. \tag{2.3}$$

Remark 2.9.
As for the bilinear maximal operator, we do not believe that the assumption that p 1 (·), p 2 (·) ∈ LH is necessary for the conclusion in Theorem 2.8 to hold. In [11], the authors proved that in the unweighted case it was sufficient to assume that the (linear) maximal operator is bounded on L p 1 (·) and L p 2 (·) . We conjecture that with this hypothesis, or even the weaker assumption that M satisfies the associated unweighted bilinear inequality, and w ∈ A p(·) , then (2.3) holds.
Remark 2.10. Alongside the variable Lebesgue spaces there is a theory of variable Hardy spaces: see [12]. Very recently, the first author, Moen and Nguyen [10] proved unweighted estimates on variable Hardy spaces for bilinear Calderón-Zygmund singular integrals. It would be interesting to extend these results to weighted variable Hardy spaces using the A p(·) weights.

3. Preliminary results
In this section we gather some basic results about weights and about variable Lebesgue spaces that we will need in the subsequent sections.
To state our next result, we introduce the weighted dyadic maximal operator. Given a weight $\sigma$, define
$$M^d_\sigma f(x) = \sup_{\substack{Q \ni x \\ Q \in \mathcal{D}_0}} \frac{1}{\sigma(Q)} \int_Q |f|\,d\sigma,$$
where the supremum is taken over all cubes in the collection $\mathcal{D}_0$ of dyadic cubes:
$$\mathcal{D}_0 = \{2^{-k}([0,1)^n + m) : k \in \mathbb{Z},\ m \in \mathbb{Z}^n\}.$$
The following result is well known, but an explicit proof does not seem to have appeared in the literature. The proof is essentially the same as for the classical dyadic maximal operator: see Grafakos [18].

Variable Lebesgue spaces. Here we gather some basic results about variable Lebesgue spaces. All of these are found in the literature (with some minor variations). In some cases they were only proved for exponents $p(\cdot) \in \mathcal{P}$, but essentially the same proof works for $p(\cdot) \in \mathcal{P}_0$.
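The well-known result referred to presumably takes the following form (our formulation; the argument is the standard one via the maximal dyadic cubes):

```latex
% Weak (1,1) with respect to the measure \sigma\,dx: for every \lambda > 0,
\sigma\big(\{x : M^d_\sigma f(x) > \lambda\}\big) \le \frac{1}{\lambda}\int_{\{M^d_\sigma f > \lambda\}} |f|\,d\sigma,
% which follows by summing \sigma(Q) \le \lambda^{-1}\int_Q |f|\,d\sigma over the maximal dyadic cubes Q
% on which the \sigma-average of |f| exceeds \lambda. By Marcinkiewicz interpolation with the
% trivial L^\infty bound,
\|M^d_\sigma f\|_{L^p(\sigma)} \le C_p\, \|f\|_{L^p(\sigma)}, \qquad 1 < p \le \infty.
```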
The implicit constant depends only on the p j (·).
Lemma 3.11 ([5, Lemma 3.24]). Given $p(\cdot) \in \mathcal{P}_0$, $0 < p_- \le p_+ < \infty$, then for every cube $Q$,
$$|Q|^{p_-(Q) - p_+(Q)} \lesssim 1,$$
and the implicit constant depends only on $p(\cdot)$ and $n$. The same inequality holds if we replace one of $p_+(Q)$ or $p_-(Q)$ by $p(x)$ for any $x \in Q$.

Remark 3.12. Lemma 3.11 is sometimes referred to as Diening's condition, and it is the principal way in which we will apply the $LH_0$ condition.

Then given any set $G$ and any non-negative measure $\mu$, for every $t \ge 1$ there exists a constant $C = C(t, C_0)$ such that for all functions $f$ such that $|f(y)| \le 1$,
$$\int_G |f(y)|^{s(y)}\,d\mu(y) \le C \int_G |f(y)|^{r(y)}\,d\mu(y) + \int_G \frac{d\mu(y)}{(e + |y|)^{tnr_-(G)}}.$$

If we instead assume that
then the same inequality holds for any function $f$.

Remark 3.14. Lemma 3.13 is the principal way in which we will apply the $LH_\infty$ condition.
In this section we give some properties of the $A_{p(\cdot)}$ and $A_{\mathbf{p}(\cdot)}$ weights that will be used in the proof of Theorem 2.4. For simplicity, throughout this section, assume that $w_1, w_2$ are weights and let $w = w_1w_2$ and $\mathbf{w} = (w_1, w_2, w)$. Similarly, whenever we are given $p_1(\cdot), p_2(\cdot) \in \mathcal{P}$, define $p(\cdot)$ by (2.1) and let $\mathbf{p}(\cdot) = (p_1(\cdot), p_2(\cdot), p(\cdot))$. Note that in this case we always have that $p_- \ge \frac{1}{2}$. We begin by recalling the definition of $A_{p(\cdot)}$ weights and then state several results from [7] on their properties.

Definition 4.1. Given $p(\cdot) \in \mathcal{P}$ and a weight $w$, we say $w \in A_{p(\cdot)}$ if
$$[w]_{A_{p(\cdot)}} = \sup_Q |Q|^{-1}\, \|w\chi_Q\|_{p(\cdot)}\, \|w^{-1}\chi_Q\|_{p'(\cdot)} < \infty.$$
The next two lemmas show the relationship between $A_{p(\cdot)}$ and $A_\infty$ weights.
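The lower bound $p_- \ge \frac{1}{2}$ follows directly from (2.1); for completeness:

```latex
% Since p_1(x), p_2(x) \ge 1 for a.e. x,
\frac{1}{p(x)} = \frac{1}{p_1(x)} + \frac{1}{p_2(x)} \le 1 + 1 = 2,
\qquad\text{so}\qquad p(x) \ge \tfrac{1}{2} \text{ a.e., i.e., } p_- \ge \tfrac{1}{2}.
```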
the implicit constant depends only on p(·) and w.
The final lemma is an integral estimate that, in conjunction with Lemma 3.13, will be used to apply the $LH_\infty$ condition.

Lemma 4.6 ([6, Inequality (3.3)]). Given $p(\cdot) \in \mathcal{P}$, suppose $p(\cdot) \in LH$. If $w \in A_{p(\cdot)}$, then there exists a constant $t > 1$, depending only on $w$, $p(\cdot)$ and $n$, such that

We now turn to the $A_{\mathbf{p}(\cdot)}$ condition. If $w_1 \in A_{p_1(\cdot)}$ and $w_2 \in A_{p_2(\cdot)}$, then by Lemma 3.6 we have that $\mathbf{w} \in A_{\mathbf{p}(\cdot)}$. However, this inclusion is proper, as it is in the constant exponent case. Nevertheless, we can characterize the bilinear $A_{\mathbf{p}(\cdot)}$ weights in terms of the $A_{p(\cdot)}$ condition. In the constant exponent case this is proved in [21], and our argument is adapted from theirs.

Proposition 4.7. Given w and p(·), w ∈ A p(·) if and only if
First assume that w ∈ A p(·) . Then for a.e. x, . Therefore, by Lemmas 3.7 and 3.3, and by the definition of A p(·) , Thus (4.1) holds.
Conversely, now suppose that (4.1) holds. Then for a.e. x, so by Lemmas 3.6 and 3.3, for any cube Q, Therefore, and so w ∈ A p(·) .
Proposition 4.7 has the following corollary which will be used in our proof of Theorem 2.4. Corollary 4.9. Given p 1 (·), p 2 (·) ∈ P, suppose p j (·) ∈ LH and (p j ) Proof. This follows immediately from Lemma 4.2 and Proposition 4.7.
Remark 4.11. When we apply Proposition 4.10 below, we will do so in conjunction with Proposition 4.7 applied to $w^{1/2} \in A_{2p(\cdot)}$, so we will let $v^{-1} = w^{-1/2}$ and replace $p(\cdot)$ by $2p(\cdot)$ and $q$ by $2q$. We could have stated this result in those terms, but for the purposes of the proof it seemed easier to suppress the factor of 2.
Proof. Fix a cube Q ⊂ R n . It follows from the definition that for a.e.
Let Q 0 be the cube centered at the origin with |Q 0 | = 1. Then either |Q| ≤ |Q 0 | or |Q| > |Q 0 |. we will prove (4.2) in the first case; the proof of the second case is the same, exchanging the roles of Q and Q 0 . Suppose first that dist(Q, so there exists a constant C = C(p 1 (·), p 2 (·)) such that . Therefore, by Lemma 3.6 and the A p ′ (·) condition we have that . Hence, by (4.4) and Lemma 3.11, Therefore, arguing as we did in inequality (4.5), replacing 5Q 0 byQ. we get If we continue the above argument and use the fact To estimate this final term, note that since p j (·) ∈ LH, there exist Therefore, again by log-Hölder continuity, and using that 1 .
Given this, and since |Q| (e + d Q ) n , we therefore have that 1.

5. Characterizations of $A_{\mathbf{p}(\cdot)}$
In this section we give two characterizations of the A p(·) condition in terms of averaging operators. The first is a very general condition that does not require assuming that the exponent functions are log-Hölder continuous; the second requires the additional assumption that p 1 (·), p 2 (·) are log-Hölder continuous.
Given a cube $Q$, define the multilinear averaging operator $A_Q$ by
$$A_Q(f_1,f_2)(x) = \langle f_1 \rangle_Q\, \langle f_2 \rangle_Q\, \chi_Q(x).$$
More generally, given a family $\mathcal{Q} = \{Q\}$ of disjoint cubes, we define
$$T_{\mathcal{Q}}(f_1,f_2)(x) = \sum_{Q \in \mathcal{Q}} \langle f_1 \rangle_Q\, \langle f_2 \rangle_Q\, \chi_Q(x).$$
Theorem 5.1. Given $p_1(\cdot), p_2(\cdot) \in \mathcal{P}$ and $\mathbf{w}$, then $\mathbf{w} \in A_{\mathbf{p}(\cdot)}$ if and only if
$$\|A_Q(f_1,f_2)\,w\|_{p(\cdot)} \lesssim \|f_1w_1\|_{p_1(\cdot)}\, \|f_2w_2\|_{p_2(\cdot)}, \tag{5.1}$$
where the implicit constant is uniform over all cubes $Q$. If we assume further that $p_1(\cdot), p_2(\cdot) \in LH$, then $\mathbf{w} \in A_{\mathbf{p}(\cdot)}$ if and only if
$$\|T_{\mathcal{Q}}(f_1,f_2)\,w\|_{p(\cdot)} \lesssim \|f_1w_1\|_{p_1(\cdot)}\, \|f_2w_2\|_{p_2(\cdot)}, \tag{5.2}$$
where the implicit constant is uniform over all collections $\mathcal{Q}$ of disjoint cubes.

Remark 5.2.
When p − ≥ 1 (i.e., when L p(·) is a Banach space) the characterization in terms of the operators T Q is a consequence of a general result in the setting of Banach lattices due to Kokilashvili, et al. [20]. However, even in this special case we would be required to show that condition G defined below holds in order to apply their result. In our case we can use the rescaling properties of variable Lebesgue spaces to prove it directly.

Remark 5.3.
A very deep result in the theory of variable Lebesgue spaces is that the uniform boundedness of the linear version of the averaging operators T Q is equivalent to the boundedness of the Hardy-Littlewood maximal operator, but the uniform boundedness of the (linear) operators A Q is not. See [5,15] for details and further references. We conjecture that the corresponding result holds in the bilinear case.
The proof of Theorem 5.1 is straightforward for A Q , and so we give this proof separately.
Proof of Theorem 5.1 for $A_Q$. Let $\mathbf{w} \in A_{\mathbf{p}(\cdot)}$; then given any cube $Q$, by Lemma 3.6 and the definition of $A_{\mathbf{p}(\cdot)}$ we get

Since the implicit constant depends only on the $A_{\mathbf{p}(\cdot)}$ condition and is independent of $Q$, we get (5.1). Now assume that (5.1) holds. By Lemma 3.8, there exist $h_jw_j \in L^{p_j(\cdot)}$ with $\|h_jw_j\|_{p_j(\cdot)} \le 1$, $j = 1, 2$, such that

Again, the constant is independent of $Q$, so $\mathbf{w} \in A_{\mathbf{p}(\cdot)}$.
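The elided computation in the first half of the proof presumably runs along the following lines; we assume here that Lemma 3.6 is the generalized Hölder inequality on variable spaces and that the $A_{\mathbf{p}(\cdot)}$ condition bounds $|Q|^{-2}\|w\chi_Q\|_{p(\cdot)}\|w_1^{-1}\chi_Q\|_{p_1'(\cdot)}\|w_2^{-1}\chi_Q\|_{p_2'(\cdot)}$:

```latex
% Since A_Q(f_1,f_2)w = <f_1>_Q <f_2>_Q\, w\chi_Q, and by the generalized Holder inequality
% <|f_j|>_Q \le C |Q|^{-1} \|f_j w_j \chi_Q\|_{p_j(.)} \|w_j^{-1}\chi_Q\|_{p_j'(.)}, we get
\|A_Q(f_1,f_2)\,w\|_{p(\cdot)}
\le C\,\frac{\|w\chi_Q\|_{p(\cdot)}\prod_{j=1}^{2} \|w_j^{-1}\chi_Q\|_{p_j'(\cdot)}}{|Q|^{2}}
\;\prod_{j=1}^{2}\|f_jw_j\|_{p_j(\cdot)}
\lesssim \|f_1w_1\|_{p_1(\cdot)}\,\|f_2w_2\|_{p_2(\cdot)}.
```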
The proof of Theorem 5.1 for $T_{\mathcal{Q}}$ requires two ancillary tools. The first is a bilinear averaging operator that generalizes a linear operator introduced in [15]. Given $p(\cdot) \in \mathcal{P}$, define the $p(\cdot)$-average of $f$ over a cube $Q$ by
$$\langle f \rangle_{p(\cdot),Q} = \frac{\|f\chi_Q\|_{p(\cdot)}}{\|\chi_Q\|_{p(\cdot)}},$$
and given a disjoint family of cubes $\mathcal{Q}$, define the $p(\cdot)$-averaging operator
$$T_{p(\cdot),\mathcal{Q}} f(x) = \sum_{Q \in \mathcal{Q}} \langle f \rangle_{p(\cdot),Q}\, \chi_Q(x).$$
In [15, Corollary 7.3.21] they showed that if $p(\cdot) \in LH$, then these operators are uniformly bounded on $L^{p(\cdot)}$:
$$\|T_{p(\cdot),\mathcal{Q}} f\|_{p(\cdot)} \lesssim \|f\|_{p(\cdot)}. \tag{5.3}$$
We define the bilinear $p(\cdot)$-averaging operator analogously: given $p_1(\cdot), p_2(\cdot)$ and a family of disjoint cubes $\mathcal{Q}$, let
$$T_{\mathbf{p}(\cdot),\mathcal{Q}}(f_1,f_2)(x) = \sum_{Q \in \mathcal{Q}} \langle f_1 \rangle_{p_1(\cdot),Q}\, \langle f_2 \rangle_{p_2(\cdot),Q}\, \chi_Q(x),$$
where $\mathcal{Q}$ ranges over all collections of disjoint cubes.
Proof. Since $p_1(\cdot), p_2(\cdot) \in LH$, we have $p(\cdot) \in LH$, and so by Lemma 3.10, Therefore, and so by Lemma 3.6 and (5.3),

The second tool is a summation property. Given $p_1(\cdot), p_2(\cdot) \in \mathcal{P}$, suppose $p(\cdot)$ is such that $p_- \ge 1$. Then we say that $p(\cdot) \in \mathcal{G}$ if for every family of disjoint cubes $\mathcal{Q}$,
$$\sum_{Q \in \mathcal{Q}} \|f\chi_Q\|_{p(\cdot)}\, \|g\chi_Q\|_{p'(\cdot)} \lesssim \|f\|_{p(\cdot)}\, \|g\|_{p'(\cdot)},$$
where the implicit constant is independent of $\mathcal{Q}$.
Remark 5.5. The linear version of property G is due to Berezhnoȋ [1] in the setting of Banach function spaces. See also [15] where it is used to prove (5.2).
We first prove the necessity of the $A_{\mathbf{p}(\cdot)}$ condition. This is an immediate consequence of Theorem 5.1: given any cube $Q$ and $x \in Q$, we have that $A_Q(|f_1|,|f_2|)(x) \le \mathcal{M}(f_1,f_2)(x)$. Therefore, given weights $w_1, w_2$ such that inequality (2.2) holds, we immediately have that (5.1) holds, and so $\mathbf{w} \in A_{\mathbf{p}(\cdot)}$.

Remark 6.1. The proof that the $A_{\mathbf{p}(\cdot)}$ condition is necessary does not require us to assume that the exponents are log-Hölder continuous.
We begin with a series of reductions. First, for $t \in \{0, 1/3\}^n$, define
$$\mathcal{D}_t = \{2^{-k}([0,1)^n + m + (-1)^k t) : k \in \mathbb{Z},\ m \in \mathbb{Z}^n\}.$$
Each $\mathcal{D}_t$ is a "1/3" translate of the standard dyadic grid, and has exactly the same properties as $\mathcal{D}_0$ defined above. (Note that the two definitions agree when $t = 0$.) Define the dyadic bilinear maximal operator
$$\mathcal{M}_{\mathcal{D}_t}(f_1,f_2)(x) = \sup_{\substack{Q \ni x \\ Q \in \mathcal{D}_t}} \fint_Q |f_1(y)|\,dy \cdot \fint_Q |f_2(z)|\,dz.$$
Then we have the following remarkable inequality: there exists a constant $C(n)$ such that
$$\mathcal{M}(f_1,f_2)(x) \le C(n) \sum_{t \in \{0,1/3\}^n} \mathcal{M}_{\mathcal{D}_t}(f_1,f_2)(x).$$
This was first proved in [14]. (For the linear case, see also [3].) Therefore, to prove that inequality (2.2) holds, it suffices to prove it with $\mathcal{M}$ replaced by $\mathcal{M}_{\mathcal{D}_t}$, and in fact it suffices to prove it for $\mathcal{M}^d = \mathcal{M}_{\mathcal{D}_0}$, since the same proof holds for any dyadic grid $\mathcal{D}_t$ with different constants, where the difference only depends on $t$. (Below we will remark on where this difference arises.) Second, we may assume that $f_1, f_2$ are non-negative, bounded functions with compact support. It is clear from the definition of $\mathcal{M}^d$ that we may take them non-negative. For the approximation, it suffices to note that given $f_1, f_2$, there exist sequences of non-negative, bounded functions of compact support, $g_k, h_k$, that increase pointwise to $f_1$ and $f_2$ and such that
$$\lim_{k \to \infty} \mathcal{M}^d(g_k, h_k)(x) = \mathcal{M}^d(f_1,f_2)(x).$$
In the linear case this is proved in [5, Lemma 3.30] and the same proof (with the obvious changes) works in the bilinear case. The desired result then follows by Lemma 3.4. Third, we restate the desired inequality in an equivalent fashion. Given an exponent $p(\cdot) \in \mathcal{P}_0$ and a weight $v$, define $L^{p(\cdot)}_v$ to be the quasi-Banach function space with norm
$$\|f\|_{L^{p(\cdot)}_v} = \inf\Big\{\lambda > 0 : \int_{\mathbb{R}^n} \Big(\frac{|f(x)|}{\lambda}\Big)^{p(x)} v(x)\,dx \le 1\Big\}.$$
In other words, $L^{p(\cdot)}_v$ is defined exactly as $L^{p(\cdot)}$ with Lebesgue measure replaced by the measure $v\,dx$. This norm has many of the same basic properties as the $L^{p(\cdot)}$ norm.
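The pointwise bound over the shifted grids rests on the standard fact that every cube is well covered by a cube from one of the $2^n$ shifted lattices; a commonly used form (the constant 6 is the usual one, assumed here) is:

```latex
% For every cube Q \subset R^n there exist t \in \{0,1/3\}^n and Q' \in D_t with
Q \subset Q' \quad\text{and}\quad \ell(Q') \le 6\,\ell(Q).
% Consequently, since |Q'|/|Q| \le 6^n, for x \in Q,
\fint_Q |f_1|\,dy \cdot \fint_Q |f_2|\,dz
\le 6^{2n} \fint_{Q'} |f_1|\,dy \cdot \fint_{Q'} |f_2|\,dz
\le 6^{2n}\, \mathcal{M}_{\mathcal{D}_t}(f_1,f_2)(x),
% and taking the supremum over all cubes Q containing x gives the comparison.
```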
Hence, it will suffice to prove that since if we replace f l by f l /σ l , l = 1, 2, we get (2.2).
Finally, by homogeneity we may assume without loss of generality that $\|f_l\|_{L^{p_l(\cdot)}_{\sigma_l}} = 1$, $l = 1, 2$, which by Lemma 3.5 (which holds in this setting) implies that
$$\int_{\mathbb{R}^n} f_l(x)^{p_l(x)}\, \sigma_l(x)\,dx \le 1.$$
Thus it will suffice to prove that

which, again by Lemma 3.5, is equivalent to proving that

with a constant independent of $f_l$, $l = 1, 2$.
Then we can write We will estimate each term on the right separately. The integral I 1 is the "local" term and the estimate will use the LH 0 condition. The integral I 4 is the "global" term and the estimate will use the LH ∞ condition. The estimates of I 2 and I 3 involve both local and global estimates and are the most complicated: this is where our proof diverges most significantly from the linear case. Note, however, that the estimates for these integrals are the same (making the obvious changes) so we will only estimate I 2 .
The estimate for $I_1$: We begin by forming the bilinear Calderón-Zygmund cubes associated with $\mathcal{M}^d(h_1\sigma_1, h_3\sigma_2)$. For the details of this decomposition, see [21]. Fix $a > 2^{2n}$ and for each $k \in \mathbb{Z}$ define
$$\Omega_k = \{x \in \mathbb{R}^n : \mathcal{M}^d(h_1\sigma_1, h_3\sigma_2)(x) > a^k\}.$$
Then $\Omega_k = \bigcup_j Q^k_j$, where $\{Q^k_j\}_{k,j}$ is a family of maximal dyadic cubes contained in $\Omega_k$ with the property that
$$a^k < \langle h_1\sigma_1 \rangle_{Q^k_j}\, \langle h_3\sigma_2 \rangle_{Q^k_j} \le a^{k+1}.$$
Moreover, since $\Omega_{k+1} \subset \Omega_k$, the sets $E^k_j = Q^k_j \setminus \Omega_{k+1}$ are pairwise disjoint and there exists $0 < \alpha < 1$ such that
$$\alpha |Q^k_j| < |E^k_j|. \tag{6.3}$$
By Corollary 4.9, $u$ and $\sigma_l$, $l = 1, 2$, are $A_\infty$ weights, so by Lemma 3.1 there exists $0 < \beta < 1$ such that $u(E^k_j) > \beta\, u(Q^k_j)$ and $\sigma_l(E^k_j) > \beta\, \sigma_l(Q^k_j)$. We will use this fact repeatedly throughout the proof without further comment.
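The reason the sets $E^k_j$ are introduced is that they convert sums over the Calderón-Zygmund cubes into integrals; a typical instance of how (6.3) and the $A_\infty$ property are used is the following sketch (the function $F$ and weight $u$ here are generic placeholders):

```latex
% If F \ge 0 and x \in E^k_j \subset Q^k_j, then <F>_{Q^k_j} \le M^d F(x). Since u(Q^k_j) \lesssim u(E^k_j)
% by (6.3) and the A_\infty property, and the sets E^k_j are pairwise disjoint,
\sum_{k,j} \langle F \rangle_{Q^k_j}\, u(Q^k_j)
\lesssim \sum_{k,j} \int_{E^k_j} M^d F\, u\,dx
\le \int_{\mathbb{R}^n} M^d F\, u\,dx.
```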
We can now estimate $I_1$ as follows: Since $h_1(x) \ge 1$ or $h_1(x) = 0$, we have that

The same estimate holds for $h_3$. For each $j, k$ define

and note that for $x \in Q^k_j$, $q(Q^k_j) \le p_-(Q^k_j) \le p(x)$. Thus,

By Hölder's inequality with measure $\sigma_l\,dx$, for $l = 1, 3$,
Further, we claim that If we assume this for the moment, then we can argue as follows: since , by (6.5) and Young's inequality, Therefore, to complete the estimate of I 1 , we will prove (6.6). First, rewrite the left-hand side as follows: By the A p(·) condition we have that there is a constant c such that which by Lemma 3.5 implies that Hence, to prove (6.6) it will suffice to show that for l = 1, 2, and (6.10)
This completes the estimate of I 1 .
The estimate for $I_2$: We first form the bilinear Calderón-Zygmund cubes associated with $\mathcal{M}^d(h_1\sigma_1, h_4\sigma_2)$, and we use the same notation as we did in the estimate for $I_1$. To estimate the term $I_2$ we need to divide the cubes $Q^k_j$ into three sets: small cubes close to the origin, large cubes close to the origin, and cubes (of all sizes) far from the origin. To make this precise, let $\{P_i\}_{i=1}^{2^n}$ be the $2^n$ dyadic cubes adjacent to the origin, $|P_i| \ge 1$, that are so large that if $Q$ is any dyadic cube equal to or adjacent to $P_i$ in the same quadrant, with $|P_i| = |Q|$, then $u(Q) \ge 1$ and $\sigma_l(Q) \ge 1$, $l = 1, 2$. The existence of such cubes follows from Lemma 3.1 and Corollary 4.9. Let $P = \bigcup_i P_i$. We can then partition the cubes $\{Q^k_j\}$ into three disjoint sets:

We now estimate $I_2$, arguing as we did at the beginning of the estimate for $I_1$:

We will estimate each sum in turn.

Remark 6.2.
Throughout the rest of this proof, we will allow the implicit constants to depend on σ l (P ) or u(P ). This is the only place in the proof where the constant depends on the dyadic grid we are working with.
The estimate for J 1 : Since h 4 ≤ 1 and p + < ∞, we have that by inequalities (6.4) and (6.5), Let q − be defined as in (4.3). By (6.6) we can estimate the integral: Therefore, by Young's inequality and by Lemma 3.2, The estimate for J 2 : Given (k, j) ∈ G , since P i ⊂ Q k j , we have that 1 ≤ σ 2 (P i ) ≤ σ 2 (Q k j ). Therefore, by Lemma 4.3 applied twice to w We can now estimate J 2 . By inequality (6.4) and Lemmas 3.13 and 4.6, there exists t > 1 such that We estimate each term in the product separately. First, we claim that Since σ l (Q k j ), u(Q k j ) ≥ 1, by Lemma 4.3 (applied several times) and the definition of A p(·) , we have that .
This proves (6.11). Second, by Lemma 3.6 and again by Lemma 4.3 we have that

1.
We can now continue the estimate of J 2 . Since by (6.11) and Young's inequality, by Lemmas 3.13 and 4.6 there exists t > 1 such that, by Lemma 3.2 applied twice, Finally, we again apply Lemmas 3.13 and 4.6 to get which completes the estimate for J 2 .
The estimate for J 3 : If Q k j is such that (k, j) ∈ H , then Q k j does not contain the origin. Since it is a dyadic cube, we have that dist(Q k j , 0) ≥ ℓ(Q k j ). Therefore, there exists a constant R > 1 depending only on n such that (6.14) sup By the continuity of p(·), there exists x + in the closure of Q k j such that p + (Q k j ) = p(x + ). Hence, since p(·) ∈ LH, for all x ∈ Q k j , by (6.14), .
In the same way, for l = 1, 2 we have that p l (·) satisfies (6.16) .
To estimate J 3 we need to divide H into two subsets depending on the size of the cubes Q k j with respect to σ 2 : We first estimate the sum over H 1 . By (6.15) and Lemmas 3.13 and 4.6, By Lemma 3.11, (6.4), and (6.6), and since h 1 ≥ 1, h 4 ≤ 1 and σ 2 (Q k j ) ≤ 1, by Hölder's inequality and Young's inequality, The proof that K 1 is bounded is exactly the same as the final estimate for I 1 , beginning at (6.7). Therefore, to complete the estimate for the sum over H 1 , we need to bound K 2 . By Lemmas 3.13 and 4.6 (applied twice) and by Lemma 3.2, To estimate the sum over H 2 , first note that by Lemma 3.6 we have that ; similarly, we have that We now divide the cubes in H 2 into two subsets depending on the size of σ 1 (Q k j ): We first estimate the sum over H 2a . Given (6.17) and (6.18), by Lemma 3.13, We first estimate L 2 . Since σ 2 (E k j ) σ 2 (Q k j ) 1, by (6.14), (6.8) and Lemma 4.6, In order to estimate L 1 we first note that for l = 1, 2, since σ l (Q k j ) ≥ 1, by Lemma 4.3, Given this estimate, by (6.8) and Young's inequality we have that The estimate of the last term is identical to the estimate for J 2 above, beginning at inequality (6.13); here we use the fact that σ 1 (Q k j ) ≥ 1 to get (6.12). The estimate over H 2b is similar, but we must replace the exponent p ∞ with r(Q k j ), which is defined by Then by (6.16), for x ∈ Q k j , .
We can then argue as we did for the sum over H 2a above to get The estimate for M 2 is identical to the estimate for L 2 . To estimate M 1 , we again use (6.19) for σ 2 , replacing p ∞ with r(Q k j ). Because σ 1 (Q k j ) < 1 we need to replace (6.19) with a different estimate. Since (p ′ 1 ) − = [(p 1 ) + ] ′ , by Lemma 3.5, Again by Lemma 3.5, and then by Lemma 3.3 and Lemma 4.5, Hence, We can now modify the estimate for L 1 to estimate M 1 : The estimate for the second term in the last line is the same as the final estimate for J 2 ; we use the same argument above to estimate L 1 . The estimate for the first term is the same as the estimate for K 1 above, noting that since h 1 ≥ 1 and by Hölder's inequality, This completes the estimate of M 1 and so of I 2 . Remark 6.3. As noted above, the argument for I 3 is the same as that for I 2 , replacing h 1 σ 1 with h 3 σ 2 and h 4 σ 2 with h 2 σ 1 .

The estimate for I 4 :
The estimate for $I_4$ parallels that for $I_2$; in particular, we will decompose $I_4$ into essentially the same parts as we did above. For some parts the estimate is very similar to the corresponding part of $I_2$, and so we give the key inequalities but omit some of the details. For other parts we will need to modify the argument, and we present these in more detail.
Begin by forming the bilinear Calderón-Zygmund cubes associated with M d (h 2 σ 1 , h 4 σ 2 ). We then decompose the collection of these cubes into the sets F , G and H , defined as above. Denote the sums over these sets by N 1 , N 2 and N 3 .
The estimate for N 1 : The estimate for N 1 is very similar to that for J 1 above. We replace the arguments used for the h 1 term and estimate the h 2 term and the h 4 term in the same way, using the fact that h 2 , h 4 ≤ 1: The estimate for N 2 : To estimate N 2 we modify the argument for J 2 . By the definition of A p(·) and by Lemma 3.6 we have that since u(Q) ≥ u(P i ) ≥ 1, by Lemma 3.5, Therefore, by Lemma 3.13, By Lemma 4.6, the second term on the last line is bounded by a constant 1. We estimate the first term using (6.11): By Young's inequality and Lemmas 3.2, 3.13, and 4.6, The estimate for N 3 : The estimate for N 3 is broadly similar to the estimate for J 3 above, but it differs considerably in the details. We first begin by dividing the cubes in H into the sets H 1 and H 2 as before. However, we now have to subdivide both of these sets and not just H 2 . Define The estimate for the sum over H 1a is similar to the estimate over H 1 above for J 3 , but we use the fact that both h 2 , h 4 ≤ 1. By Lemmas 3.13 and 4.6, Since h 2 , h 4 ≤ 1 and σ l (Q k j ) ≤ 1, l = 1, 2, by (6.6), by Young's inequality, Both of the final terms are estimated as K 2 above.
To estimate the sum over H 1b , we first define the exponent s(Q k j ) by 1 s(Q k j ) .
Then, arguing as we did for (6.15), we get that for x ∈ Q k j , .
Given this, by (6.17) (for h 2 instead of h 1 ), (6.18) and Lemma 3.13, The estimate for R 2 is identical to the estimate for L 2 . To estimate R 1 , we again use (6.19) for σ 1 , replacing p ∞ with s(Q k j ). Because σ 2 (Q k j ) < 1 we use a different estimate. Since (p ′ 2 ) − = [(p 2 ) + ] ′ , by Lemma 3.5, We can now argue as in the estimate of L 1 to get The estimate for the first term in the last line is the same as the estimate for the h 4 term in J 2 . Arguing as we did for (6.15) and (6.16), .
Then, since h 4 σ 1 ,Q k j ≤ 1, the estimate for the second term follows by (6.16), and by Lemmas 3.13, 3.2 and 4.6: Again, we estimate this last sum as in the final estimate for J 2 .
To estimate the sum over $H_2$ we argue as we did before for $J_3$, dividing it into sums over $H_{2a}$ and $H_{2b}$. The estimate over $H_{2a}$ is essentially the same as before, replacing $h_1$ by $h_2$. This yields terms just like $L_1$ and $L_2$ above. The estimate for the $L_2$ term is the same, as is the estimate for the $L_1$ term, except that in the final line the $h_2$ term is estimated like the $h_4$ term, since both $h_2, h_4 \le 1$.
To estimate the sum over $H_{2b}$, we can argue as before, getting terms like $M_1$ and $M_2$, replacing $h_1$ by $h_2$. The estimate of the $M_2$ term is again the same. To estimate the $M_1$ term we argue as before, except that we replace the exponent $r(Q^k_j)$ by $p_\infty$. But then the final line of the estimate becomes

7. Proof of Theorem 2.8

Theorem 2.8 follows almost directly from Theorem 2.4. To prove it, we will need two estimates for the Fefferman-Stein sharp maximal operator and an extrapolation theorem in the scale of weighted variable Lebesgue spaces. We first recall the definition of the sharp maximal operator: given $f \in L^1_{loc}$, let
$$M^\# f(x) = \sup_{Q \ni x} \fint_Q |f(y) - \langle f \rangle_Q|\,dy,$$
where the supremum is taken over all cubes $Q$. For $\delta > 0$, define $M^\#_\delta f(x) = M^\#(|f|^\delta)(x)^{1/\delta}$. The first estimate relates the norms of $f$ and $M^\# f$. For a proof, see Journé [19] or [8].
The second estimate is a pointwise inequality proved in [21]. To apply these results we need to extend Proposition 7.1 to the scale of variable Lebesgue spaces. The following result was proved in [13, Theorem 2.25]. The hypotheses are somewhat technical, but they are the right generalization to prove $A_\infty$ extrapolation [8] in this setting. The result is stated in the abstract language of extrapolation pairs; for more on this approach to Rubio de Francia extrapolation, see [9].

Proposition 7.3. Suppose that for some $0 < p < \infty$ and every $w_0 \in A_\infty$,
$$\|f\|_{L^p(w_0)} \lesssim \|g\|_{L^p(w_0)}$$
for every pair of functions $(f, g)$ in a family $\mathcal{F}$ such that $\|f\|_{L^p(w_0)} < \infty$. Given $p(\cdot) \in \mathcal{P}_0$, suppose there exists $s \le p_-$ such that $w^s \in A_{p(\cdot)/s}$ and the maximal operator is bounded on $L^{(p(\cdot)/s)'}(w^{-s})$. Then for $(f, g) \in \mathcal{F}$ such that $\|f\|_{L^{p(\cdot)}(w)} < \infty$,
$$\|f\|_{L^{p(\cdot)}(w)} \lesssim \|g\|_{L^{p(\cdot)}(w)}.$$
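Given Propositions 7.1 and 7.3, the proof of Theorem 2.8 reduces to the pointwise estimate from [21], which (up to the precise range of $\delta$, stated here as in the bilinear case) has the form $M^\#_\delta(T(f_1,f_2))(x) \le C_\delta\,\mathcal{M}(f_1,f_2)(x)$ for $0 < \delta < 1/2$. Schematically, the chain of estimates is then:

```latex
% For suitable f_1, f_2 and 0 < \delta < 1/2:
\|T(f_1,f_2)\,w\|_{p(\cdot)}
\lesssim \|M^\#_\delta(T(f_1,f_2))\,w\|_{p(\cdot)}   % Proposition 7.1, extrapolated via Proposition 7.3
\lesssim \|\mathcal{M}(f_1,f_2)\,w\|_{p(\cdot)}       % pointwise estimate from [21]
\lesssim \|f_1w_1\|_{p_1(\cdot)}\,\|f_2w_2\|_{p_2(\cdot)}.  % Theorem 2.4
```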
If we take the limit as $N \to \infty$, then by Fatou's lemma and Theorem 2.4,

The desired conclusion now follows by a standard approximation argument, since $L^\infty_c$ is dense in $L^{p_j(\cdot)}(w_j)$, $j = 1, 2$.