ON THE BEST CHOICE OF A DAMPING SEQUENCE IN ITERATIVE OPTIMIZATION METHODS

Some iterative methods of mathematical programming use a damping sequence {α_t} such that 0 ≤ α_t < 1 for all t, α_t → 0 as t → ∞, and Σ α_t = ∞. For example, α_t = 1/(t + 1) in Brown's method for solving matrix games. In this paper, for a model class of iterative methods, the convergence rate for any damping sequence {α_t} depending only on time t is computed. This computation is used to find the best damping sequence.

Let L be a real affine space (so L with an origin fixed is the same as a real vector space). For any points x and y in L, let [x, y] denote the closed interval with ends x and y. For any real-valued function f on a subset X of L, let f(X) denote its infimum on X.
On a non-empty subset X of L, we consider an iterative procedure of the form

(1) x_t = (1 - α_t) x_{t-1} + α_t y_t,

where 0 ≤ α_t ≤ 1, 1 ≤ t < T + 1, and [y_t, x_{t-1}] ⊂ X.
Here the total number T of iterations is either finite or infinite (T = ∞); in the second case t runs over all natural numbers.
The objective of the procedure, starting at a point x_0 of X, is to minimize a convex function f on X that is bounded from below. (We call f convex on X if its restriction to every interval contained in X is convex.)
To reach this objective, at each step t, one tends to choose y_t in X so that f decreases when one starts to move from x_{t-1} to y_t. The choice of y_t depends, in general, on f, x_{t-1}, and t. We abstract ourselves from any concrete rule of choosing y_t and just assume that the choice was good enough. Namely, we fix a number θ in the interval 0 ≤ θ < 1 and consider the class of iterative methods such that

(2) f([x_{t-1}, y_t]) - f(X) ≤ θ (f(x_{t-1}) - f(X))

for all integers t in the interval 1 ≤ t < T + 1.
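For concreteness, the class of methods just described can be sketched in a few lines of Python. The direction oracle, the starting point, and the damping sequence α_t = 1/(t + 1) below are illustrative choices of ours, not part of the paper's setting; any oracle satisfying condition (2) would do.

```python
# A minimal sketch of procedure (1): x_t = (1 - a_t) x_{t-1} + a_t y_t in R^n.
# The oracle `choose_y` stands in for any rule producing a "good enough"
# point y_t in the sense of condition (2); all names here are illustrative.

def damped_iteration(x0, choose_y, alpha, T):
    """Run T steps of x_t = (1 - a_t) x_{t-1} + a_t y_t."""
    x = list(x0)
    for t in range(1, T + 1):
        a = alpha(t)
        y = choose_y(x, t)  # a point y_t with [y_t, x_{t-1}] contained in X
        x = [(1 - a) * xi + a * yi for xi, yi in zip(x, y)]
    return x

# Toy run: f(x) = x^2 on the line, y_t = 0 (the minimizer, so (2) holds
# with theta = 0), and the Brown-style damping a_t = 1/(t + 1).
x = damped_iteration([4.0], lambda x, t: [0.0], lambda t: 1.0 / (t + 1), 3)
# x_t = x_{t-1} * t/(t+1), so x is approximately [x_0/(T+1)] = [1.0]
```

In this toy run the iterates telescope, x_T = x_0/(T + 1), which previews the 1/(t + 1) rate discussed below.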
Note that according to (1), after a good direction y_t - x_{t-1} is chosen, we do not minimize f on the interval [x_{t-1}, y_t], but make a step from x_{t-1} in the direction of y_t, with a "stepsize" α_t depending only on t. Iterative procedures of the form (1) can be used not only for minimization of convex functions (see, for example, [4]). Sometimes they can be used for minimization of a not necessarily convex function g on X that is bounded from below, because, for an arbitrary such g, its infimum g(X) is equal to f(X), where f is the largest convex function on X such that f ≤ g everywhere on X. This f exists for any g, because the supremum of any set of convex functions on an interval is convex. This approach is feasible if directions satisfying condition (2) can be easily chosen.
Also, the procedures of the form (1) can be used to search for a convex subset X∞ of X. For example, this X∞ could be a point where a function on X reaches a critical value. The search for X∞ can be reduced to minimization of a convex function f as follows. Pick a distance ρ on L invariant under all translations and such that ρ(x, x + (y - x)a) = ρ(x, y)·a for all x and y in L and all real numbers a ≥ 0. (So, when an origin 0 in L is fixed, (L, ρ(0, ·)) is a linear normed space in the sense of Day [3].) Then f = ρ(X∞, ·) is a convex non-negative function on L, and X∞ consists of the points which minimize f.
The distance ρ(Y, Z) between two subsets of a metric space is defined to be the infimum of all ρ(y, z), where y ∈ Y and z ∈ Z.
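The reduction above replaces the search for X∞ by minimization of the distance function f = ρ(X∞, ·), which is convex, non-negative, and vanishes exactly on X∞. A tiny illustration of ours, with X∞ an interval on the real line and ρ the usual absolute-value distance:

```python
# Distance to a set as a convex objective: f(x) = rho(X_inf, x).
# Here X_inf = [lo, hi] on the real line; f is convex, non-negative,
# and equal to zero exactly on X_inf.

def dist_to_interval(lo, hi, x):
    """rho([lo, hi], x) = inf over z in [lo, hi] of |x - z|."""
    if x < lo:
        return lo - x
    if x > hi:
        return x - hi
    return 0.0

# f is minimized (value 0) precisely at the points of X_inf:
# dist_to_interval(1.0, 2.0, 0.5) -> 0.5
# dist_to_interval(1.0, 2.0, 1.5) -> 0.0
```

Minimizing this f by a procedure of the form (1) then drives the iterates toward X∞.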
Speaking of the convergence rate, minimization on the interval [x_{t-1}, y_t] under condition (2) would give the exponential convergence f(x_t) - f(X) ≤ θ^t (f(x_0) - f(X)). Avoiding computation of the stepsize (there is no line search in (1)), we will obtain (for the best damping sequence) a slower convergence. One cannot get a better convergence even assuming that X is convex and f is a convex function defined on the whole L (see the remarks in Sections 2 and 3 below).
Slow convergence of methods of the form (1) is sometimes compensated by their resistance to errors and data perturbations. The methods can be useful when data are uncertain and a precise solution is not feasible. See, for example, Belen'ky et al. [1], where Robinson's [5] result on the convergence of Brown's method [2] is generalized and applications to linear programming are given.

Convergence when a damping sequence is fixed
We fix the total number T ≥ 1 (T is an integer or ∞) of iterations, and a real number F. We impose the following condition on the function f and the procedure (1):

(4) f(y_t) - f(X) ≤ F for 0 ≤ t < T + 1, where y_0 = x_0.
When f is bounded from above (as well as from below), this condition holds automatically for a sufficiently large F.

Theorem 1. Fix θ in the interval 0 ≤ θ < 1, T ≤ ∞, F > 0, and a sequence {α_t} ∈ [0, 1]^T. Set d_0 := 1 and

d_t := max{(1 - α_t + θα_t) d_{t-1}, θ(1 - α_t) d_{t-1} + α_t}, 1 ≤ t < T + 1.

Then for any L and X, any f convex on X, and any iterative procedure (1) satisfying the conditions (2) and (4), we have

f(x_t) - f(X) ≤ F d_t, 0 ≤ t < T + 1.
Proof. Note that a "procedure (1)" is determined by a starting point x_0 and a sequence {y_t} in X such that [y_t, x_{t-1}] ⊂ X, since the sequence {α_t} is fixed. We prove the first conclusion by induction on t. When t = 0, f(x_0) - f(X) ≤ F d_0 by (4).
Remark. We could give a similar example with X in the plane L, see Figure 2. But the (T + 1)-dimensional example above can be easily modified to an example with a convex X. Namely, our function f on X can be extended to a convex function f' on the convex hull of X (here and below γ ranges over all non-negative functions on X taking only finitely many non-zero values and such that Σ γ(x) = 1, so Σ γ(x)·x is a convex linear combination of points in X) as follows: f'(z) := inf { Σ γ(x) f(x) : Σ γ(x)·x = z }. At a small cost, an example with a convex f defined on the whole L can be constructed. Namely, for any ε > 0 we can construct a convex function f = f_ε on L and a procedure (1), (2), (4) such that f(x_t) - f(X) ≥ F(d_t - ε) for all t. Indeed, let L and {y_t}_{-1 ≤ t < T+1}, {x_t}_{0 ≤ t < T+1} be as above. We define f on the line R_{-1} = {(1 - a)y_{-1} + a·y_0 : a real} as follows: f((1 - a)y_{-1} + a·y_0) = max(0, aF). For any t in the interval 0 ≤ t < T + 1, we define a convex function f on the line R_t = {(1 - a)x_t + a·y_{t+1} : a real} as follows.
Proof. (a) We take any a such that a > α_t for all t > t_0. We have to prove that d_∞ ≤ a/(1 - θ + aθ). Let t > t_0.
We pick any a < a_∞ and want to show that lim inf d_t ≥ a/(1 - θ + θa).
(d) Among those w in [x, y] where the function f reaches its minimal value, i.e., f(w) = f(X), we take the point x' closest to z.
Example. Let θ = 0 in Theorem 1. When α_t = 1/(t + 1) for all t, then d_t = 1/(t + 1) for all t. In the next section we will see that this {α_t} is the best sequence when θ = 0.
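The Example can be checked numerically. Theorem 1's displayed recurrence for d_t did not survive reproduction here, so the short Python sketch below uses our reconstruction of it, chosen to be consistent with the Example (d_t = 1/(t + 1) for θ = 0, α_t = 1/(t + 1)) and with the limiting value a/(1 - θ + θa) discussed in the proof above; both the function name and the recurrence should be read as assumptions.

```python
# Tabulate the bounds d_t of Theorem 1, assuming the recurrence reads
#   d_0 = 1,  d_t = max((1 - a_t + th*a_t) d_{t-1}, th*(1 - a_t) d_{t-1} + a_t)
# (our reconstruction, not a verbatim quote of the paper).

def damping_bounds(alpha, theta, T):
    d = [1.0]
    for t in range(1, T + 1):
        a = alpha(t)
        d.append(max((1 - a + theta * a) * d[-1],
                     theta * (1 - a) * d[-1] + a))
    return d

d = damping_bounds(lambda t: 1.0 / (t + 1), 0.0, 5)
# For theta = 0 and a_t = 1/(t+1) this gives d_t = 1/(t+1):
# d is approximately [1, 1/2, 1/3, 1/4, 1/5, 1/6]
```

Note also that for a constant sequence α_t = a the second branch dominates eventually, and its fixed point is exactly d = a/(1 - θ + θa).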

The best damping sequence
Now we want to find the best damping sequence {α_t}, that is, the one which gives the minimal value of d_T (when T is finite) in Theorem 1. The following theorem claims, among other things, the existence and uniqueness of such a sequence and its independence of F and T (when T increases, new members are added to the sequence, but the old members stay the same).

Theorem 3. For a fixed θ in the interval [0, 1], let us define inductively a sequence {A_t} = {A_t(θ)} by A_1 = 1/2 and A_{t+1} = A_t(1 - A_t + θA_t)/(1 - (1 - 2θ)A_t²) for t ≥ 1.
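The sequence {A_t} of Theorem 3 is easy to tabulate from the stated recurrence; the short Python sketch below does this (the function name is ours). For θ = 0 the recurrence reduces to A_{t+1} = A_t/(1 + A_t), whose solution with A_1 = 1/2 is A_t = 1/(t + 1), in agreement with the Example of the previous section.

```python
# The optimal damping sequence of Theorem 3, computed from the stated
# recurrence A_1 = 1/2, A_{t+1} = A_t(1 - A_t + th*A_t)/(1 - (1 - 2*th)*A_t^2).

def best_damping(theta, T):
    A = [0.5]
    while len(A) < T:
        a = A[-1]
        A.append(a * (1 - a + theta * a) / (1 - (1 - 2 * theta) * a * a))
    return A

best_damping(0.0, 4)
# For theta = 0 this is A_t = 1/(t + 1): approximately [0.5, 0.3333, 0.25, 0.2]
```

For θ > 0 the recurrence no longer has such a closed form, but the same few lines tabulate A_t for any θ in [0, 1].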
So, for any t > 1, we have: