DETERMINATION OF ACCELERATION FACTORS IN GRADIENT DESCENT ITERATIONS BASED ON TAYLOR'S SERIES

In this paper the efficiency of accelerated gradient descent methods is considered with regard to the way the acceleration factor is determined. Based on previous research, we assert that using the Taylor series of the posed gradient descent iteration to calculate the acceleration parameter gives better final results than some other choices. We give a comparative analysis of the efficiency of several methods with different approaches to obtaining the acceleration parameter. According to the achieved results of numerical experiments, we draw a conclusion about one of the most effective ways of defining the acceleration parameter in accelerated gradient descent schemes.


INTRODUCTION
We  analyze nonlinear optimization methods for the minimizat ion of an objective function f: n  →  : min ( ), In this paper we suppose that f is uniformly convex and twice continuously differentiab le function.For these classes of functions, furthermore we adopt the next often used representation of optimization models: (2) where x k+1 denotes the function value in the next iterative point, x k presents the function value in the current iteration, t k is the iterative step size value and d k is the search direction vector.Since the first methods for solving non-linear min imization problems have been developed it was clear that the two main characteristics of these iterations: the step length and the search direction (parameters t k and d k in (2)) crucially determine accelerated features of the optimization method.Therewith, we expect the value of an iterative step size to be optimal at the sense that it is not too high to misses out the minima and at the same time not too small to provide unnecessary high number of iterations.In order to obtain the minimal value of the objective function we assume that the search direction vector fulfills the descent condition: this paper we suppose that f is uniformly convex and: 0, T kk gd where with g k we denote a gradient of the objective function in the k − th iteration.* Corresponding author: milena.petrovic@pr.ac.rs

THEORETICAL PART
One of the first choices of a descending search direction vector satisfying the previous condition is proposed in the classical gradient descent method (GD method). In this iteration d_k = −g_k. This is one of the oldest iterations for solving nonlinear optimization problems. Theoretically, the GD method has good convergence properties. In a practical sense, the GD iteration is very slow and not really useful for problems with a large number of variables. In order to overcome these disadvantages while conserving the descending property of the negative gradient, some authors developed iterative gradient descent schemes more efficient in practical use than the GD method (Fletcher & Reeves, 1964; Polak & Ribière, 1969; Polyak, 1969).
In Andrei (2006) the accelerated gradient descent iteration

x_{k+1} = x_k − θ_k t_k g_k (3)

is proposed, where the acceleration parameter θ_k is calculated as θ_k = a_k/b_k (the quantities a_k and b_k are defined in Andrei (2006)). The value of the iterative step length t_k is derived using a backtracking line search procedure. This iteration is denoted as the AGD method. The AGD method is compared with the gradient descent GD method. In numerical experiments, obtained for 340 test problems, notably better results in favor of the AGD scheme are registered. The analyzed characteristics are the number of iterations, the CPU time and the number of function evaluations. Regarding all three tested properties, the AGD iteration provides a considerable reduction of the measured values compared to the GD scheme.
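The backtracking procedure used to obtain t_k can be sketched as follows; this is standard Armijo backtracking, and the parameter values sigma and beta as well as the test function are illustrative assumptions:

```python
import numpy as np

def backtracking(f, x_k, g_k, d_k, sigma=1e-4, beta=0.8):
    """Armijo backtracking line search: starting from t = 1, shrink t by the
    factor beta until f(x_k + t*d_k) <= f(x_k) + sigma * t * g_k^T d_k."""
    t = 1.0
    fx = f(x_k)
    slope = float(g_k @ d_k)      # negative for a descent direction
    while f(x_k + t * d_k) > fx + sigma * t * slope:
        t *= beta
    return t

# Illustrative use on f(x) = ||x||^2 with d_k = -g_k.
f = lambda x: float(x @ x)
x = np.array([1.0, 2.0])
g = 2.0 * x
t = backtracking(f, x, g, -g)     # accepts t = 0.8 after one reduction
```

The loop terminates for any descent direction on a smooth function, since the sufficient-decrease condition holds for all sufficiently small t.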
Considering the results obtained in Andrei (2006), in Stanimirovic & Miladinovic (2010) the authors identify a class of accelerated gradient descent methods. In their opinion, all gradient descent schemes with a somehow defined acceleration factor belong to this class of methods. They offered their own way of deriving the acceleration factor, described in (Stanimirovic & Miladinovic, 2010). For that purpose they constructed the accelerated gradient descent SM method as

x_{k+1} = x_k − γ_k^{−1} t_k g_k. (4)

To define the acceleration factor, denoted in (4) by γ_k, the authors used a Taylor expansion of the objective function f_{k+1}:

f(x_{k+1}) ≈ f(x_k) − γ_k^{−1} t_k g_k^T g_k + (1/2) γ_k^{−2} t_k^2 g_k^T ∇²f(ξ) g_k, (5)

where ξ lies on the segment between x_k and x_{k+1}. The Hessian ∇²f(ξ) in (5) is replaced by ∇²f(ξ) = γ_{k+1} I, which transforms expression (5) into

f(x_{k+1}) ≈ f(x_k) − γ_k^{−1} t_k ||g_k||^2 + (1/2) γ_{k+1} γ_k^{−2} t_k^2 ||g_k||^2. (6)

From the previous relation arises the value of the acceleration parameter γ_{k+1} of the SM method:

γ_{k+1} = 2 γ_k [γ_k (f(x_{k+1}) − f(x_k)) + t_k ||g_k||^2] / (t_k^2 ||g_k||^2). (7)

Linear convergence of the SM iteration so defined is proven, as well as improvements in performance compared to the GD and AGD methods. A reduction in the number of iterations, CPU time and number of function evaluations in comparison with the GD scheme was expected. But the prominent reduction of the measured values achieved by the SM method in all three tested characteristics compared to the accelerated AGD iteration leads us to the conclusion that defining the acceleration parameter using the Taylor expansion gives better practical results.
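The SM acceleration parameter translates directly into code. The following sketch is ours, under the SM update reconstructed above; as a sanity check, on a quadratic f(x) = (c/2)||x||^2 the update recovers the exact Hessian scalar c:

```python
def gamma_sm(gamma_k, t_k, f_k, f_next, g_norm_sq):
    """SM acceleration parameter:
    gamma_{k+1} = 2*gamma_k*(gamma_k*(f_{k+1} - f_k) + t_k*||g_k||^2) / (t_k^2 * ||g_k||^2)."""
    return 2.0 * gamma_k * (gamma_k * (f_next - f_k) + t_k * g_norm_sq) / (t_k ** 2 * g_norm_sq)

# Sanity check: f(x) = (3/2)*||x||^2, x_0 = (1, 0), gamma_0 = 1, t_0 = 0.1.
# Then x_1 = (1 - 0.1*3) * x_0 = (0.7, 0), f_0 = 1.5, f_1 = 0.735, ||g_0||^2 = 9,
# and the update returns gamma_1 = 3, the exact curvature of f.
gamma_1 = gamma_sm(1.0, 0.1, 1.5, 0.735, 9.0)
```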
Later on, in (Petrovic & Stanimirovic, 2014; Petrovic, 2015; Stanimirovic et al., 2015) the authors use the favorable results from Stanimirovic & Miladinovic (2010) and continue to explore ways of defining the acceleration parameter using the Taylor expansion and different forms of iterations for solving nonlinear unconstrained optimization problems. For this investigation we consider the results from (Petrovic, 2015). In Petrovic (2015), an iteration with two step lengths, α_k and β_k, is presented as

x_{k+1} = x_k − (α_k γ_k^{−1} + β_k) g_k. (9)

Applying the Taylor expansion to scheme (9) and approximating the Hessian at the current iterative point by the product γ_{k+1} I, where the acceleration parameter γ_{k+1} is included, the value of the acceleration factor of the ADSS method at the (k+1)-th iterative point is

γ_{k+1} = 2 [f(x_{k+1}) − f(x_k) + (α_k γ_k^{−1} + β_k) ||g_k||^2] / [(α_k γ_k^{−1} + β_k)^2 ||g_k||^2]. (10)

As in Stanimirovic & Miladinovic (2010), we assume that γ_{k+1} > 0, since otherwise the Second-Order Necessary Condition and the Second-Order Sufficient Condition would not be fulfilled.
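The SM and ADSS parameters follow from one common calculation. Writing either iteration as x_{k+1} = x_k − s_k g_k for a scalar s_k (the generic s_k is our notational device, not the cited papers'), the second-order Taylor expansion with ∇²f(ξ) ≈ γ_{k+1} I gives:

```latex
f(x_{k+1}) \approx f(x_k) - s_k \|g_k\|^2 + \frac{1}{2}\,\gamma_{k+1}\, s_k^2 \|g_k\|^2
\quad\Longrightarrow\quad
\gamma_{k+1} = \frac{2\left[f(x_{k+1}) - f(x_k) + s_k \|g_k\|^2\right]}{s_k^2 \|g_k\|^2}.
```

Substituting s_k = γ_k^{−1} t_k recovers the SM parameter, while s_k = α_k γ_k^{−1} + β_k yields the ADSS parameter.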
However, in the case γ_{k+1} < 0 we take γ_{k+1} = 1. This way we ensure that when G_k is not a positive definite matrix, taking γ_{k+1} = 1 makes the search direction equal to −g_k, which is indeed a descent direction. The next iterative point x_{k+2} is then calculated by an iteration of the same accelerated gradient descent form. In all three mentioned accelerated gradient descent models (AGD, SM, ADSS) the values of the iterative step sizes are computed by Armijo's backtracking line search procedure, which is generally described through the following three steps. For the SM and the ADSS methods linear convergence is proven on the set of uniformly convex functions and, under additional assumptions, for strictly convex quadratics as well. The following algorithms display the backtracking procedure and the initialization of the SM and the ADSS accelerated models.
Require: Objective function f(x), the search direction d_k at the point x_k and numbers 0 < σ < 0.5 and β ∈ (0, 1).
1: Set t = 1.
2: While f(x_k + t d_k) > f(x_k) + σ t g_k^T d_k, take t := β t.
3: Return t_k = t.
Require: Objective function f(x) and a chosen initial point x_0 ∈ dom(f).
1: Set k = 0 and compute f(x_0), g_0 = ∇f(x_0), and take γ_0 = 1.
2: If the test criteria are fulfilled, then stop the iteration; otherwise, go to the next step.
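Putting the pieces together, the SM scheme can be sketched end-to-end in Python. This is a minimal sketch under the formulas reconstructed above; the test function, tolerance and backtracking parameters are illustrative choices, not the experimental setup of the cited papers:

```python
import numpy as np

def sm_method(f, grad, x0, tol=1e-6, max_iter=1000, sigma=1e-4, beta=0.8):
    """Accelerated gradient descent (SM): x_{k+1} = x_k - (t_k / gamma_k) * g_k,
    with gamma updated from the Taylor expansion and safeguarded to stay positive."""
    x, gamma = np.asarray(x0, dtype=float), 1.0
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:          # stopping test on the gradient norm
            break
        d = -g / gamma                        # accelerated descent direction
        t, fx, slope = 1.0, f(x), float(g @ d)
        while f(x + t * d) > fx + sigma * t * slope:   # Armijo backtracking
            t *= beta
        x_new = x + t * d
        g_sq = float(g @ g)
        gamma_new = 2.0 * gamma * (gamma * (f(x_new) - fx) + t * g_sq) / (t ** 2 * g_sq)
        gamma = gamma_new if gamma_new > 0 else 1.0    # fall back to plain -g_k
        x = x_new
    return x

# Illustrative run on the convex quadratic f(x) = x^T A x / 2.
A = np.diag([1.0, 4.0])
f = lambda x: 0.5 * float(x @ A @ x)
grad = lambda x: A @ x
x_star = sm_method(f, grad, [4.0, -3.0])      # converges to the minimizer (0, 0)
```

On this quadratic, gamma quickly approaches the dominant curvature, so the scaled step −g_k/γ_k behaves like a crude Newton step without any Hessian computation.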

NUMERICAL COMPARISONS
In Stanimirovic & Miladinovic (2010) the authors tested 30 test functions from Andrei (2008), given in generalized or extended form as large-scale unconstrained test problems. For each test function they considered 10 numerical experiments with the number of variables: 100, 500, 1000, 2000, 3000, 5000, 7000, 8000, 10000 and 15000. A substantial outcome of these numerical experiments is that the SM method shows the best results for 20 test functions in the sense of the number of iterations needed to achieve the requested accuracy, while the AGD is best on the remaining 10 test problems. Considering the CPU time and the number of function evaluations, the SM method shows better performance for 23 test functions.
Since both methods, the SM and the AGD, belong to the class of accelerated gradient descent methods, we can conclude that computing the acceleration variable using the Taylor expansion provides far better practical results in reducing the number of iterations, the number of function evaluations and the needed CPU time. That was the reason to continue with this way of computing the parameter of acceleration. Similar approaches are applied in (Petrovic & Stanimirovic, 2014; Petrovic, 2015; Stanimirovic et al., 2015). In this work we consider the results published in Petrovic (2015). These results confirm the noticeably better performance of the accelerated double step size ADSS method compared to the accelerated gradient descent SM scheme with one step length parameter. This also leads us to the conclusion that properly defined values of the iterative step lengths (as well as the number of step sizes involved in a method) substantially determine the level of efficiency of the analyzed method. To confirm this assertion, the SM and the ADSS schemes are numerically compared in this work, using the numerical outcomes from Petrovic (2015). First, let us point out the common features of these two iterations:
- Computation of the acceleration factor for each of these models is achieved in a similar way, using the Taylor expansion and in accordance with the posed formulation of the iteration;
- The value of the single step size in the SM is obtained by the backtracking line search technique, as is each of the two step length parameters needed in the ADSS scheme;
- Both methods use the negative gradient as the search direction, i.e. both methods are gradient descent methods.
For each of the 25 test functions, ten experiments are taken for the larger numbers of variables: 1000, 2000, 3000, 5000, 7000, 8000, 10000, 15000, 20000 and 30000. The analyzed characteristics are the number of iterations, the needed CPU time of execution and the number of function evaluations, under the adopted exit criteria.

CONCLUSION
In this paper the efficiency of an accelerated gradient descent method whose acceleration parameter is obtained using the properties of the Taylor expansion is described and numerically confirmed. For that purpose we have pointed out different ways of defining the acceleration factor. The results of the numerical tests in Stanimirovic & Miladinovic (2010), which show the benefits of calculating the acceleration parameter through the Taylor expansion instead of by the means stated in Andrei (2006), were the reason to continue investigating this way of developing an acceleration parameter, applied to a different form of gradient descent iteration. The double step size gradient descent ADSS model proposed in Petrovic (2015), with all three crucial elements of the iteration (the acceleration parameter, the step sizes and the search direction) similarly defined, is compared with the SM method. The efficiency of the ADSS model regarding all analyzed characteristics (number of iterations, CPU time and number of function evaluations) in comparison to the accelerated gradient descent single step size SM method has been numerically confirmed.
Finally, we can indicate that the problem stated in this paper can be explored in some new ways. More precisely, the problem of finding an acceleration parameter for different forms of gradient descent iterations is still open.

Table 1.
Summary of numerical results for SM and ADSS tested on 25 large-scale test functions regarding the number of iterations.

Table 2.
Summary of numerical results for SM and ADSS tested on 25 large-scale test functions regarding CPU time.

Table 3.
Summary of numerical results for SM and ADSS tested on 25 large-scale test functions regarding the number of function evaluations. The ADSS outperforms the SM with respect to all tested characteristics: the number of iterations, the CPU time and the number of function evaluations. Considering the number of iterations, the ADSS is better by a factor of about 70; the needed CPU time is on average 113 times lower in favor of the ADSS compared to the SM, and regarding function evaluations an approximately 80 times lower number is needed with respect to the SM.

Table 4.
Average numerical outcomes for 25 test functions tried out on 10 numerical experiments in each iteration.