In this blog post, I will summarize the paper “Martin Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. Proc. of ICML, 2003.” and share some of my understanding. To the best of my knowledge, this is the first paper that introduces online convex programming, which is a generalization of the well-studied experts problem. The paper also proposes a simple and natural method, namely Generalized Infinitesimal Gradient Ascent (GIGA). This algorithm is straightforward yet powerful, with certain advantages. The paper first introduces the form of online convex programming, followed by a Greedy Projection algorithm. Then GIGA is proposed in the context of repeated games and is shown to be universally consistent.

1. Online Convex Programming

In a traditional convex optimization problem, we want to find a decision variable inside a constraint set such that the cost is minimized, and all the parameters of the problem are given beforehand. However, in an online convex programming problem, we do not know the cost function until after the decision is made. Mathematically, given a feasible set \(F \subseteq \mathbb{R}^{n}\), at each time step \(t\), an online convex programming algorithm selects a vector \(x^t \in F\), and then receives the cost function \(c^t\). Note that the algorithm does not know what the cost function \(c^t\) is before it makes the decision.

Because an online algorithm does not have full information, it will probably produce a sequence \(x^1, x^2, \ldots\) that is not optimal. To measure the performance of an online algorithm, we compare it with the fixed optimal solution \(x\) chosen by an offline algorithm that knows all the cost functions beforehand. Mathematically, we define the regret of algorithm \(A\) until time \(T\) as:

\[R_A(T) = \sum_{t=1}^T c^t(x^t) - \underset{x \in F}{\min} \sum_{t=1}^T c^t(x)\]

Note that the “best solution” in hindsight is a fixed vector independent of time \(t\). Another possibility is to define regret against a dynamic strategy, where the offline solution is allowed a small number of changes (c.f. section 2.2 of the paper). An online algorithm \(A\) is considered good if its average regret approaches zero, i.e. the regret satisfies \(R_A(T) = o(T)\). Next, I will present the basic algorithm, called Greedy Projection, which indeed achieves sub-linear regret.
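First, though, to make the regret definition concrete, here is a toy computation in Python. It assumes a hypothetical one-dimensional setting with quadratic costs \(c^t(x) = (x - z^t)^2\) on \(F=[0,1]\), for which the best fixed decision in hindsight has a closed form (the clipped mean of the targets); none of these specifics come from the paper.

```python
import numpy as np

# Toy regret computation for hypothetical quadratic costs c^t(x) = (x - z_t)^2
# on F = [0, 1]. The best fixed decision in hindsight minimizes
# sum_t (x - z_t)^2 over F, which is the mean of z_t clipped to [0, 1].
rng = np.random.default_rng(0)
z = rng.uniform(0.0, 1.0, size=1000)          # targets defining the costs c^t
decisions = rng.uniform(0.0, 1.0, size=1000)  # some online decisions x^t

online_cost = np.sum((decisions - z) ** 2)
best_fixed = np.clip(z.mean(), 0.0, 1.0)      # argmin_x sum_t c^t(x)
offline_cost = np.sum((best_fixed - z) ** 2)

print(f"regret after T={len(z)} rounds: {online_cost - offline_cost:.2f}")
```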

2. The Greedy Projection Method

The Greedy Projection method is simple and natural. First, it selects an arbitrary starting point \(x^1\) inside the feasible set. At each iteration, it computes the gradient of the cost function from the last step, performs a gradient descent step with a certain learning rate, and then projects the result back onto the feasible set. Mathematically, it selects the vector \(x^{t+1}\) according to:

\[x^{t+1} = P(x^t-\eta_t(\nabla c^t(x^t)))\]

where \(\eta_t \in \mathbb{R}^{+}\) is the learning rate at each time step. The projection function

\[P(y)=\arg \underset{x\in F}{\min} \|x-y\|\]

finds the closest vector to \(y\) inside the feasible set. In the offline setting, this is the well-known projected gradient descent method, which enjoys well-understood convergence guarantees for convex optimization. In the online setting, it turns out that the Greedy Projection method also performs well, in the sense that the average regret approaches zero. To analyze the regret of the Greedy Projection method, we first assume that the feasible set \(F\) is bounded and that the cost function gradients are uniformly bounded over all \(t\) and \(x\in F\). Formally, the diameter of the feasible set and the upper bound on the cost function gradients are given by

\[\|F\| = \underset{x,y\in F}{\max} \|x-y\|, \qquad \|\nabla c\| = \underset{x\in F,\, t\in \mathbb{N}}{\sup} \|\nabla c^t(x)\|\]
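To make the method concrete, here is a minimal sketch of Greedy Projection in Python. It assumes a hypothetical setup where \(F\) is the Euclidean unit ball (so the projection is a simple rescaling) and the costs are quadratics \(c^t(x) = \|x - z^t\|^2\); the helper names are mine, not the paper's.

```python
import numpy as np

def project_ball(y, radius=1.0):
    """Project y onto the Euclidean ball of the given radius (our F)."""
    norm = np.linalg.norm(y)
    return y if norm <= radius else y * (radius / norm)

def greedy_projection(grad_fns, x1, eta=lambda t: t ** -0.5):
    """Run x^{t+1} = P(x^t - eta_t * grad c^t(x^t)) and return all iterates."""
    x, xs = x1, [x1]
    for t, grad in enumerate(grad_fns, start=1):
        x = project_ball(x - eta(t) * grad(x))
        xs.append(x)
    return xs

# Hypothetical quadratic costs c^t(x) = ||x - z_t||^2 with gradient 2(x - z_t)
rng = np.random.default_rng(0)
targets = [project_ball(rng.normal(size=3)) for _ in range(500)]
grads = [lambda x, z=z: 2.0 * (x - z) for z in targets]
xs = greedy_projection(grads, x1=np.zeros(3))
```

The default learning rate \(\eta_t = t^{-1/2}\) used here is exactly the choice made in the regret analysis below.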

A full list of the technical assumptions can be found in the paper. According to Theorem 1 in the paper, if we choose the learning rate \(\eta_t = t^{-1/2}\), then the regret of the Greedy Projection algorithm is upper bounded as follows:

\[R_G(T) \leq \frac{\| F \|^2 \sqrt{T}}{2} + (\sqrt{T} - \frac{1}{2}) \|\nabla c \|^2\]

The regret upper bound contains two parts. The first part relates to the diameter of the feasible set, which accounts for the effect of our starting point \(x^1\). The second part relates to the cost function gradients, because we use the gradient information at each time step. With this inequality in hand, we can conclude that the regret of the Greedy Projection method is indeed sub-linear, and the average regret goes to zero as time goes to infinity.
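To see the sub-linearity explicitly, divide both sides of the bound by \(T\); both terms then vanish as \(T\) grows:

\[\frac{R_G(T)}{T} \leq \frac{\|F\|^2}{2\sqrt{T}} + \left(\frac{1}{\sqrt{T}} - \frac{1}{2T}\right)\|\nabla c\|^2 \xrightarrow{\;T\to\infty\;} 0\]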

3. Generalized Infinitesimal Gradient Ascent

Generalized Infinitesimal Gradient Ascent (GIGA) is a nontrivial extension of infinitesimal gradient ascent to games with more than two actions. The paper formulates repeated games as online linear programming problems and shows that GIGA is universally consistent, in the sense that for sufficiently large \(T\), GIGA is very unlikely to incur a large average regret.

In a repeated game, a player has an action set \(A\) and the environment has an action set \(Y\). The player wants to maximize the utility function \(u: A\times Y\rightarrow \mathbb{R}\), without knowing how the environment will act in the current round. For simplicity, we consider the case where the action set is \(A=\{1,...,n\}\). At each round \(t\), GIGA generates a vector \(x^t \in F\), where the feasible set is the probability simplex:

\[F=\{ x\in \mathbb{R}^n \mid \sum_{i=1}^n x_i = 1, \; \forall i,\, x_i \geq 0 \}\]

Based on \(x^t\), the player plays action \(i\) with probability \(x_i^t\). Besides, we assume that the environment is oblivious and deterministic, in the sense that it will play the same sequence of actions regardless of the actions of the player. This means that the randomness comes only from the strategy of the player. Now we are ready to state Generalized Infinitesimal Gradient Ascent. After each round \(t\), we observe the other player's action \(h_{t,2}\), and the vector for the next round is given by:

\[y_i^{t+1} = x_i^t + \eta_t u(i,h_{t,2})\] \[x^{t+1} = P(y^{t+1})\]
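Here is a minimal sketch of GIGA in Python, assuming a hypothetical payoff matrix \(U\) with \(U_{ij} = u(i, j)\) and an oblivious, deterministic environment given as a fixed action sequence. The simplex projection uses a standard sorting-based routine, which the paper itself does not specify.

```python
import numpy as np

def project_simplex(y):
    """Euclidean projection of y onto the probability simplex F."""
    u = np.sort(y)[::-1]                       # sort in decreasing order
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(y) + 1) > css - 1.0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(y - theta, 0.0)

def giga(U, env_actions, eta=lambda t: t ** -0.5, seed=0):
    """Play the repeated game, returning the mixed strategies x^t."""
    rng = np.random.default_rng(seed)
    n = U.shape[0]
    x = np.ones(n) / n                         # start from the uniform mixture
    xs, played = [x], []
    for t, j in enumerate(env_actions, start=1):
        played.append(rng.choice(n, p=x))      # play action i with prob. x_i^t
        y = x + eta(t) * U[:, j]               # y_i = x_i + eta_t * u(i, h_{t,2})
        x = project_simplex(y)
        xs.append(x)
    return xs, played

# e.g. matching pennies against an environment that alternates its actions
U = np.array([[1.0, -1.0], [-1.0, 1.0]])
xs, played = giga(U, env_actions=[t % 2 for t in range(1000)])
```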

Based on the result for Greedy Projection, the expected regret of GIGA against all oblivious deterministic environments is \(O(\sqrt{T})\), which implies that GIGA is universally consistent. Besides, in contrast to Greedy Projection, GIGA is a stochastic online algorithm, so the regret bound holds in expectation over the player's randomization.

4. Conclusion

Before concluding, the paper also provides a naive extension of the experts algorithm to online convex programming, and proves that this extension does not work well compared to GIGA. I skip this part since it is less central. In conclusion, this paper introduces a generalization of the experts problem, namely the online convex programming problem. Then the Greedy Projection method and GIGA are proposed for online convex programming and repeated games, respectively. GIGA is proven to be universally consistent and thus is a good online algorithm for repeated games.