de Farias & van Roy (2003) (The Linear Programming Approach to Approximate Dynamic Programming)
This paper was published in Operations Research, and as such the authors use notation and jargon different from what economists use. I’ll present some of their main results, but in a language and notation that is familiar to me.
Background
Consider a typical dynamic programming problem faced by an economic agent, which we summarize by the following Bellman equation:
$$V(x) = \underset{c \in \Gamma(x)}{\max}\; u(x, c) + \beta\, E\left[V(x') \mid x, c\right]$$
Now define the natural operator associated with the Bellman, which we call T:
$$(T v)(x) = \underset{c \in \Gamma(x)}{\max}\; u(x, c) + \beta\, E\left[v(x') \mid x, c\right]$$
For ease of presentation we will directly assume that T is a contraction mapping and that v belongs to the set of bounded and continuous functions.
Let v* be the optimal value function. Under our assumptions v* is the unique fixed point of T.
Mathematical programs
Under our assumptions, the operator T is monotonic.
This means that for any v such that v ≥ Tv we have v ≥ v*.
Similarly, for any v such that v ≤ Tv, we have v ≤ v*.
Typically this property of our contraction mapping is used to show that value function iteration will converge for any initial guess of v. Today we will use it in a slightly different way.
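To fix ideas, here is a minimal sketch of value function iteration on a hypothetical two-state, two-action MDP (all of the numbers below are made up purely for illustration):

```python
import numpy as np

beta = 0.95
# u[x, a]: flow payoff in state x under action a (hypothetical values)
u = np.array([[1.0, 0.5],
              [0.0, 2.0]])
# P[a, x, x']: probability of moving from x to x' under action a
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.7, 0.3]]])

def T(v):
    # (Tv)(x) = max_a  u(x, a) + beta * sum_{x'} P(x'|x, a) v(x')
    return np.max(u + beta * np.einsum('axy,y->xa', P, v), axis=1)

# Iterate Tv from an arbitrary initial guess until (approximate) convergence.
v = np.zeros(2)
for _ in range(10_000):
    v_new = T(v)
    if np.max(np.abs(v_new - v)) < 1e-10:
        break
    v = v_new
```

Because T is a contraction with modulus β, the loop converges geometrically from any initial guess, and the limit satisfies v ≈ Tv.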
Let c be any vector with all strictly positive elements (the paper calls these state-relevance weights; this c is distinct from the control c in the Bellman equation). Consider the mathematical program defined by
$$\begin{aligned}&\underset{v}{\min} c'v \\
s.t. & \quad v \ge T v
\end{aligned}$$
By the monotonicity of T it is easy to see that any v satisfying the constraint must be at least as big as v*.
Then notice that the objective is to minimize the inner product between a strictly positive vector c and the choice v.
These two facts mean that the unique solution to the programming problem is v*.
Linear program
I told you that we would use linear programming, but notice that the constraint is nonlinear because T is nonlinear.
However, if we stare at the definition of T for long enough, we notice that the constraint v ≥ Tv can be imposed state by state (i.e. v(x) ≥ (Tv)(x)), and that the single nonlinear constraint for each state can be replaced by a system of linear constraints.
This system is defined by enumerating all feasible actions for each state and writing down the right hand side of Bellman’s equation for that state and control. The program looks as follows:
$$\begin{aligned}&\underset{v}{\min} c'v \\
s.t. & \quad v(x) \ge u(x,c) + \beta \sum_{x'} P(x'| x,c)v(x') \quad \forall x \in X \; \forall c \in \mathcal{A}_x
\end{aligned}$$
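This exact LP can be handed to an off-the-shelf solver. Below is a minimal sketch using `scipy.optimize.linprog` on the same kind of hypothetical two-state, two-action MDP; the weight vector c is named `weights` to avoid the clash with the control:

```python
import numpy as np
from scipy.optimize import linprog

beta = 0.95
u = np.array([[1.0, 0.5],                 # u[x, a] (hypothetical values)
              [0.0, 2.0]])
P = np.array([[[0.9, 0.1], [0.2, 0.8]],   # P[a, x, x']
              [[0.5, 0.5], [0.7, 0.3]]])
S, A = u.shape
weights = np.ones(S)                      # strictly positive state-relevance weights

# One linear constraint per (state, action):
#   v(x) >= u(x,a) + beta * sum_{x'} P(x'|x,a) v(x')
# rearranged into linprog's A_ub @ v <= b_ub form: (beta*P[a] - I) v <= -u[:, a]
A_ub = np.zeros((S * A, S))
b_ub = np.zeros(S * A)
row = 0
for x in range(S):
    for a in range(A):
        A_ub[row] = beta * P[a, x]
        A_ub[row, x] -= 1.0
        b_ub[row] = -u[x, a]
        row += 1

res = linprog(weights, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * S)
v_star = res.x   # by the argument above, this is the optimal value function
```

Any feasible v dominates v* pointwise and the weights are strictly positive, so the minimizer is exactly v*, which can be verified by checking that the solution satisfies v = Tv.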
Curse of dimensionality
This linear program has an S-dimensional decision variable and a constraint matrix with S · A rows. When S or A is large, the problem quickly becomes subject to the curse of dimensionality.
The authors of this paper propose approximate linear programming as a way to resolve this issue.
Specifically, they represent v as the product Φr of a basis matrix Φ and a vector of coefficients r.
They then write down a linear program
$$\begin{aligned}&\underset{r}{\min} c'\Phi r \\
s.t. & \quad (\Phi r)(x) \ge u(x,c) + \beta \sum_{x'} P(x'| x,c) (\Phi r)(x') \quad \forall x \in X \; \forall c \in \mathcal{A}_x
\end{aligned}$$
Notice that we have swapped a decision vector of length S for one whose length equals the number of columns of the basis matrix. This is something we control, so we can choose the “size” of the problem. The number of decision variables is smaller, but the number of constraints is exactly the same.
However, if we choose each column of Φ to have local support (e.g. splines), each constraint involves only a few of the coefficients and the large constraint matrix becomes sparse.
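The approximate LP can be sketched along the same lines. The example below uses a made-up randomly generated MDP and a polynomial basis purely for illustration; note that the first column of the Vandermonde matrix is constant, which guarantees the program is feasible (a large enough constant function always satisfies the constraints):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
beta, S, A, K = 0.95, 20, 3, 4           # S states, A actions, K basis functions

u = rng.uniform(0.0, 1.0, (S, A))        # hypothetical payoffs u[x, a]
P = rng.uniform(0.0, 1.0, (A, S, S))     # random transition kernels P[a, x, x']
P /= P.sum(axis=2, keepdims=True)

# Basis: low-order polynomials of a scalar state grid (an illustrative choice).
grid = np.linspace(0.0, 1.0, S)
Phi = np.vander(grid, K, increasing=True)  # S x K, first column is all ones

weights = np.ones(S)
# Objective: c' Phi r  -> a K-vector of costs on r.
# Constraints: (Phi r)(x) >= u(x,a) + beta * sum_{x'} P(x'|x,a) (Phi r)(x'),
# i.e. ((beta * P[a] - I) @ Phi) r <= -u[:, a], stacked over actions.
A_ub = np.vstack([(beta * P[a] - np.eye(S)) @ Phi for a in range(A)])
b_ub = np.concatenate([-u[:, a] for a in range(A)])

res = linprog(weights @ Phi, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * K)
r = res.x
v_approx = Phi @ r                        # the approximate value function
```

The solver now works with K coefficients instead of S values, while the S · A constraints remain; any feasible Φr still dominates v* pointwise by the monotonicity argument above.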
The remainder of the paper, and its main contribution, is to bound the error we are subject to by solving the approximate linear program instead of the exact linear program.