Hull (2015): "Approximate dynamic programming with post-decision states as a solution method for dynamic economic models"
This paper presents a stochastic simulation method for solving dynamic economic models.
The ideas in this paper lean on a literature sometimes known as approximate dynamic programming and can enable us to solve models with many state variables and non-convexities in objectives and constraints.
My intent is to summarize the core theoretical ideas behind the algorithm.
Main idea: post-decision states
Notation: classic Bellman equation
Consider a stationary economic model where, at time $t$, the state is summarized by a vector $s_t$ of endogenous state variables and a vector $x_t$ of exogenous state variables.
The optimization problem of an agent is often summarized by a Bellman equation of the form
$$V(s_t, x_t) = \max_{c_t} u(c_t) + \beta\, \mathbb{E}\left[V(s_{t+1}, x_{t+1})\right]$$
subject to
$$s_{t+1} = f(s_t, x_t, c_t), \qquad c_t \in \Gamma(s_t, x_t), \qquad x_{t+1} = g(x_t, \epsilon_{t+1}).$$
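As a concrete example (my own illustrative choice, not one taken from the paper), a stochastic growth model fits this template with $s_t = k_t$ (capital), $x_t = z_t$ (productivity), and $c_t$ consumption:
$$u(c_t) = \log c_t, \qquad s_{t+1} = z_t k_t^{\alpha} + (1-\delta) k_t - c_t, \qquad \Gamma(k_t, z_t) = \left\{ c_t : 0 < c_t \le z_t k_t^{\alpha} + (1-\delta) k_t \right\},$$
with $z_t$ following, say, a finite-state Markov chain in place of $g(x_t, \epsilon_{t+1})$. I will reuse this example in the code sketch later on.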
Post-decision state value function
Note that the transition function for the endogenous state takes $s_t$, $x_t$, and $c_t$ and returns $s_{t+1}$. We can think of $s_{t+1}$ as being chosen at the end of period $t$, that is, after the controls $c_t$ have been decided.
In the standard (pre-decision) Bellman equation, the state at time $t$ pairs the endogenous state determined at the end of period $t-1$ with the exogenous state realized at the start of period $t$.
We will now consider a different representation of the state that pairs the endogenous state determined at the end of period $t-1$ with the exogenous state realized at the start of period $t-1$. That is, we will consider $(s_t, x_{t-1})$, which is known as the post-decision state at time $t-1$.
Let $V^x(s_t, x_{t-1})$ denote the value of being in post-decision state $(s_t, x_{t-1})$ in period $t-1$. This is the maximum expected discounted utility an agent can achieve after controls have been selected in period $t-1$.
Because $c_t$ is not chosen until after $x_t$ is realized, $V^x(s_t, x_{t-1})$ equals the expectation, conditional on the post-decision state, of the maximum expected discounted utility the agent can achieve once $x_t$ arrives. That is, we can write
$$V^x(s_t, x_{t-1}) = \mathbb{E}\left[V(s_t, x_t) \mid s_t, x_{t-1}\right],$$
where $V(s_t, x_t)$ is the pre-decision value function defined above.
It follows that we can write
$$V(s_t, x_t) = \max_{c_t \in \Gamma(s_t, x_t)} u(c_t) + \beta V^x(s_{t+1}, x_t), \qquad s_{t+1} = f(s_t, x_t, c_t).$$
These equations can be manipulated to produce the recursive form of the post-decision state Bellman equation:
$$V^x(s_t, x_{t-1}) = \mathbb{E}\left[\max_{c_t \in \Gamma(s_t, x_t)} u(c_t) + \beta V^x(s_{t+1}, x_t) \,\middle|\, s_t, x_{t-1}\right].$$
Notice that the expectation is outside the max operator, meaning that the maximization problem is deterministic.
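In the growth example above (again, my illustration), the post-decision state at the end of period $t-1$ is $(k_t, z_{t-1})$: next period's capital has been chosen, but productivity $z_t$ has not yet been realized. With a finite-state Markov chain for $z$, the post-decision value function is just a probability-weighted sum,
$$V^x(k_t, z_{t-1}) = \sum_{z'} P(z_{t-1}, z')\, V(k_t, z'),$$
so the only remaining uncertainty before the period-$t$ decision is the realization of $z_t$.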
Algorithm
Now that we have the post-decision state Bellman equation, the algorithm is fairly straightforward. I will present the algorithm from the paper in the context of Markov exogenous processes, but I believe it is incorrectly specified; I discuss how I would change it below. A code sketch of the steps follows the list.
- Setup
- Discretize endogenous state space
- Choose a simulation length T
- Choose initial endogenous and exogenous states
- Construct an initial guess for the value function at the discretized endogenous and exogenous states.
- Iterations
- Construct a time series of exogenous states for $t = 1, 2, \ldots, T$
- For each $t = 1, 2, \ldots, T$, perform the following three steps:
- Choose controls $c_t$ to maximize the term inside the expectation on the right-hand side of $V^x(s_t, x_{t-1})$, using the previous iteration's guess of the post-decision value function in place of the unknown continuation value
- Compute the expectation implicitly by updating the guess of the value function at the visited post-decision state, using a convex combination of the previous iteration's value and the value computed above
- Using the chosen controls and the realized exogenous state, apply the endogenous transition equation to move the endogenous state forward one period
- Convergence:
- Check a convergence criterion that compares the discretized value function across multiple iterations.
- If converged, return the discretized value function and regress the simulated controls on the simulated states to obtain a policy function
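Below is a minimal sketch of these steps in Python, applied to the stochastic growth example from earlier. Everything model-specific (parameter values, the grid, the smoothing weight, the tolerance) is my own illustrative choice rather than the paper's; the savings choice is restricted to the capital grid so the simulated endogenous state stays on the grid; and, anticipating my complaint below, the exogenous path is drawn once and reused in every iteration. The final regression step that recovers a policy function from the simulated controls is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Model primitives (illustrative choices, not from the paper).
beta, alpha_prod, delta = 0.95, 0.36, 0.1
z_vals = np.array([0.95, 1.05])            # two-state Markov productivity
P = np.array([[0.9, 0.1],
              [0.1, 0.9]])                 # transition matrix for z

k_grid = np.linspace(0.5, 6.0, 60)         # discretized endogenous state (capital)
nk, nz = len(k_grid), len(z_vals)

def u(c):
    return np.log(c)

# Simulate ONE exogenous path and reuse it in every iteration.
T = 5_000
z_idx = np.empty(T, dtype=int)
z_idx[0] = 0
for t in range(1, T):
    z_idx[t] = rng.choice(nz, p=P[z_idx[t - 1]])

# Guess for the post-decision value function V^x(k', z), stored on the grid.
Vx = np.zeros((nk, nz))
step = 0.05                                # smoothing weight on new information

for n in range(100):
    Vx_prev = Vx.copy()
    ik = nk // 2                           # index of the initial capital stock
    for t in range(1, T):
        iz_prev, iz = z_idx[t - 1], z_idx[t]
        # Step 1: deterministic maximization (the expectation is already inside Vx).
        resources = z_vals[iz] * k_grid[ik] ** alpha_prod + (1 - delta) * k_grid[ik]
        c = resources - k_grid             # consumption implied by each choice of k'
        vals = np.where(c > 0,
                        u(np.maximum(c, 1e-12)) + beta * Vx[:, iz],
                        -np.inf)
        ik_next = int(np.argmax(vals))
        v_hat = vals[ik_next]
        # Step 2: implicit expectation via smoothing at post-decision state (k_t, z_{t-1}).
        Vx[ik, iz_prev] = (1 - step) * Vx[ik, iz_prev] + step * v_hat
        # Step 3: endogenous transition (k' restricted to the grid in this sketch).
        ik = ik_next
    # Convergence check comparing the discretized value function across iterations.
    if np.max(np.abs(Vx - Vx_prev)) < 1e-4:
        break
```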
Comments on the algorithm
Here are a few comments about the algorithm:
- Because the expectation operator is outside the max operator, we don’t have to spend time computing expectations when solving the optimization problem in each period of the simulation. This speeds up computation quite a bit.
- Expectations are computed implicitly when we update our guess for the post-decision state value function; one common form of this smoothing update is written out below
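In my notation (the paper may use a different step-size scheme), the smoothing update at the visited post-decision state looks like
$$\hat V^x_{n}(s_t, x_{t-1}) = (1 - \alpha_n)\, \hat V^x_{n-1}(s_t, x_{t-1}) + \alpha_n \left[ \max_{c_t \in \Gamma(s_t, x_t)} u(c_t) + \beta\, \hat V^x_{n-1}(s_{t+1}, x_t) \right],$$
where $n$ indexes visits to that state and $\alpha_n \in (0, 1]$ is the convex-combination weight. Averaging the bracketed sample over many visits, across realizations of $x_t$ given $x_{t-1}$, is what recovers the conditional expectation in the post-decision Bellman equation.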
My complaint: I think it is incorrect to re-generate the time series of exogenous states on each iteration. Doing so prevents the algorithm from ever converging, because the updated value function in iteration $n$ depends on the randomness of the exogenous simulation drawn in iteration $n$.
Using the same simulated time series of exogenous states in every iteration (as in other simulation algorithms in the literature) allows the algorithm to converge, albeit conditional on that particular exogenous path. To ensure that the solution is accurate for the underlying data-generating process, and not just for the simulated path you happened to draw, make sure the time series is long. A schematic of the two choices is sketched below.
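To make the distinction concrete, here is a schematic of the two simulation choices. The helper `simulate_chain` is a stand-in of my own and the value-function update is elided; the only point is where the draw sits relative to the iteration loop.

```python
import numpy as np

def simulate_chain(P, T, rng):
    """Draw a length-T Markov-chain path of state indices from transition matrix P."""
    path = np.empty(T, dtype=int)
    path[0] = 0
    for t in range(1, T):
        path[t] = rng.choice(len(P), p=P[path[t - 1]])
    return path

P = np.array([[0.9, 0.1], [0.1, 0.9]])
rng = np.random.default_rng(123)

# As stated in the paper: a fresh draw inside every iteration, so the update in
# iteration n inherits iteration-n simulation noise and never settles exactly.
for n in range(3):
    z_path = simulate_chain(P, T=1_000, rng=rng)   # re-drawn each iteration
    # ... update the value function along z_path ...

# Suggested fix: draw the path once and reuse it, so successive iterations
# differ only through the value-function update (accurate when T is large).
z_path = simulate_chain(P, T=1_000, rng=rng)       # single fixed draw
for n in range(3):
    # ... update the value function along the same z_path ...
    pass
```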