Hull (2015): "Approximate dynamic programming with post-decision states as a solution method for dynamic economic models"
This paper presents a stochastic simulation method for solving dynamic economic models.
The ideas in this paper lean on a literature sometimes known as approximate dynamic programming and can enable us to solve models with many state variables and non-convexities in objectives and constraints.
My intent is to summarize the core theoretical ideas behind the algorithm.
Main idea: post-decision states
Notation: classic Bellman equation
Consider a stationary economic model where, at time $t$, the state is summarized by a vector $s_t$ of endogenous state variables and a vector $x_t$ of exogenous state variables.
The optimization problem of an agent is often summarized by a Bellman equation of the form
$$V(s_t, x_t) = \max_{c_t} u(c_t) + \beta\, \mathbb{E}\left[V(s_{t+1}, x_{t+1})\right]$$
subject to
$$s_{t+1} = f(s_t, x_t, c_t), \qquad c_t \in \Gamma(s_t, x_t), \qquad x_{t+1} = g(x_t, \epsilon_{t+1}).$$
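As a concrete example (my own illustrative choice, not one taken from the paper), a stochastic growth model fits this template with $s_t = k_t$ (capital), $x_t = z_t$ (productivity), and $c_t$ consumption:
$$u(c_t) = \log c_t, \qquad s_{t+1} = z_t k_t^{\alpha} + (1-\delta) k_t - c_t, \qquad \Gamma(k_t, z_t) = \left\{ c_t : 0 < c_t \le z_t k_t^{\alpha} + (1-\delta) k_t \right\},$$
with $z_t$ following, say, a finite-state Markov chain in place of $g(x_t, \epsilon_{t+1})$. I will reuse this example in the code sketch later on.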
Post-decision state value function
Note that the transition function for the endogenous state takes $s_t$, $x_t$, and $c_t$ and returns $s_{t+1}$. We can think of $s_{t+1}$ as being chosen at the end of period $t$, that is, after the controls $c_t$ have been decided.
In the standard (pre-decision) Bellman equation, the state at time $t$ pairs the endogenous state determined at the end of period $t-1$ with the exogenous state realized at the start of period $t$.
We will now consider a different representation of the state that pairs the endogenous state determined at the end of period $t-1$ with the exogenous state realized at the start of period $t-1$. That is, we will consider $(s_t, x_{t-1})$, which is known as the post-decision state at time $t-1$.
Let $V^x(s_t, x_{t-1})$ denote the value of being in post-decision state $(s_t, x_{t-1})$ in period $t-1$. This is the maximum expected discounted utility an agent can achieve after controls have been selected in period $t-1$.
Because $c_t$ is not chosen until after $x_t$ is realized, $V^x(s_t, x_{t-1})$ equals the expectation, conditional on the post-decision state, of the maximum expected discounted utility the agent can achieve once $x_t$ arrives. That is, we can write
$$V^x(s_t, x_{t-1}) = \mathbb{E}\left[V(s_t, x_t) \mid s_t, x_{t-1}\right],$$
where $V(s_t, x_t)$ is the pre-decision value function defined above.
It follows that we can write
$$V(s_t, x_t) = \max_{c_t \in \Gamma(s_t, x_t)} u(c_t) + \beta V^x(s_{t+1}, x_t), \qquad s_{t+1} = f(s_t, x_t, c_t).$$
These equations can be manipulated to produce the recursive form of the post-decision state Bellman equation:
$$V^x(s_t, x_{t-1}) = \mathbb{E}\left[\max_{c_t \in \Gamma(s_t, x_t)} u(c_t) + \beta V^x(s_{t+1}, x_t) \,\middle|\, s_t, x_{t-1}\right].$$
Notice that the expectation is outside the max operator, meaning that the maximization problem is deterministic.
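In the growth example above (again, my illustration), the post-decision state at the end of period $t-1$ is $(k_t, z_{t-1})$: next period's capital has been chosen, but productivity $z_t$ has not yet been realized. With a finite-state Markov chain for $z$, the post-decision value function is just a probability-weighted sum,
$$V^x(k_t, z_{t-1}) = \sum_{z'} P(z_{t-1}, z')\, V(k_t, z'),$$
so the only remaining uncertainty before the period-$t$ decision is the realization of $z_t$.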
Algorithm
Now that we have the post-decision state Bellman equation, the algorithm is fairly straightforward. I will present the algorithm from the paper in the context of Markov exogenous processes, but I believe it is incorrectly specified; I discuss how I would change it below. A code sketch of the steps follows the list.
- Setup
- Discretize endogenous state space
- Choose a simulation length T
- Choose initial endogenous and exogenous states
- Construct an initial guess for the value function at the discretized endogenous and exogenous states.
- Iterations
- Construct a time series of exogenous states for $t = 1, 2, \ldots, T$
- For each $t = 1, 2, \ldots, T$, perform the following three steps:
- Choose controls $c_t$ to maximize the term inside the expectation on the right-hand side of $V^x(s_t, x_{t-1})$, using the previous iteration's guess of the post-decision value function in place of the unknown continuation value
- Compute the expectation implicitly by updating the guess of the value function at the visited post-decision state, using a convex combination of the previous iteration's value and the value computed above
- Using the chosen controls and the realized exogenous state, apply the endogenous transition equation to move the endogenous state forward one period
- Convergence:
- Check a convergence criterion that compares the discretized value function across multiple iterations.
- If converged, return the discretized value function and regress the simulated controls on the simulated states to obtain a policy function
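Below is a minimal sketch of these steps in Python, applied to the stochastic growth example from earlier. Everything model-specific (parameter values, the grid, the smoothing weight, the tolerance) is my own illustrative choice rather than the paper's; the savings choice is restricted to the capital grid so the simulated endogenous state stays on the grid; and, anticipating my complaint below, the exogenous path is drawn once and reused in every iteration. The final regression step that recovers a policy function from the simulated controls is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Model primitives (illustrative choices, not from the paper).
beta, alpha_prod, delta = 0.95, 0.36, 0.1
z_vals = np.array([0.95, 1.05])            # two-state Markov productivity
P = np.array([[0.9, 0.1],
              [0.1, 0.9]])                 # transition matrix for z

k_grid = np.linspace(0.5, 6.0, 60)         # discretized endogenous state (capital)
nk, nz = len(k_grid), len(z_vals)

def u(c):
    return np.log(c)

# Simulate ONE exogenous path and reuse it in every iteration.
T = 5_000
z_idx = np.empty(T, dtype=int)
z_idx[0] = 0
for t in range(1, T):
    z_idx[t] = rng.choice(nz, p=P[z_idx[t - 1]])

# Guess for the post-decision value function V^x(k', z), stored on the grid.
Vx = np.zeros((nk, nz))
step = 0.05                                # smoothing weight on new information

for n in range(100):
    Vx_prev = Vx.copy()
    ik = nk // 2                           # index of the initial capital stock
    for t in range(1, T):
        iz_prev, iz = z_idx[t - 1], z_idx[t]
        # Step 1: deterministic maximization (the expectation is already inside Vx).
        resources = z_vals[iz] * k_grid[ik] ** alpha_prod + (1 - delta) * k_grid[ik]
        c = resources - k_grid             # consumption implied by each choice of k'
        vals = np.where(c > 0,
                        u(np.maximum(c, 1e-12)) + beta * Vx[:, iz],
                        -np.inf)
        ik_next = int(np.argmax(vals))
        v_hat = vals[ik_next]
        # Step 2: implicit expectation via smoothing at post-decision state (k_t, z_{t-1}).
        Vx[ik, iz_prev] = (1 - step) * Vx[ik, iz_prev] + step * v_hat
        # Step 3: endogenous transition (k' restricted to the grid in this sketch).
        ik = ik_next
    # Convergence check comparing the discretized value function across iterations.
    if np.max(np.abs(Vx - Vx_prev)) < 1e-4:
        break
```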
Comments on the algorithm
Here are a few comments about the algorithm:
- Because the expectation operator is outside the max operator, we don’t have to spend time computing expectations when solving the optimization problem in each period of the simulation. This speeds up computation quite a bit.
- Expectations are computed implicitly when we update our guess for the post-decision state value function; one common form of this smoothing update is written out below
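In my notation (the paper may use a different step-size scheme), the smoothing update at the visited post-decision state looks like
$$\hat V^x_{n}(s_t, x_{t-1}) = (1 - \alpha_n)\, \hat V^x_{n-1}(s_t, x_{t-1}) + \alpha_n \left[ \max_{c_t \in \Gamma(s_t, x_t)} u(c_t) + \beta\, \hat V^x_{n-1}(s_{t+1}, x_t) \right],$$
where $n$ indexes visits to that state and $\alpha_n \in (0, 1]$ is the convex-combination weight. Averaging the bracketed sample over many visits, across realizations of $x_t$ given $x_{t-1}$, is what recovers the conditional expectation in the post-decision Bellman equation.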
My complaint: I think it is incorrect to re-generate the time series of exogenous states on each iteration. Doing so prevents the algorithm from ever converging, because the updated value function in iteration $n$ depends on the randomness of the exogenous simulation drawn in iteration $n$.
Using the same simulated time series of exogenous states in every iteration (as in other simulation algorithms in the literature) allows the algorithm to converge, albeit conditional on that particular exogenous path. To ensure that the solution is accurate for the underlying data-generating process, and not just for the simulated path you happened to draw, make sure the time series is long. A schematic of the two choices is sketched below.
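To make the distinction concrete, here is a schematic of the two simulation choices. The helper `simulate_chain` is a stand-in of my own and the value-function update is elided; the only point is where the draw sits relative to the iteration loop.

```python
import numpy as np

def simulate_chain(P, T, rng):
    """Draw a length-T Markov-chain path of state indices from transition matrix P."""
    path = np.empty(T, dtype=int)
    path[0] = 0
    for t in range(1, T):
        path[t] = rng.choice(len(P), p=P[path[t - 1]])
    return path

P = np.array([[0.9, 0.1], [0.1, 0.9]])
rng = np.random.default_rng(123)

# As stated in the paper: a fresh draw inside every iteration, so the update in
# iteration n inherits iteration-n simulation noise and never settles exactly.
for n in range(3):
    z_path = simulate_chain(P, T=1_000, rng=rng)   # re-drawn each iteration
    # ... update the value function along z_path ...

# Suggested fix: draw the path once and reuse it, so successive iterations
# differ only through the value-function update (accurate when T is large).
z_path = simulate_chain(P, T=1_000, rng=rng)       # single fixed draw
for n in range(3):
    # ... update the value function along the same z_path ...
    pass
```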