Chapter 51

STRUCTURAL ESTIMATION OF MARKOV DECISION PROCESSES*

JOHN RUST

University of Wisconsin

Contents

1. Introduction
2. Solving MDP's via dynamic programming: A brief review
   2.1. Finite-horizon dynamic programming and the optimality of Markovian decision rules
   2.2. Infinite-horizon dynamic programming and Bellman's equation
   2.3. Bellman's equation, contraction mappings and optimality
   2.4. A geometric series representation for MDP's
   2.5. Overview of solution methods
3. Econometric methods for discrete decision processes
   3.1. Alternative models of the "error term"
   3.2. Maximum likelihood estimation of DDP's
   3.3. Alternative estimation methods: Finite-horizon DDP problems
   3.4. Alternative estimation methods: Infinite-horizon DDP's
   3.5. The identification problem
4. Empirical applications
   4.1. Optimal replacement of bus engines
   4.2. Optimal retirement from a firm
References

*This is an abridged version of a monograph, Stochastic Decision Processes: Theory, Computation, and Estimation, written for the Leif Johansen lectures at the University of Oslo in the fall of 1991. I am grateful for generous financial support from the Central Bank of Norway and the University of Oslo, and for comments from John Dagsvik, Peter Frenger and Steinar Strøm.

Handbook of Econometrics, Volume IV, Edited by R.F. Engle and D.L. McFadden
© 1994 Elsevier Science B.V. All rights reserved

1. Introduction

Markov decision processes (MDP) provide a broad framework for modelling sequential decision making under uncertainty. MDP's have two sorts of variables: state variables s_t and control variables d_t, both of which are indexed by time t = 0, 1, 2, 3, ..., T, where the horizon T may be infinity. A decision-maker can be represented by a set of primitives (u, p, β) where u(s_t, d_t) is a utility function representing the agent's preferences at time t, p(s_{t+1}|s_t, d_t) is a Markov transition probability representing the agent's subjective beliefs about uncertain future states, and β ∈ (0, 1) is the rate at which the agent discounts utility in future periods. Agents are assumed to be rational: they behave according to an optimal decision rule d_t = δ(s_t) that solves V_0^δ(s) = max_δ E_δ{Σ_{t=0}^T β^t u(s_t, d_t) | s_0 = s}, where E_δ denotes expectation with respect to the controlled stochastic process {s_t, d_t} induced by the decision rule δ. The method of dynamic programming provides a constructive procedure for computing δ, using the value function V_0^δ as a "shadow price" to decentralize a complicated stochastic/multiperiod optimization problem into a sequence of simpler deterministic/static optimization problems.

MDP's have been extensively used in theoretical studies because the framework is rich enough to model most economic problems involving choices made over time and under uncertainty.¹ Applications include the pioneering work on optimal inventory policy by Arrow et al. (1951), investment under uncertainty [Lucas and Prescott (1971)], optimal intertemporal consumption/savings and portfolio selection under uncertainty [Phelps (1962), Hakansson (1970), Levhari and Srinivasan (1969), Merton (1969) and Samuelson (1969)], optimal growth under uncertainty [Brock and Mirman (1972), Leland (1974)], models of asset pricing [Lucas (1978), Brock (1982)], and models of equilibrium business cycles [Kydland and Prescott (1982), Long and Plosser (1983)].
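Before turning to econometric issues, it may help to fix the notation above in a concrete, purely illustrative way. The sketch below represents the primitives (u, p, β) of a small discrete MDP numerically; the toy "machine replacement" setup, the state and decision counts, and all parameter values are hypothetical choices made for the example, not taken from the chapter.

```python
# A minimal sketch (not from the chapter) of how the primitives (u, p, beta)
# of a discrete MDP might be stored numerically. States are wear levels of a
# machine; decisions are d = 0 (keep) or d = 1 (replace). All numbers are
# arbitrary illustrative choices.
import numpy as np

n_states, n_decisions = 5, 2          # sizes of S and D, chosen arbitrarily
beta = 0.95                           # discount factor, beta in (0, 1)

# u[s, d]: one-period utility, here the negative of maintenance/replacement costs
maintenance_cost = 0.5 * np.arange(n_states)
replacement_cost = 3.0
u = np.column_stack([-maintenance_cost,
                     -replacement_cost * np.ones(n_states)])

# p[d, s, s']: Markov transition probability p(s' | s, d)
p = np.zeros((n_decisions, n_states, n_states))
for s in range(n_states):
    nxt = min(s + 1, n_states - 1)
    p[0, s, s] += 0.6                 # keep (d = 0): stay put with prob. 0.6 ...
    p[0, s, nxt] += 0.4               # ... or deteriorate by one state
p[1, :, 0] = 1.0                      # replace (d = 1): reset to the new state 0

assert np.allclose(p.sum(axis=2), 1.0)   # each p(.|s, d) is a probability
```

With the primitives stored this way, a decision rule is simply a map from states to decisions, and the controlled process {s_t, d_t} can be simulated by drawing s_{t+1} from the row p[d_t, s_t, ·].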
By the early 1980's the use of MDP's had become widespread in both micro- and macroeconomic theory as well as in finance and operations research. In addition to providing a normative theory of how rational agents "should" behave, econometricians soon realized that MDP's might provide good empirical models of how real-world decision-makers actually behave. Most data sets take the form {d_t^a, s_t^a} where d_t^a is the decision and s_t^a is the state of an agent a at time t.² Reduced-form estimation methods can be viewed as uncovering agents' decision rules or, more generally, the stochastic process from which the realizations {d_t^a, s_t^a} were "drawn", but are generally independent of any particular behavioral theory.³

This chapter focuses on structural estimation of MDP's under the maintained hypothesis that {d_t^a, s_t^a} is a realization of a controlled stochastic process. In addition to uncovering the form of this stochastic process (and the associated decision rule δ), structural methods attempt to uncover (estimate) the primitives (u, p, β) that generated it. Before considering whether it is technically possible to estimate agents' preferences and beliefs, we need to consider whether this is even logically possible, i.e. whether (u, p, β) is identified. I discuss the identification problem in Section 3.5, and show that the question of identification depends on what type of data we have access to (i.e. experimental vs. non-experimental), and what kinds of a priori restrictions we are willing to impose on (u, p, β). If we only have access to non-experimental data (i.e. uncontrolled observations of agents "in the wild"), and if we are unwilling to impose any prior restrictions on (u, p, β) beyond basic measurability and regularity conditions on u and p, then it is impossible to consistently estimate (u, p, β), i.e. the class of all MDP's is non-parametrically unidentified. On the other hand, if we are willing to restrict u and p to a finite-dimensional parametric family, say {u = u_θ, p = p_θ | θ ∈ Θ ⊂ R^K}, then the primitives (u, p, β) are identified (generically). If we are willing to impose an even stronger prior restriction, stationarity and rational expectations (RE), then we only need parametric restrictions on u in order to identify (u, p, β), since stationarity and the RE hypothesis allow us to use non-parametric methods to consistently estimate agents' subjective beliefs from observations of their past states and decisions. Given that we are already imposing strong prior assumptions by modelling agents' behavior as an optimal decision rule to an MDP, it would be somewhat schizophrenic to be unwilling to impose any additional prior restrictions on (u, p, β).

¹ Stochastic control theory can also be used to model "learning" behavior in which agents update beliefs about unobserved state variables and unknown parameters of the transition probabilities according to Bayes rule.

² In time-series data, a is fixed at 1 and t ranges over 1, ..., T. In cross-sectional data sets, T is fixed at 1 and a ranges over 1, ..., A. In panel data sets, t ranges over 1, ..., T_a, where T_a is the number of periods agent a is observed (possibly different for each agent), and a ranges over 1, ..., A, where A is the total number of agents in the sample.

³ For an overview of this literature, see Billingsley (1961), Basawa and Prakasa Rao (1980), Heckman (1981a), Chamberlain (1984) and Lancaster (1990).
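As a hedged illustration of the last point, the sketch below shows the simple frequency estimator of beliefs that stationarity and rational expectations make available when states and decisions are discrete: p(s'|s, d) is estimated by the empirical fraction of observed transitions. The function name, the integer-coded data layout and the uniform treatment of never-observed cells are assumptions made for the example.

```python
# A sketch (hypothetical data layout, not from the chapter) of the
# non-parametric frequency estimator of beliefs p(s'|s, d): under stationarity
# and rational expectations, simply count observed transitions.
import numpy as np

def estimate_beliefs(s, d, s_next, n_states, n_decisions):
    """Frequency estimate of p(s' | s, d) from integer-coded observations."""
    counts = np.zeros((n_decisions, n_states, n_states))
    for si, di, sj in zip(s, d, s_next):
        counts[di, si, sj] += 1
    totals = counts.sum(axis=2, keepdims=True)
    # (s, d) cells never observed are left uniform; this is an arbitrary
    # convention for the illustration, not a recommendation.
    return np.where(totals > 0, counts / np.maximum(totals, 1), 1.0 / n_states)

# usage with made-up data: 1000 transitions of a 5-state, 2-decision process
rng = np.random.default_rng(0)
s = rng.integers(0, 5, size=1000)
d = rng.integers(0, 2, size=1000)
s_next = rng.integers(0, 5, size=1000)
p_hat = estimate_beliefs(s, d, s_next, n_states=5, n_decisions=2)
```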
In the sequel, I assume that the econometrician is willing to bring to bear prior knowledge in the form of a parametric representation for (u, p, β). This reduces the problem of structural estimation to the technical issue of estimating a parameter vector θ ∈ Θ, where Θ is a compact subset of R^K. The appropriate econometric method for estimating θ depends critically on whether the control variable d_t is continuous or discrete. If d_t can take on a continuum of possible values we say that the MDP is a continuous decision process (CDP), and if d_t can take on a finite or countable number of values then the MDP is a discrete decision process (DDP). The predominant estimation method for CDP's is generalized method of moments (GMM), using the first order conditions from the MDP problem (stochastic Euler equations) as orthogonality conditions [Hansen (1982), Hansen and Singleton (1982)]. Hansen's chapter (this volume) and Pakes's (1994) survey provide excellent introductions to the literature on structural estimation methods for CDP's.

This chapter focuses on structural estimation of DDP's. DDP's are appropriate for decision problems such as whether or not to quit a job [Gotz and McCall (1984)], search for a new job [Miller (1984)], have a child [Wolpin (1984)], renew a patent [Pakes (1986)], replace a bus or airplane engine [Rust (1987), Kennet (1994)] or retire a cement kiln [Das (1992)]. Although most of the early empirical applications of DDP's have been for binary decision problems, this chapter shows that most of the estimation methods extend naturally to DDP's with any finite number of possible decisions. Examples of multiple choice DDP's include Rust's (1989, 1993) model of retirement behavior, where workers decide each period whether to work full-time, work part-time, or quit, and whether or not to apply for Social Security, and Miller's (1984) multi-armed-bandit model of occupation choice.

Since the control variable in a DDP model assumes at most a finite number of possible values, the optimal decision rule is determined by the solution to a system of inequalities rather than as a zero of a first order condition. As a result there is no analog of stochastic Euler equations to serve as orthogonality conditions for GMM estimation of θ as in the case of CDP's. Instead, most structural estimation methods for DDP's require explicit calculation of the optimal decision rule δ, typically via numerical methods since analytic solutions for δ are quite rare. Although we also discuss simulation estimators that rely on Monte Carlo simulations of the controlled stochastic process {s_t, d_t} rather than on explicit numerical calculation of δ, all of these methods can be conceptualized as forms of nonlinear regression that search for an estimate θ̂ whose implied decision rule d_t = δ(s_t, θ̂) "best fits" the data {d_t^a, s_t^a} according to some metric. Unfortunately, straightforward application of nonlinear regression methods is not possible due to three complications: (1) the "dependent variable" d_t is discrete rather than continuous; (2) the functional form of δ is generally not known a priori but rather must be derived from the solution to the stochastic control problem; (3) the "error term" ε_t in the "regression function" δ is typically multi-dimensional and enters in a non-additive, non-separable fashion: d_t = δ(x_t, ε_t, θ).
The basic motivation for including an error term in the DDP model is to obtain a "statistically non-degenerate" econometric model. The degeneracy of DDP models without error terms is due to a basic result of MDP theory reviewed in Section 2: the optimal decision rule δ is a deterministic function of the state s_t. Section 3.1 offers several possible interpretations for the error terms in a DDP model, but argues that the most natural and internally consistent interpretation is that ε_t is an unobserved state variable. Under this interpretation, we partition the full state variable s_t = (x_t, ε_t) into a subvector x_t that is observed by the econometrician, and a subvector ε_t that is observed only by the agent. If we are willing to impose two additional restrictions on u and p, namely, that ε_t enters u in an additive separable (AS) fashion and that p satisfies a conditional independence (CI) condition, we can apply a number of powerful results from the literature on estimation of static discrete choice models [McFadden (1981, 1984)] to yield estimators of θ with desirable asymptotic properties. In particular, the AS-CI assumption allows us to "integrate out" ε_t from the decision rule δ, yielding a non-degenerate system of conditional choice probabilities P(d_t|x_t, θ) for estimating θ by the method of maximum likelihood. Under the further restriction that {ε_t} is an IID extreme value process we obtain a dynamic generalization of the well-known multinomial logit model,

  P(d|x, θ) = exp{v_θ(x, d)} / Σ_{d'∈D(x)} exp{v_θ(x, d')}.    (1.1)

As far as estimation is concerned, the main difference between the static and dynamic logit models is the interpretation of the v_θ function: in the static logit model it is a one period utility function that is typically specified as a linear-in-parameters function of θ, whereas in the dynamic logit model it is the sum of a one period utility function plus the expected discounted utility in all future periods. Since the functional form of v_θ in the DDP model is generally not known a priori, its values must be computed numerically for any particular value of θ. As a result, maximum likelihood estimation of DDP models requires a "nested numerical solution algorithm" consisting of an "outer" optimization algorithm that searches over the parameter space Θ to maximize the likelihood function and an "inner" dynamic programming algorithm that solves (or approximately solves) the stochastic control problem and computes the choice probabilities P(d|x, θ) and derivatives ∂P(d|x, θ)/∂θ for each trial value of θ. There are a number of fast algorithms for solving finite- and infinite-horizon stochastic control problems, but space constraints prevent more than a cursory discussion of the main methods in this chapter. Section 3.3 presents other econometric specifications for the error term that allow ε_t to enter u in a nonlinear, non-additive fashion, and, also, specifications with more complicated patterns of serial dependence in {ε_t} than is allowed by the CI assumption. Section 3.4 discusses the simulation estimator proposed by Hotz et al. (1993) that avoids the computational burden of the nested numerical solution methods, and the associated "curse of dimensionality", i.e. the exponential rise in the amount of computer time/space required to solve a DDP problem as its "size" (measured in terms of the number of possible values the state and control variables can assume) increases.
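The following sketch illustrates the nested structure described above for the stationary, infinite-horizon dynamic logit case (AS plus CI with IID extreme value errors): an inner routine computes the choice-specific value function v_θ by successive approximations (omitting Euler's constant, which only shifts v_θ by an additive constant), and an outer criterion evaluates the log likelihood implied by (1.1). Array layouts, function names, and the use of plain successive approximations rather than the faster methods mentioned in the text are assumptions of the example.

```python
# A sketch of the nested (outer likelihood / inner dynamic programming)
# structure for the infinite-horizon dynamic logit case. Names and layouts
# are hypothetical: u[x, d] is the one-period utility of decision d in
# observed state x, p[d, x, y] = p(y | x, d), beta is the discount factor.
import numpy as np

def solve_choice_values(u, p, beta, tol=1e-10):
    """Fixed point of v(x,d) = u(x,d) + beta * E[log sum_d' exp v(x',d') | x,d],
    the Bellman-type equation implied by IID extreme value errors."""
    v = np.zeros_like(u, dtype=float)
    while True:
        m = v.max(axis=1)                                   # stable log-sum-exp
        logsum = m + np.log(np.exp(v - m[:, None]).sum(axis=1))
        ev = np.einsum('dxy,y->xd', p, logsum)              # E[logsum(x') | x, d]
        v_new = u + beta * ev
        if np.max(np.abs(v_new - v)) < tol:
            return v_new
        v = v_new

def choice_probabilities(v):
    """Dynamic logit formula (1.1): P(d|x) = exp v(x,d) / sum_d' exp v(x,d')."""
    e = np.exp(v - v.max(axis=1, keepdims=True))            # guard against overflow
    return e / e.sum(axis=1, keepdims=True)

def log_likelihood(u_theta, p, beta, x_obs, d_obs):
    """Outer criterion: partial log likelihood of observed decisions given states."""
    v = solve_choice_values(u_theta, p, beta)
    P = choice_probabilities(v)
    return np.log(P[x_obs, d_obs]).sum()
```

In an actual application the outer search would re-parameterize the utility array as u_θ and maximize this criterion over θ with a numerical optimizer, re-solving the inner fixed point (and, for derivative-based search, the derivatives ∂P/∂θ) at each trial value of θ.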
However, the curse of dimensionality also has implications for the "data" and "estimation" complexity of a DDP model: as the size (i.e. the level of realism or detail) of a DDP model increases, the amount of data needed to estimate the model with an acceptable degree of precision increases more than proportionately. The problems are most severe for estimating beliefs, p. Subjective beliefs can be very slippery, high-dimensional objects to estimate. Since the optimal decision rule δ is generally quite sensitive to the specification of p, an inaccurate or inconsistent estimate of p will contaminate the estimates of u and β. Even under the assumption of rational expectations (which allows us to estimate p non-parametrically), the number of observations required to calculate estimates of p of specified accuracy increases exponentially with the number of state and control variables included in the model. The simulation estimator is particularly data-dependent in that it requires accurate non-parametric estimates of agents' conditional choice probabilities P as well as their beliefs p.

Given all the difficulties involved in structural estimation, the reader might wonder why not simply estimate agents' conditional choice probabilities P using simpler flexible parametric and non-parametric estimation methods. Of course, reduced-form methods can be used, and are quite useful for initial exploratory data analysis and for judging whether more tightly parameterized structural models are misspecified. Nevertheless there is considerable interest in structural estimation methods for both intellectual and practical reasons. The intellectual reason is that structural estimation is the most direct way to assess the empirical validity of a specific MDP model: in the process of solving, estimating, and testing a particular MDP model we learn not only about the data, but also about the detailed implications of the theory. The practical motivation is that structural models can generate more accurate predictions of the impacts of policy changes than reduced-form models. As Lucas (1976) noted, reduced-form econometric techniques can be thought of as uncovering the form of an agent's historical decision rule. The resulting estimated decision rule can then be used to predict the agent's behavior in the future, provided that the environment is stationary. Lucas showed that reduced-form estimates can produce very misleading forecasts of the effects of policy changes that alter the stochastic environment that agents will face in the future.⁴ The reason is that a policy α (such as government rules for payment of Social Security or welfare benefits) can affect an agent's preferences, beliefs and discount factor. If we denote the dependence of the primitives on policy as (u_α, p_α, β_α), then under a new policy α' the agent's behavior will be given by a new decision rule δ(u_α', p_α', β_α') rather than the historical decision rule δ(u_α, p_α, β_α). Unless there has been a lot of historical variation in policies α, reduced-form models won't be able to estimate the independent effect of α on δ, and, therefore, we won't be able to predict how agents will react to a hypothetical policy α'. However, if we are able to parameterize the way in which policy affects the primitives, (u_α', p_α', β_α'), then it is a typically straightforward exercise to compute the new decision rule δ(u_α', p_α', β_α') for a hypothetical policy α'. One can push this line of argument only so far, since its validity depends on
the assumption that agents really are rational expected-utility maximizers and the structural model is correctly specified. If we admit that a tightly parameterized structural model is at best an abstract and approximate representation of reality, there is no reason why a structural model necessarily yields more accurate forecasts than reduced-form models. Furthermore, because of the identification problem it is possible that we could have a situation where two distinct sets of primitives fit an historical data set equally well, but yield very different predictions about the impact of a hypothetical policy. Under such circumstances there is no objective basis for choosing one prediction over another, and we may have to go to the expense of conducting a controlled experiment to help identify the primitives and predict the impact of a new policy α'.⁵ In spite of these problems, the final section of this chapter provides some empirical applications that demonstrate the ability of simple structural models to make much more accurate predictions of the effects of various policy changes than reduced-form models.

Readers who are familiar with the theory of stochastic control are free to skip the brief review of theory and solution methods in Section 2 and move directly to the econometric implementation of the theory in Section 3. A general observation about the current state of the art in this literature is that, while it is easy to formulate very general and detailed MDP's, Bellman's "curse of dimensionality" implies that our ability to actually solve and estimate these problems is much more limited.⁶ However, recent research [Rust (1995b)] shows that use of random Monte Carlo integration methods does succeed in breaking the curse of dimensionality for the subclass of DDP's. This result offers the promise that fairly realistic and detailed DDP models will be estimable in the near future.

The approach of this chapter is to start with a presentation of the general theory of MDP's and then show how various restrictions on the general theory lead to subclasses of econometric models that are feasible to estimate. The first general restriction is to exclude MDP's formulated in continuous time. Although many of the results described in Section 3 can be generalized to continuous-time semi-Markov processes [Ahn (1993b)], there has been little progress on extending the theory to cover other types of continuous-time objects such as controlled diffusion processes. The rationale for using discrete-time models is that solutions to continuous-time problems can be arbitrarily closely approximated by solutions to corresponding discrete-time versions of the problem [cf. Gihman and Skorohod (1979, Chapter 2.3), van Dijk (1984)]. Indeed, the standard approach to solving continuous-time stochastic control problems involves solving an approximate version of the problem in discrete time [Kushner (1990)].

⁴ The limitations of reduced-form models have also been pointed out in an earlier paper by Marschak (1953), although his exposition pertained more to the static econometric models of that period. These general ideas can be traced back even further to the work of Haavelmo (1944) and others at the Cowles Commission.

⁵ Experimental data are subject to their own problems, and it would be a mistake to think of controlled experiments as the only reliable way to predict the response to a new policy. See Heckman (1991, 1994) for an enlightening discussion of some of these limitations.

⁶ See Rust (1994, Section 2) for a more detailed discussion of some of the problems faced in estimating MDP's.
The second restriction is implicit in the theory of stochastic control, namely the assumption that agents conform to the von Neumann-Morgenstern axioms for choice under uncertainty, so that their preferences can be represented by the expected value of a cardinal utility function. A number of experiments have indicated that human decision-making under uncertainty may not always be consistent with the von Neumann-Morgenstern axioms.⁷ In addition, expected-utility models imply that agents are indifferent about the timing of the resolution of uncertain events, whereas human decision-makers seem to have definite preferences over the time at which uncertainty is resolved [Kreps and Porteus (1978), Chew and Epstein (1989)]. The justification for focusing on expected utility is that it remains the most tractable framework for modelling choice under uncertainty.⁸ Furthermore, Section 3.5 shows that, from an econometric standpoint, the expected-utility framework is sufficiently rich to model virtually any type of observed behavior. Our ability to discriminate between expected utility and the more subtle non-expected-utility theories of choice under uncertainty may require quasi-econometric methods such as controlled experiments.⁹

2. Solving MDP's via dynamic programming: A brief review

This section reviews the main results on dynamic programming in finite-horizon problems, and the functional equations that must be solved in infinite-horizon problems. Due to space constraints I only give a cursory outline of the main numerical methods for solving these functional equations, referring the reader to Puterman (1990) or Rust (1995a, 1996) for more in-depth surveys.

Definition 2.1

A (discrete-time) Markovian decision process consists of the following objects:

• A time index t ∈ {0, 1, 2, ..., T}, T ≤ ∞;
• A state space S;
• A decision space D;
• A family of constraint sets {D_t(s_t) ⊆ D};
• A family of transition probabilities {p_{t+1}(·|s_t, d_t): ℬ(S) → [0, 1]};¹⁰
• A family of discount functions {β_t(s_t, d_t) > 0} and single period utility functions {u_t(s_t, d_t)} such that the utility functional U has the additively separable decomposition¹¹

  U(s, d) = Σ_{t=0}^T [Π_{j=0}^{t-1} β_j(s_j, d_j)] u_t(s_t, d_t).    (2.1)

⁷ Machina (1982) identifies the "independence axiom" as the source of many of the discrepancies.

⁸ Recent work by Epstein and Zin (1989) and Hansen and Sargent (1992) on models with non-separable, non-expected-utility functions shows that certain specifications are computationally and analytically tractable. Epstein and Zin have already used their specification of preferences in an empirical investigation of asset pricing. Despite these promising beginnings, the theory and computational methods for these more general problems are in their infancy, and due to space constraints, we are unable to cover these methods in this survey.

⁹ An example of the ability of laboratory experiments to uncover discrepancies between human behavior and the predictions of expected-utility theory is the "Allais paradox" described in Machina (1982, 1987).

¹⁰ ℬ(S) is the Borel σ-algebra of measurable subsets of S. For simplicity, the rest of this chapter avoids measure-theoretic details since they are superfluous in the most commonly encountered case where both the state and control variables are discrete. See Rust (1996) for a statement of the required regularity conditions for problems with continuous state and control variables.

¹¹ The boldface notation denotes sequences: s = (s_0, ..., s_T). Also, define Π_{j=0}^{-1} β_j(s_j, d_j) = 1 in formula (2.1).
The agent's optimization problem is to choose an optimal decision rule δ* = (δ_0, ..., δ_T) to solve the following problem:

  max_{δ = (δ_0, ..., δ_T)} E_δ{U(s, d)}.    (2.2)

2.1. Finite-horizon dynamic programming and the optimality of Markovian decision rules

In finite-horizon problems (T < ∞), the optimal decision rule δ* = (δ_0*, ..., δ_T*) can be computed by backward induction starting at the terminal period, T. In principle, the optimal decision at each time t can depend not only on the current state s_t, but on the entire previous history of the process, d_t = δ_t(s_t, H_{t-1}), where H_{t-1} = (s_0, d_0, ..., s_{t-1}, d_{t-1}). However, in carrying out the process of backward induction it is easy to see that the Markovian structure of p and the additive separability of U imply that it is unnecessary to keep track of the entire previous history: the optimal decision rule depends only on the current time t and the current state s_t: d_t = δ_t*(s_t). For example, starting in period T we have

  δ_T(H_{T-1}, s_T) = argmax_{d_T ∈ D_T(s_T)} U(H_{T-1}, s_T, d_T),    (2.3)

where U can be rewritten as

  U(H_{T-1}, s_T, d_T) = Σ_{t=0}^{T-1} [Π_{j=0}^{t-1} β_j(s_j, d_j)] u_t(s_t, d_t) + [Π_{j=0}^{T-1} β_j(s_j, d_j)] u_T(s_T, d_T).    (2.4)

From (2.4) it is clear that the previous history H_{T-1} does not affect the optimal choice of d_T in (2.3), since d_T appears only in the final term u_T(s_T, d_T) on the right hand side of (2.4). Since the final term is affected by H_{T-1} only through the multiplicative discount factor Π_{j=0}^{T-1} β_j(s_j, d_j), it's clear that δ_T depends only on s_T. Working backwards recursively, it is straightforward to verify that at each time t the optimal decision rule δ_t depends only on s_t. A decision rule that depends on the past history of the process only via the current state s_t is called Markovian. Notice also that the optimal decision rule will generally be a deterministic function of s_t, because randomization can only reduce expected utility if the optimal value of d_t in (2.3) is unique. This is a generic property, since if there are two distinct values of d ∈ D_T(s_T) that attain the maximum in (2.3), by a slight perturbation of u we obtain a similar model where the maximizing value is unique.

The value function is the expected discounted value of utility over the remaining horizon assuming an optimal policy is followed in the future. The method of dynamic programming calculates the value function and the optimal policy recursively as follows. In the terminal period V_T and δ_T are defined by

  δ_T(s_T) = argmax_{d_T ∈ D_T(s_T)} u_T(s_T, d_T),    (2.5)

  V_T(s_T) = max_{d_T ∈ D_T(s_T)} u_T(s_T, d_T).    (2.6)

In periods t = 0, ..., T - 1, V_t and δ_t are recursively defined by

  δ_t(s_t) = argmax_{d_t ∈ D_t(s_t)} [u_t(s_t, d_t) + β_t(s_t, d_t) ∫ V_{t+1}(s_{t+1}) p_{t+1}(ds_{t+1}|s_t, d_t)],    (2.7)

  V_t(s_t) = max_{d_t ∈ D_t(s_t)} [u_t(s_t, d_t) + β_t(s_t, d_t) ∫ V_{t+1}(s_{t+1}) p_{t+1}(ds_{t+1}|s_t, d_t)].    (2.8)

(A numerical sketch of these recursions appears after Theorem 2.1 below.) It's straightforward to verify that at time t = 0 the value function V_0(s_0) represents the conditional expectation of utility over all future periods. Since dynamic programming has recursively generated the optimal decision rule δ* = (δ_0*, ..., δ_T*), it follows that

  V_0(s) = max_δ E_δ{U(s, d) | s_0 = s}.    (2.9)

These results can be formalized as follows.

Theorem 2.1

Given an MDP that satisfies certain weak regularity conditions [see Gihman and Skorohod (1979)],

1. An optimal, non-randomized decision rule δ* exists,
2. An optimal decision rule can be found within the subclass of non-randomized Markovian strategies,
3. In the finite-horizon case (T < ∞) an optimal decision rule δ* can be computed by backward induction according to the recursions (2.5), ..., (2.8),
4. In the infinite-horizon case (T = ∞) an optimal decision rule δ* can be approximated arbitrarily closely by the optimal decision rule δ_N* to an N-period problem, in the sense that

  lim_{N→∞} E_{δ_N*}{U_N(s, d)} = lim_{N→∞} sup_δ E_δ{U_N(s, d)} = sup_δ E_δ{U(s, d)}.    (2.10)
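The backward induction recursions (2.5)-(2.8) are straightforward to implement when the state and decision spaces are finite, in which case the integrals in (2.7)-(2.8) reduce to matrix-vector products. The sketch below is a minimal illustration under the simplifying (and purely illustrative) assumption that u, p and β do not depend on t; variable names and array layouts follow the toy example sketched in the introduction.

```python
# A sketch of finite-horizon backward induction, equations (2.5)-(2.8), for a
# discrete MDP with time-invariant primitives (a simplifying assumption made
# only for this illustration).
import numpy as np

def backward_induction(u, p, beta, T):
    """u[s, d]: single period utility; p[d, s, y] = p(y | s, d); beta: discount.
    Returns value functions V[t, s] and decision rules delta[t, s], t = 0..T."""
    n_states, n_decisions = u.shape
    V = np.zeros((T + 1, n_states))
    delta = np.zeros((T + 1, n_states), dtype=int)

    # Terminal period, (2.5)-(2.6): maximize the last period's utility alone.
    delta[T] = u.argmax(axis=1)
    V[T] = u.max(axis=1)

    # Periods t = T-1, ..., 0, (2.7)-(2.8): current utility plus the discounted
    # conditional expectation of next period's value function.
    for t in range(T - 1, -1, -1):
        EV = np.einsum('dsy,y->sd', p, V[t + 1])   # E[V_{t+1}(s') | s, d]
        Q = u + beta * EV
        delta[t] = Q.argmax(axis=1)
        V[t] = Q.max(axis=1)
    return V, delta

# usage with the toy replacement primitives sketched in the introduction:
# V, delta = backward_induction(u, p, beta, T=20)
```

Theorem 2.1(4) is also visible numerically: for β < 1, the period-0 value function computed with a large horizon T changes very little as T grows, which is the sense in which the N-period solutions approximate the infinite-horizon optimum.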