Handbook of Econometrics, Chapter 39: Methodology and Theory for the Bootstrap


P. Hall

Abstract

A brief account is given of the methodology and theory for the bootstrap. Methodology is developed in the context of the "equation" approach, which allows attention to be focussed on specific criteria for excellence, such as coverage error of a confidence interval or expected value of a bias-corrected estimator. This approach utilizes a definition of the bootstrap in which the key component is replacing a true distribution function by its empirical estimator. Our theory is Edgeworth expansion based, and is aimed specifically at elucidating properties of different methods for constructing bootstrap confidence intervals in a variety of settings. The reader interested in more detail than can be provided here is referred to the recent monograph of Hall (1992).

1. Introduction

A broad interpretation of bootstrap methods argues that they are defined by replacing an unknown distribution function, $F$, by its empirical estimator, $\hat F$, in a functional form for an unknown quantity of interest. From this standpoint, the individual who first suggested that a population mean, $\mu = \int x \, dF(x)$, could be estimated by the sample mean, $\bar X = \int x \, d\hat F(x)$, was using the bootstrap. We tend to favour this definition, although we appreciate that there are alternative views. Perhaps the most common alternative is to confer the name "bootstrap" on procedures that use Monte Carlo methods to effect a numerical approximation. While we see that this does have its merits, we would argue against it on two grounds.

First, it is sometimes convenient to draw a distinction between the essentially statistical argument that leads to the "substitution" or "plug-in" method described in the previous paragraph, and the essentially numerical argument that employs a Monte Carlo approximation to calculate a functional of $\hat F$. There do exist statistical procedures which marry the numerical simulation and statistical estimation into one operation, where the simulation is regarded as primarily a statistical feature. Monte Carlo testing is one such procedure; see for example Barnard (1963), Hope (1968) and Marriott (1979). Our definition of the bootstrap would not regard Monte Carlo testing as a bootstrap procedure. That may be seen as either an advantage or a disadvantage, depending on one's view.

A second objection that one may have to defining the "bootstrap" strictly in terms of whether or not Monte Carlo methods are employed is that the method of numerical computation becomes intrinsic to the definition. To cite an extreme case, one would not usually think of using Monte Carlo methods to compute a sample mean or variance, but nevertheless those quantities might reasonably be regarded as bootstrap estimators of the population mean and variance, respectively. In a less obvious instance, estimators of bootstrap distribution functions, which would usually be candidates for approximation by Monte Carlo methods, may sometimes be computed most effectively by exact, non-Monte Carlo methods. See for example Fisher and Hall (1991). In other settings, saddlepoint methods provide excellent alternatives to simulation; see Davison and Hinkley (1988) and Reid (1988). Does a technique stop being a bootstrap method as soon as non-Monte Carlo methods are employed? To argue that it does seems unnecessarily pedantic, but to deny that it does would cause some problems for a bootstrap definition based on the notion of simulation.
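To make the "replace $F$ by $\hat F$" rule concrete, here is a minimal sketch (not from the chapter; the helper name `plug_in` is purely illustrative). Because the empirical distribution places mass $n^{-1}$ on each observation, any functional written as an expectation under $F$ is estimated by the corresponding sample average.

```python
import numpy as np

def plug_in(functional, sample):
    # The empirical distribution F-hat puts mass 1/n on each observation,
    # so a functional written as an expectation under F becomes a sample
    # average when F is replaced by F-hat.
    return functional(np.asarray(sample))

data = np.array([1.2, 0.7, 2.3, 1.9, 0.4])

# Population mean mu = integral of x dF(x)  ->  plug-in estimate: sample mean.
mean_hat = plug_in(np.mean, data)

# Population variance, another functional of F  ->  plug-in (biased) variance.
var_hat = plug_in(lambda x: np.mean((x - x.mean()) ** 2), data)
```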
The name "bootstrap" was introduced by Efron (1979), and it is appropriate here to emphasize the fundamental contributions that he made. As Efron was careful to point out, bootstrap methods (in the sense of replacing $F$ by $\hat F$) had been around for many years before his seminal paper. But he was perhaps the first to perceive the enormous breadth of this class of methods. He saw too that the power of modern computing machinery could be harnessed to allow functionals of $\hat F$ to be computed in very diverse circumstances. The combination of these two observations is extremely powerful, and its ultimate effect on Statistics will be revolutionary. Necessarily, these two observations go together; the vast range of applications of bootstrap methods would not be possible without a facility for extremely rapid simulation. However, that fact does not imply that bootstrap methods are restricted to situations where simulation is employed for calculation. Statistical scientists who thought along lines similar to Efron include Hartigan (1969, 1971), who used resampled sub-samples to construct point and interval estimators, and who stressed connections with Mahalanobis' "interpenetrating samples" and the jackknife of Quenouille (1949, 1956) and Tukey (1958); and Simon (1969, Chapters 23-25), who described a variety of Monte Carlo methods.

Let us accept, for the sake of argument, that bootstrap methods are defined by the "replace $F$ by $\hat F$" rule described above. Two challenges immediately emerge in response to this definition. First, we must determine how to "focus" this concept, so as to make the bootstrap responsive to statistical demands. That is, how do we decide which functionals of $F$ should be estimated? This requires a "principle" that enables us to implement bootstrap methods in a range of circumstances. The second challenge is that of calculating the values of those functionals in a practical setting. The latter problem may be solved partly by providing simulation methods or related devices, such as saddlepoint arguments, for numerical approximation. Space limitations mean that a thorough account of these techniques is beyond the scope of this chapter; however, a detailed account of efficient methods of bootstrap simulation may be found in Appendix II of Hall (1992). A key part of the answer to the first question is the development of theory describing the relative performance of different forms of the bootstrap, and that issue will be addressed at some length here.

Our answer to the first question is provided in Section 2, where we describe an "equation approach" to focussing attention on specific statistical questions. This technique was discussed in more detail by Hall and Martin (1988), Martin (1989) and Hall (1992, Chapter 1). It leads naturally to bootstrap iteration, which is discussed in Section 3. Section 4 presents theory that enables comparisons to be made of different bootstrap approaches to inference about distributions. The reader is referred to Hinkley (1988) and DiCiccio and Romano (1988) for excellent reviews of bootstrap methods. Our discussion is necessarily kept brief and is essentially an abbreviated form of an account that may be found in Hall (1992). In undertaking that abbreviation we have omitted discussion of a variety of different approaches to the bootstrap. In particular, we do not discuss various forms of bias correction, not because we do not recommend them but because space does not permit an adequate survey.
We readily concede that the restricted account of bootstrap methods and theory presented here is in need of a degree of bias correction itself! We do not address in any detail the bootstrap for dependent data, but pause here to outline the main issues. There are two main approaches to implementing the bootstrap in dependent settings.

The first is to model the dependent process as one that is driven by independent and identically distributed disturbances; examples include autoregressions and moving averages. We describe briefly here a technique which may be used when no parametric assumptions are made about the distribution of the disturbances (a sketch for a simple autoregression is given below). First estimate the parameters of the model, and calculate the residuals (i.e. the estimated values of the independent disturbances). Then run the process over and over again, by Monte Carlo simulation, with parameter values set equal to their estimated values and with the bootstrapped independent disturbances obtained by resampling randomly, with replacement, from the set of residuals. Each resampled process should be of the same length as the original one, and bootstrap inference may be conducted by averaging over the independent Monte Carlo replications. Bose (1988) addresses the efficacy of this procedure in the context of autoregressive models, and derives results that may be viewed as analogues (in the case of autoregressive processes) of some of those discussed later in this chapter for independent data. If the distribution of disturbances is assumed known then, rather than estimate residuals and resample with replacement from those, the parameters of the assumed distribution may be estimated; the bootstrap disturbances may then be drawn from the hypothesized distribution, with parameters set equal to their estimates.

The other major way of bootstrapping dependent processes is to divide the data sequence into blocks, and resample the blocks rather than individual data values. This approach has application in spatial as well as "linear" or time series contexts, and indeed was apparently first suggested for spatial data; see Hall (1985). Blocking methods may involve either non-overlapping blocks, as in the technique treated by Carlstein (1986), or overlapping blocks, as proposed by Künsch (1989). (Both methods were considered for spatial data by Hall (1985).) In sheer asymptotic terms Künsch's method has advantages over Carlstein's, but those advantages are not always apparent in practice. This matter has been addressed by Hall and Horowitz (1993) in the context of estimating bias or variance, where the question of optimal block width is also treated. The issue of distribution estimation using blocking methods has been discussed by Götze and Künsch (1990), Lahiri (1991, 1992) and Davison and Hall (1993).
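The model-based, residual-resampling recipe described above translates into a short sketch. The example below is not from the chapter: it assumes a zero-mean AR(1) model fitted by least squares, and the function name and defaults are illustrative.

```python
import numpy as np

def ar1_residual_bootstrap(x, n_boot=500, seed=None):
    """Residual-resampling bootstrap for a zero-mean AR(1) process:
    fit the model, resample the centred residuals with replacement,
    regenerate series of the same length, and refit on each replicate."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Least-squares estimate of the AR(1) coefficient.
    rho_hat = np.sum(x[1:] * x[:-1]) / np.sum(x[:-1] ** 2)
    # Residuals play the role of the i.i.d. disturbances (centre them).
    resid = x[1:] - rho_hat * x[:-1]
    resid -= resid.mean()
    boot_rhos = np.empty(n_boot)
    for b in range(n_boot):
        eps = rng.choice(resid, size=n, replace=True)
        x_star = np.empty(n)
        x_star[0] = x[0]
        for t in range(1, n):
            x_star[t] = rho_hat * x_star[t - 1] + eps[t]
        boot_rhos[b] = np.sum(x_star[1:] * x_star[:-1]) / np.sum(x_star[:-1] ** 2)
    # e.g. np.std(boot_rhos) estimates the standard error of rho_hat.
    return rho_hat, boot_rhos
```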
2. A formal definition of the bootstrap principle

Much of statistical inference involves describing the relationship between a sample and the population from which the sample was drawn. Formally, given a functional $f_t$ from a class $\{f_t : t \in \mathcal{T}\}$, we wish to determine that value $t_0$ of $t$ that solves an equation such as

$$E\{f_t(F_0, F_1) \mid F_0\} = 0, \qquad (2.1)$$

where $F = F_0$ denotes the population distribution function and $F = F_1$ is the distribution function "of the sample". An explicit definition of $F_1$ will be given shortly. Conditioning on $F_0$ in (2.1) serves to stress that the expectation is taken with respect to the distribution $F_0$.

We call (2.1) the population equation because we need properties of the population if we are to solve this equation exactly. For example, let $\theta_0 = \theta(F_0)$ denote a true parameter value, such as the $r$th power of a mean. Let $\hat\theta = \theta(F_1)$ be our bootstrap estimator of $\theta_0$, such as the $r$th power of a sample mean, where $\hat F = F_1$ is the empirical distribution function of the sample from which the sample mean is computed. Correcting $\hat\theta$ additively for bias is equivalent to finding that value $t_0$ that solves (2.1) when

$$f_t(F_0, F_1) = \theta(F_1) - \theta(F_0) + t. \qquad (2.2)$$

Our bias-corrected estimator would be $\hat\theta + t_0$. On the other hand, to construct a symmetric 95% confidence interval for $\theta_0$ we would solve (2.1) when

$$f_t(F_0, F_1) = I\{\theta(F_1) - t \le \theta(F_0) \le \theta(F_1) + t\} - 0.95, \qquad (2.3)$$

where the indicator function $I(\mathcal{E})$ is defined to equal 1 if the event $\mathcal{E}$ holds and 0 otherwise. The confidence interval is $(\hat\theta - t_0, \hat\theta + t_0)$, where $\hat\theta = \theta(F_1)$.

To obtain an approximate solution of the population equation (2.1) we argue as follows. Let $F_2$ denote the distribution function of a sample drawn from $F_1$ (conditional on $F_1$). Replace the pair $(F_0, F_1)$ in (2.1) by $(F_1, F_2)$, thereby transforming (2.1) to

$$E\{f_t(F_1, F_2) \mid F_1\} = 0. \qquad (2.4)$$

We call this the sample equation because we know (or can find out) everything about it once we know the sample distribution function $F_1$. In particular, its solution $\hat t_0$ is a function of the sample values. We call $\hat t_0$ and $E\{f_t(F_1, F_2) \mid F_1\}$ "the bootstrap estimators" of $t_0$ and $E\{f_t(F_0, F_1) \mid F_0\}$, respectively. They are obtained by replacing $F_0$ by $F_1$ in formulae for $t_0$ and $E\{f_t(F_0, F_1) \mid F_0\}$. In the bias correction problem, where $f_t$ is given by (2.2), the bootstrap version of our bias-corrected estimator is $\hat\theta + \hat t_0$. In the confidence interval problem, where (2.3) describes $f_t$, our bootstrap confidence interval is $(\hat\theta - \hat t_0, \hat\theta + \hat t_0)$. The latter is commonly called a (symmetric) percentile-method confidence interval for $\theta_0$. The "bootstrap principle" might be described in terms of this approach to estimation of a population equation.

It is appropriate now to give detailed definitions of $F_1$ and $F_2$. There are two approaches, suitable for nonparametric and parametric problems respectively. In both, inference is based on a sample $\mathcal{X}$ of $n$ random (independent and identically distributed) observations of the population. In the nonparametric case, $F_1$ is simply the empirical distribution function of $\mathcal{X}$; that is, the distribution function of the distribution that assigns mass $n^{-1}$ to each point in $\mathcal{X}$. The associated empirical probability measure assigns to a region $\mathcal{R}$ a value equal to the proportion of the sample that lies within $\mathcal{R}$. Similarly, $F_2$ is the empirical distribution function of a sample drawn at random from the population with distribution function $F_1$; that is, the empiric of a sample $\mathcal{X}^*$ drawn randomly, with replacement, from $\mathcal{X}$. If we denote the population by $\mathcal{X}_0$ then we have a nest of sampling operations: $\mathcal{X}$ is drawn at random from $\mathcal{X}_0$ and $\mathcal{X}^*$ is drawn at random from $\mathcal{X}$.

In the parametric case, $F_0$ is assumed completely known up to a finite vector $\lambda_0$ of unknown parameters. To indicate this dependence we write $F_0 = F_{(\lambda_0)}$, an element of a class $\{F_{(\lambda)} : \lambda \in \Lambda\}$ of possible distributions. Let $\hat\lambda$ be an estimator of $\lambda_0$ computed from $\mathcal{X}$, often (but not necessarily) the maximum likelihood estimator. It will be a function of the sample values, so we may write it as $\hat\lambda = \lambda(\mathcal{X})$. Then $F_1 = F_{(\hat\lambda)}$, the distribution function obtained on replacing "true" parameter values by their sample estimates. Let $\mathcal{X}^*$ denote the sample drawn at random from the distribution with distribution function $F_{(\hat\lambda)}$ (not simply drawn from $\mathcal{X}$ with replacement), and let $\hat\lambda^* = \lambda(\mathcal{X}^*)$ denote the version of $\hat\lambda$ computed for $\mathcal{X}^*$ instead of $\mathcal{X}$. Then $F_2 = F_{(\hat\lambda^*)}$.
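As a rough illustration of the equation approach, the sketch below (not from the chapter; names and defaults are hypothetical) approximates the left-hand side of the sample equation (2.4) by Monte Carlo, drawing resamples from the nonparametric $F_1$, and then locates the root in $t$ by bisection, assuming the expectation is monotone in $t$.

```python
import numpy as np

def sample_equation_root(f, x, t_lo, t_hi, n_boot=1000, seed=None):
    """Approximately solve E{ f_t(F_1, F_2) | F_1 } = 0 in t.
    `f(t, x, xs)` evaluates f_t with F_1 the empiric of `x` and F_2 the
    empiric of the resample `xs`; bisection assumes monotonicity in t."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    resamples = [rng.choice(x, size=len(x), replace=True) for _ in range(n_boot)]

    def g(t):
        # Monte Carlo estimate of E{ f_t(F_1, F_2) | F_1 }.
        return np.mean([f(t, x, xs) for xs in resamples])

    for _ in range(40):
        t_mid = 0.5 * (t_lo + t_hi)
        if g(t_lo) * g(t_mid) <= 0.0:
            t_hi = t_mid
        else:
            t_lo = t_mid
    return 0.5 * (t_lo + t_hi)

# For the symmetric-interval functional (2.3), f_t(F_1, F_2) becomes
# I{|theta(F_2) - theta(F_1)| <= t} - 0.95; e.g. with theta the mean:
# f = lambda t, x, xs: float(abs(xs.mean() - x.mean()) <= t) - 0.95
# t0_hat = sample_equation_root(f, data, 0.0, 10.0 * data.std())
```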
It is appropriate now to discuss two examples that illustrate the bootstrap principle.

Example 2.1. Bias reduction

Here the function $f_t$ is given by (2.2), and the sample equation (2.4) assumes the form

$$E\{\theta(F_2) - \theta(F_1) + t \mid F_1\} = 0,$$

whose solution is

$$\hat t_0 = \theta(F_1) - E\{\theta(F_2) \mid F_1\}.$$

The bootstrap bias-reduced estimator is thus

$$\hat\theta_1 = \hat\theta + \hat t_0 = \theta(F_1) + \hat t_0 = 2\theta(F_1) - E\{\theta(F_2) \mid F_1\}. \qquad (2.5)$$

Note that our basic estimator $\hat\theta = \theta(F_1)$ is also a bootstrap estimator, since it is obtained by substituting $F_1$ for $F_0$ in the functional formula $\theta_0 = \theta(F_0)$.

The expectation $E\{\theta(F_2) \mid F_1\}$ may always be computed (or approximated) by Monte Carlo simulation, as follows. Conditional on $F_1$, draw $B$ resamples $\{\mathcal{X}_b^*, 1 \le b \le B\}$ independently from the distribution with distribution function $F_1$. In the nonparametric case, where $F_1$ is the empirical distribution function of the sample $\mathcal{X}$, let $F_{2,b}$ denote the empirical distribution function of $\mathcal{X}_b^*$. In the parametric case, let $\hat\lambda_b^* = \lambda(\mathcal{X}_b^*)$ be the estimator of $\lambda_0$ computed from resample $\mathcal{X}_b^*$, and put $F_{2,b} = F_{(\hat\lambda_b^*)}$. Define $\hat\theta_b^* = \theta(F_{2,b})$ and $\hat\theta = \theta(F_1)$. Then in both parametric and nonparametric circumstances,

$$B^{-1} \sum_{b=1}^{B} \hat\theta_b^*$$

converges to $E\{\theta(F_2) \mid F_1\}$ as $B \to \infty$ (with probability one, conditional on $F_1$).

Example 2.2. Confidence interval

A symmetric confidence interval for $\theta_0 = \theta(F_0)$ may be constructed by applying the resampling principle using the function $f_t$ given by (2.3). The sample equation then assumes the form

$$P\{\theta(F_2) - t \le \theta(F_1) \le \theta(F_2) + t \mid F_1\} - 0.95 = 0. \qquad (2.6)$$

In a nonparametric context $\theta(F_2)$, conditional on $F_1$, has a discrete distribution, and so it would seldom be possible to solve (2.6) exactly. However, any error in the solution of (2.6) will usually be very small, since the size of even the largest atom of the distribution of $\theta(F_2)$ decreases exponentially quickly with increasing $n$; the largest atom is of size only $3.6 \times 10^{-4}$ when $n = 10$. We could remove this minor difficulty by smoothing the distribution function $F_1$. In parametric cases, (2.6) may usually be solved exactly for $t$.

The interval $(\hat\theta - \hat t_0, \hat\theta + \hat t_0)$ is a bootstrap confidence interval for $\theta_0 = \theta(F_0)$, usually called a (two-sided, symmetric) percentile interval, since $\hat t_0$ is a percentile of the distribution of $|\theta(F_2) - \theta(F_1)|$ conditional on $F_1$. Other nominal 95% percentile intervals include the two-sided, equal-tailed interval $(\hat\theta - \hat t_{01}, \hat\theta + \hat t_{02})$ and the one-sided interval $(-\infty, \hat\theta + \hat t_{03})$, where $\hat t_{01}$, $\hat t_{02}$ and $\hat t_{03}$ solve

$$P\{\theta(F_1) \le \theta(F_2) - t \mid F_1\} - 0.025 = 0,$$

$$P\{\theta(F_1) \le \theta(F_2) + t \mid F_1\} - 0.975 = 0,$$

and

$$P\{\theta(F_1) \le \theta(F_2) + t \mid F_1\} - 0.95 = 0,$$

respectively. The equal-tailed interval places probability only approximately 0.025 in each tail, in the sense that $P(\theta_0 < \hat\theta - \hat t_{01}) \approx 0.025 \approx P(\theta_0 > \hat\theta + \hat t_{02})$. The "ideal" form of this interval, obtained by solving the population equation rather than the sample equation, does place equal probability in each tail.

Still other 95% percentile intervals are $\hat I_1 = (\hat\theta - \hat t_{02}, \hat\theta + \hat t_{01})$ and $\hat I_2 = (-\infty, \hat\theta + \hat t_{04})$, where $\hat t_{04}$ is the solution of

$$P\{\theta(F_1) \le \theta(F_2) - t \mid F_1\} - 0.05 = 0.$$

These do not fit naturally into a systematic development of bootstrap methods by frequentist arguments, and we find them a little contrived. They are sometimes motivated as follows. Define $\hat\theta^* = \theta(F_2)$, $\hat H(x) = P(\hat\theta^* \le x \mid \mathcal{X})$ and $\hat H^{-1}(\alpha) = \inf\{x : \hat H(x) \ge \alpha\}$. Then

$$\hat I_1 = [\hat H^{-1}(0.025),\, \hat H^{-1}(0.975)] \quad \text{and} \quad \hat I_2 = (-\infty,\, \hat H^{-1}(0.95)].$$

All these intervals cover $\theta_0$ with probability approximately 0.95, which might be called the nominal coverage.
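The two examples translate directly into a nonparametric Monte Carlo sketch (not from the chapter; the function name is illustrative). For a functional `theta` evaluated at the empirical distribution, it returns the bias-reduced estimator of (2.5) and the symmetric percentile interval solving (2.6), with $\hat t_0$ taken as the 0.95 quantile of $|\theta(F_2) - \theta(F_1)|$ over the resamples.

```python
import numpy as np

def bias_corrected_and_symmetric_ci(x, theta, n_boot=2000, seed=None):
    """Nonparametric Monte Carlo versions of Examples 2.1 and 2.2."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    theta_hat = theta(x)                                    # theta(F_1)
    theta_star = np.array([theta(rng.choice(x, size=len(x), replace=True))
                           for _ in range(n_boot)])         # theta(F_2) draws
    # Example 2.1: theta_1 = 2 * theta(F_1) - E{theta(F_2) | F_1}, cf. (2.5).
    theta_bc = 2.0 * theta_hat - theta_star.mean()
    # Example 2.2: t_hat is the 0.95 quantile of |theta(F_2) - theta(F_1)|,
    # giving the symmetric percentile interval of (2.6).
    t_hat = np.quantile(np.abs(theta_star - theta_hat), 0.95)
    return theta_bc, (theta_hat - t_hat, theta_hat + t_hat)

# e.g. the squared mean (the r-th power of a mean with r = 2):
rng = np.random.default_rng(0)
x = rng.exponential(size=50)
theta_bc, ci = bias_corrected_and_symmetric_ci(x, lambda s: s.mean() ** 2)
```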
Coverage error is defined to be true coverage minus nominal coverage; it generally converges to zero as sample size increases.

We now treat in more detail the construction of two-sided, symmetric percentile intervals in parametric problems. There, provided the distribution functions $F_{(\lambda)}$ are continuous, equation (2.6) may be solved exactly. We focus attention on the cases where $\theta_0 = \theta(F_0)$ is a population mean and the population is normal or exponential. Our main aim is to bring out the virtues of pivoting, which usually amounts to rescaling so that the distribution of a statistic depends less on unknown parameters.

If the population is Normal $N(\mu, \sigma^2)$ and we use the maximum likelihood estimator $\hat\lambda = (\bar X, \hat\sigma^2)$ to estimate $\lambda_0 = (\mu, \sigma^2)$, then the sample equation (2.6) may be rewritten as

$$P(|n^{-1/2}\hat\sigma N| \le t \mid F_1) = 0.95, \qquad (2.7)$$

where $N$ is Normal $N(0, 1)$ and independent of $F_1$. Therefore

$$\hat t = \hat t_0 = x_{0.95}\, n^{-1/2}\hat\sigma,$$

where $x_\alpha$ is defined by $P(|N| \le x_\alpha) = \alpha$.
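For this Normal case the sample equation (2.7) therefore has a closed-form solution, which a brief sketch can compute directly (illustrative names; `scipy.stats.norm.ppf` is used only to supply the standard normal quantile $x_{0.95}$):

```python
import numpy as np
from scipy.stats import norm

def parametric_normal_symmetric_ci(x, level=0.95):
    """Symmetric percentile interval for a Normal mean from (2.7):
    (xbar - t_hat, xbar + t_hat) with t_hat = x_level * sigma_hat / sqrt(n)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    mu_hat = x.mean()
    sigma_hat = np.sqrt(np.mean((x - mu_hat) ** 2))  # ML estimate of sigma
    # x_alpha is defined by P(|N| <= x_alpha) = alpha for N ~ N(0, 1),
    # i.e. the (1 + alpha)/2 quantile of the standard normal.
    x_alpha = norm.ppf(0.5 * (1.0 + level))
    t_hat = x_alpha * sigma_hat / np.sqrt(n)
    return mu_hat - t_hat, mu_hat + t_hat
```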