Thomas A. Louis, Ph.D.

Senior Statistical Scientist
The RAND Corporation

POSITION STATEMENT

Legislative mandates, programmatic requirements and research studies intended to achieve societal goals require that estimates of population size, income, demographic indicators, exposures (to pollutants, stress, social "goods") and outcomes (such as health status and inequity measures) have a fine geographic, temporal and demographic resolution. Therefore, estimates for small areas and for narrow time windows are needed to inform policy. However, direct estimates generally are statistically unstable and must be stabilized while maintaining the required spatio-temporal resolution. Furthermore, financial, practical, political and ethical constraints limit the spatio-temporal resolution and alignment of direct estimates. Information on the quantity of interest may be spatially direct, but temporally misaligned (e.g., Census information used for a non-census year; historic disease outcomes); temporally aligned, but not direct (e.g., Current Population Survey [CPS] information used for non-sampled regions; aggregated health outcomes); or related to, but not a measurement of, the attribute (e.g., administrative records). Syntheses of these sources are necessary to produce valid maps, Geographic Information System (GIS) displays, and regulatory or policy assessments. They require careful construction and valid statistical analysis, especially in a world where GIS systems link information and produce high-impact graphical displays.

Therefore, small-area estimates and other inferences require synthesizing evidence from many databases and "borrowing information" from indirect geographic, temporal and covariate domains. The Census Bureau's Small Area Income and Poverty Estimates (SAIPE) program provides a case in point. Title I allocates more than $7 billion per year to school districts based on their estimated number of children in poverty and poverty rate. Census Bureau estimates integrate information from the Census, the CPS and administrative records such as tax returns via a mixed-effects, hierarchical model.

Hierarchical models are absolutely necessary when dealing with such multiple sources of information and complicated spatio-temporal relations, and Bayesian approaches have proven very effective. For example, mapping region-specific posterior means (or other features of the posterior distribution) smooths a crude map. The beauty of the Bayesian approach is its ability to structure complicated assessments and to guide development of valid statistical models and inferences that properly account for stochastic and modeling uncertainties. Properly developed, the Bayesian approach produces objectively valid designs and analyses, commonly more effective than traditional methods. Computing innovations such as Markov chain Monte Carlo (MCMC) enable implementation of complex, relevant models, and applications are burgeoning.
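
To make the map-smoothing remark concrete, here is a minimal sketch of shrinkage under a two-level normal model. It is an illustration only, not the SAIPE model: it assumes the sampling variances and the between-region variance are known, and the function name and the numbers are hypothetical.

```python
import numpy as np

def shrink_toward_mean(direct, sampling_var, between_var):
    """Posterior means under a normal-normal model with known variances.

    direct       : direct (unstable) regional estimates
    sampling_var : sampling variance of each direct estimate
    between_var  : variance of the true regional values (assumed known here)
    """
    direct = np.asarray(direct, dtype=float)
    sampling_var = np.asarray(sampling_var, dtype=float)
    # Precision-weighted estimate of the overall mean.
    w = 1.0 / (sampling_var + between_var)
    mu = np.sum(w * direct) / np.sum(w)
    # Shrinkage factor: noisier regions are pulled harder toward mu.
    B = sampling_var / (sampling_var + between_var)
    return B * mu + (1.0 - B) * direct, B

# Hypothetical regional poverty rates and their sampling variances.
direct = [0.10, 0.30, 0.20, 0.45]
sampling_var = [0.001, 0.020, 0.005, 0.040]
post, B = shrink_toward_mean(direct, sampling_var, between_var=0.01)
print(np.round(post, 3), np.round(B, 3))
```

Each posterior mean is a compromise between the region's own estimate and the overall mean, so the mapped values are less spread out than the crude ones. A full Bayesian analysis would also place a prior on the between-region variance rather than fixing it.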

A few examples illustrate the power of a Bayesian approach. Measurement error permeates observed information and can substantially influence conclusions. For example, if a covariate of interest follows a standard measurement error model in which the observed value is a random deviation from the true value, the estimated covariate effect will be attenuated and its impact underestimated. Many measurement error processes are hybrid forms and require sophisticated modeling; all can alter the functional form of relations. A Bayesian model treats the true values as missing data. Observed values come into the model via the posterior distribution of the true values conditional on the observed. This approach maintains focus on the unobserved, structural relation and thereby automatically de-attenuates slopes in an intended functional form. Studies with different measurement error processes can produce heterogeneous results. Appropriate measurement error modeling has the potential to "line up" the studies.
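
The attenuation described above can be seen in a small simulation. The sketch below assumes the classical additive error model with a known error variance, and it corrects the slope with a simple moment-based reliability ratio; that correction stands in for, and is not, the Bayesian treatment of true values as missing data. All parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
beta = 2.0                 # true slope on the unobserved covariate
var_x, var_u = 1.0, 0.5    # variance of true covariate and of measurement error

x = rng.normal(0.0, np.sqrt(var_x), n)        # true covariate (unobserved in practice)
w = x + rng.normal(0.0, np.sqrt(var_u), n)    # observed, error-prone covariate
y = 1.0 + beta * x + rng.normal(0.0, 1.0, n)  # outcome depends on the true value

def ols_slope(a, b):
    """Least-squares slope of b regressed on a."""
    a_c = a - a.mean()
    return np.dot(a_c, b - b.mean()) / np.dot(a_c, a_c)

naive = ols_slope(w, y)                 # attenuated toward zero
reliability = var_x / (var_x + var_u)   # expected attenuation factor
corrected = naive / reliability         # moment-based de-attenuation

print(naive, beta * reliability, corrected)
```

The naive slope sits near beta times the reliability ratio, and dividing by that ratio recovers a value near beta. In a Bayesian formulation the same correction emerges from the model itself, and uncertainty about the error variance carries through to the slope.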

Building models that link misaligned information (health outcomes, exposures and demographics) at a common spatial resolution gives models a stable focus, unlinked from aggregation (e.g., by aggregating a Poisson, log-linear model rather than using a Poisson log-linear model for the aggregated information). The approach ensures that assessments from different studies address the same question, can be "exported" and are potentially combinable. Maintaining focus requires a hierarchical model, and Bayesian processing sorts out the relational thicket.
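
The parenthetical distinction can be illustrated numerically. The sketch below, with hypothetical function names and values, compares the area-level expected count obtained by summing fine-level Poisson log-linear means with the count obtained by applying the same log-linear form to the area-average covariate.

```python
import numpy as np

def aggregated_mean(alpha, beta, x, pop):
    """Area-level expected count from summing fine-level Poisson means."""
    x = np.asarray(x, dtype=float)
    pop = np.asarray(pop, dtype=float)
    return float(np.sum(pop * np.exp(alpha + beta * x)))

def loglinear_on_aggregate(alpha, beta, x, pop):
    """Expected count if the log-linear form is applied to the average covariate."""
    x = np.asarray(x, dtype=float)
    pop = np.asarray(pop, dtype=float)
    xbar = np.average(x, weights=pop)
    return float(np.sum(pop) * np.exp(alpha + beta * xbar))

# Hypothetical sub-areas: exposure levels and populations.
x = [0.0, 1.0, 2.0]
pop = [1000, 500, 200]
agg = aggregated_mean(-5.0, 0.8, x, pop)
eco = loglinear_on_aggregate(-5.0, 0.8, x, pop)
print(agg, eco)
```

Because the exponential is convex, the two quantities differ whenever exposure varies within the area, so the coefficient in the second form does not mean the same thing as the coefficient in the fine-level model. Keeping the model at the fine level and aggregating its means preserves a single interpretation across studies with different aggregation.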

Estimates and other evaluations should be face-valid; for example, the sum of within-county block-group estimated counts should equal the county-level estimate. More generally, estimates should be "conformable" both in their value and uncertainty. That is, analyses at a coarser geographic, temporal or demographic resolution should be consistent with a finer-grained analysis and vice versa. Conformability implies that the coarser-grained estimate be a weighted average of the finer-grained estimates, with "logical" weights (e.g., population size).
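
As a small illustration of conformability in value, the sketch below uses a simple ratio adjustment so that block-group counts sum to a county total, and forms a county rate as a population-weighted average of block-group rates. Ratio adjustment is only one convenient device, assumed here for illustration; it does not address conformability of uncertainty, and the names and numbers are hypothetical.

```python
import numpy as np

def benchmark_counts(block_counts, county_total):
    """Rescale block-group counts proportionally so they sum to the county total."""
    block_counts = np.asarray(block_counts, dtype=float)
    return block_counts * (county_total / block_counts.sum())

def coarse_rate(fine_rates, populations):
    """County rate as the population-weighted average of block-group rates."""
    return float(np.average(fine_rates, weights=populations))

# Hypothetical block-group estimates within one county.
block_counts = [120.0, 340.0, 95.0]
adjusted = benchmark_counts(block_counts, county_total=600.0)

rates = np.array([0.12, 0.20, 0.08])
pops = np.array([1000.0, 1700.0, 1200.0])
county_rate = coarse_rate(rates, pops)
print(adjusted, county_rate)
```

With population weights, the county rate multiplied by the county population equals the sum of the block-group counts implied by the fine-level rates, which is the consistency the text asks for.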

The growing list of important issues and goals that benefit from a Bayesian approach also includes accommodation of missing data, research synthesis (e.g., meta-analysis), addressing non-standard goals such as estimating histograms and ranks, and development of causal models that forge links between science and policy.

Bayesian structuring and analysis produce many benefits; however, the approach has been criticized for being too fragile in its dependence on a specific prior distribution and for being too subjective to have a role in formulating policy. But computing advances have enabled robust, objectively valid procedures. Of course, the Bayesian approach is by no means a panacea; it is by no means "plug and play." Considerable care and sophistication are essential for valid application, and a multidisciplinary, team approach is absolutely necessary.

Statistical procedures cannot substitute for scientific and policy insights or for relevant, reliable and sufficient information. Indeed, the Bayesian approach demands that the relevance and characteristics of all inputs be documented and captured by the model, that goals be explicit and that uncertainty be fully taken into account. Though statistical cures are available for some data shortcomings, such cures are far less effective than improving inputs. Space-age techniques cannot rescue stone-age data. Therefore, we need to advocate and create improved information systems, improved understanding of the stochastic basis of observed information and improved alignment of statistical designs and analyses with societal goals.
