1.
Cartographic Issues of Data Collection: The Census Bureau and Sampling

Associated reading: Montello & Sutton, Chapter 8





Terms to know: Census
For an overview of Census 2000 results, see here

Census Resources
At the Census: Boundary files
Elsewhere:

Why collect a census?
The Bible: Numbers, Chapter 1
Attempts to map the US Census: 1870, 1890, 1910


Sample, sampling frame, population
Population: the entire set of entities of interest.
Sampling frame: the subset of the population from which cases are actually drawn.
Sample: the cases actually drawn from the sampling frame.
E.g., in a telephone poll you might identify people from a list of registered voters. The sampling frame would be: all people on the list of registered voters whose phone number is correct and who were in town during the study period.
E.g., identifying species of cacti by driving along a desert service road. The sampling frame would be: all cacti visible from the roads driven by the researcher.
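
The telephone poll example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the voter records and the field names (phone_correct, in_town) are made up, and the registered voter list stands in for the list the researcher starts from.

```python
# Hypothetical registered-voter list; names and fields are invented for illustration.
registered_voters = [
    {"name": "Voter A", "phone_correct": True, "in_town": True},
    {"name": "Voter B", "phone_correct": False, "in_town": True},
    {"name": "Voter C", "phone_correct": True, "in_town": False},
    {"name": "Voter D", "phone_correct": True, "in_town": True},
]


def sampling_frame(voters):
    """Keep only the people a telephone poll could actually reach."""
    return [v for v in voters if v["phone_correct"] and v["in_town"]]


def coverage(voters):
    """Share of the voter list that ends up in the sampling frame."""
    return len(sampling_frame(voters)) / len(voters)


frame = sampling_frame(registered_voters)
print([v["name"] for v in frame], coverage(registered_voters))
```

Anyone excluded by the filter can never be drawn, however carefully the sample is taken from the frame.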

Sampling Design
Sampling design: how cases are drawn from the sampling frame.
1. Nonprobability sampling: the probability of selecting a particular case is unknown. Examples include:
   Snowball sampling, where a researcher uses a case already selected to find other cases. E.g., a study of the spatial pattern of drug users might ask one user about other users.
   Convenience sampling, where the researcher takes every case that is convenient. E.g., sampling down mining pits.
2. Probability sampling: cases have a known probability of being selected. Examples include:
   Simple random sampling, where each case has an equal probability of being selected, and selecting a case does not affect the probability of selecting other cases.
   Systematic random sampling, where selecting a case does affect the probability of selecting other cases. E.g., choosing a random starting case and then taking every kth case.
   Stratified random sampling, where the sampling frame is segmented into subsets and a sample is drawn from each subset. E.g., segment by race, ethnicity, age, socioeconomic status, etc.
   Multistage area sampling, or sampling at different spatial scales (states/counties/census tracts).
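
The three single-stage probability designs can be sketched as follows. This is a minimal illustration, not a survey-grade implementation: the sampling frame of 100 households and the "tenure" attribute used for stratification are hypothetical, and the function names are chosen here for clarity.

```python
import random


def simple_random_sample(frame, n, rng):
    """Every case has the same chance of selection (drawn without replacement)."""
    return rng.sample(frame, n)


def systematic_sample(frame, n, rng):
    """Random start, then every k-th case: picking one case fixes all the others."""
    if not 0 < n <= len(frame):
        raise ValueError("n must be between 1 and the size of the frame")
    k = len(frame) // n          # sampling interval
    start = rng.randrange(k)     # random starting position in the first interval
    return [frame[start + i * k] for i in range(n)]


def stratified_sample(frame, key, n_per_stratum, rng):
    """Split the frame into strata, then draw a simple random sample in each."""
    strata = {}
    for case in frame:
        strata.setdefault(key(case), []).append(case)
    sample = []
    for label in sorted(strata):
        members = strata[label]
        sample.extend(rng.sample(members, min(n_per_stratum, len(members))))
    return sample


# Hypothetical sampling frame: 100 households, one in four renting.
rng = random.Random(0)
frame = [{"id": i, "tenure": "renter" if i % 4 == 0 else "owner"} for i in range(100)]

srs = simple_random_sample(frame, 10, rng)
systematic = systematic_sample(frame, 10, rng)
stratified = stratified_sample(frame, lambda c: c["tenure"], 5, rng)
```

In the stratified draw, renters make up half of the sample although they are a quarter of the frame. Any estimate for the whole frame would therefore need to weight the strata back to their true shares.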


Implications of sampling
1. Representativeness: Is the sample representative of the sampling frame? And is the sampling frame representative of the population?
2. Generalizability: What larger set can we draw inferences about based on the sample? Most textbooks (and the Wright article below, p. 34) suggest that a scientific sample is a probability sample. Why? Because it makes the link between the sample and the sampling frame more (but not perfectly) certain. However (and it's a big however!), nonprobability designs are common and perfectly worthwhile. They are common because probability sampling only tells you about the link between the sample and the sampling frame, not about how well the sampling frame represents the population. E.g., a migration study. Population = all people who are moving, who have ever moved, or who will ever move. Sampling from that population is obviously not possible. A realistic sampling frame is based on convenience, e.g., the last two decades of US Census data.
3. Spatial sampling from continuous data: Sampling from truly continuous data (e.g., temperature) requires breaking the surface into discrete objects (points, lines or areas). This is a spatial sampling frame. How many objects (e.g., rain gauges) are required?
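
One way to explore the question of how many sample points are needed is to sample a known surface and compare the estimates. The sketch below is illustrative only: the temperature function is an invented smooth surface over a unit square (its true mean is 20), and the two point layouts (a centered grid and random points) are just two of many possible spatial sampling frames.

```python
import math
import random


def temperature(x, y):
    """Hypothetical continuous surface on the unit square; true mean is 20.0."""
    return 15.0 + 10.0 * x + 5.0 * math.sin(2 * math.pi * y)


def grid_points(n_side):
    """Systematic layout: one point at the center of each cell of an n x n grid."""
    step = 1.0 / n_side
    return [((i + 0.5) * step, (j + 0.5) * step)
            for i in range(n_side) for j in range(n_side)]


def random_points(n, rng):
    """Simple random layout: n points placed uniformly at random."""
    return [(rng.random(), rng.random()) for _ in range(n)]


def mean_at(points):
    """Estimate the mean of the surface from values observed at the points."""
    return sum(temperature(x, y) for x, y in points) / len(points)


rng = random.Random(42)
for n_side in (2, 4, 8):
    n = n_side * n_side
    print(n, mean_at(grid_points(n_side)), mean_at(random_points(n, rng)))
```

On this very regular surface a centered grid recovers the mean almost exactly, while the random layout needs many more points to get close. Real surfaces are rougher, so the number of gauges required depends on how quickly the surface varies over space.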



Politics of sampling

Article by Tommy Wright 

Census data come at various scales:
Metropolitan Statistical Area (MSA): a city containing at least 50,000 people, or an urbanized area of at least 50,000 with a total metropolitan population of at least 100,000
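
Read as a rule, the MSA definition above is a simple either/or test. The sketch below encodes only the thresholds stated in these notes; the function and parameter names are hypothetical, and the official standards contain further conditions not covered here.

```python
def qualifies_as_msa(largest_city_pop, urbanized_area_pop, metro_total_pop):
    """Apply the two alternative thresholds from the notes (a simplification)."""
    has_large_city = largest_city_pop >= 50_000
    has_urbanized_area = urbanized_area_pop >= 50_000 and metro_total_pop >= 100_000
    return has_large_city or has_urbanized_area


# Made-up places for illustration.
print(qualifies_as_msa(62_000, 0, 70_000))        # large central city
print(qualifies_as_msa(30_000, 55_000, 120_000))  # urbanized area plus metro total
print(qualifies_as_msa(30_000, 55_000, 80_000))   # metro total too small
```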




Supplements to the 2000 Census
The American Community Survey (ACS) continues to collect data.


Redistricting
Redistricting is carried out after every census. The 1999 elections determine who will have a role in redistricting. A recent Supreme Court ruling found that interdecade redistricting was legal. After the 1980 Census,
17




Problems with using census data
1. Key terms may vary from census to census. Example: What is "race" and "ethnicity" in the Census? Anthropologists assert that "race" and "ethnicity" are not biological in origin, but rather cultural. There is more genetic diversity within a race (e.g., whites or African Americans) than between races. Race is therefore a cultural construct which can change over time. The word "race" in the modern sense dates back only to the late 18th century! E.g., the category of "Hispanics" was only invented in the 1950 census, so it is not possible to compare them before that.
2. Gerrymandering/MAUP
3. Nonparticipation and the "undercount": The undercount is the number of people missed by the census. The error is mostly an undercount, but it also includes some overcount (e.g., students counted both in dorms and at their parents' homes). However, this error is not distributed evenly. That is, some people are more likely to be undercounted than others. These include children, renters in rural areas, and racial minorities. For example, the actual undercount in 1990 among whites was 0.7%, but among African Americans 4.4%, and Hispanics 5%. This means that federal allocation of funding is not accurate. There is also a geographical unevenness to the error. Some places in the
You might ask how they know how many people they didn't count. The answer is that it is a prediction based on sampling models.
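
The notes do not say which sampling model is used. One commonly described approach is dual-system (capture-recapture) estimation, which compares the census count in a set of sample areas with an independent follow-up survey of the same areas. The sketch below is a simplified illustration of that idea with made-up counts. It also shows how an undercount rate, such as the 1990 rates quoted above, translates into an adjusted count.

```python
def dual_system_estimate(census_count, survey_count, matched_count):
    """Lincoln-Petersen style estimate of the true population of a sample area.

    census_count:  people found by the census
    survey_count:  people found by an independent follow-up survey
    matched_count: people found by both
    """
    if matched_count <= 0:
        raise ValueError("at least one matched person is needed")
    return census_count * survey_count / matched_count


def net_undercount_rate(census_count, estimated_population):
    """Share of the estimated population that the census missed."""
    return (estimated_population - census_count) / estimated_population


def adjusted_count(census_count, undercount_rate):
    """Population implied by a census count and a given undercount rate."""
    return census_count / (1.0 - undercount_rate)


# Made-up sample area: the census found 950 people, the survey 900, and 870 were in both.
estimate = dual_system_estimate(950, 900, 870)
rate = net_undercount_rate(950, estimate)

# 1990 undercount rates as quoted in the notes, applied to hypothetical counts of 1,000.
rates_1990 = {"white": 0.007, "African American": 0.044, "Hispanic": 0.05}
adjusted = {group: adjusted_count(1_000, r) for group, r in rates_1990.items()}
```

Because the rates differ by group, the same counted total implies different true totals. This is why an uneven undercount distorts allocations based on raw census counts.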



Ensuring privacy 



