Measuring Accuracy

A detailed account of our methodology is included in Information and Democracy. (Note, however, that we rescale our measures here, using a process described below.) Analyses included in Information and Democracy can be replicated using data archived at the Harvard Dataverse. Several of our methods are also outlined in the project papers “Tracking the Coverage of Policy in Mass Media” and “Dictionaries, Supervised Learning, and Media Coverage of Public Policy.” Note that we have tested machine learning algorithms as a means of capturing our “media policy signal,” but here we rely on a hierarchical dictionary approach. For comparisons of the two approaches, see the references above.

Below, we describe our Data and Measures.



Our full-text newspaper corpus is drawn from Lexis-Nexis using the Web Services Kit (WSK), which was available during the several years over which we gathered data for this project. We focused on a set of 17 major daily newspapers, selected based on availability and circulation, with some consideration given to regional coverage. The papers on which our analyses are based are: Arizona Republic, Arkansas Democrat-Gazette, Atlanta Journal-Constitution, Boston Globe, Chicago Tribune, Denver Post, Houston Chronicle, Los Angeles Times, Minneapolis Star-Tribune, New York Times, Orange County Register, Philadelphia Inquirer, Seattle Times, St. Louis Post-Dispatch, Tampa Bay Tribune, USA Today, and Washington Post. These are 17 of the highest-circulation newspapers in the US; three claim national audiences, and seven cover large regions in the northeastern, southern, midwestern, and western parts of the country.

We begin our data gathering in fiscal year (FY) 1980, but at that time only the New York Times and Washington Post are available. The Los Angeles Times, Chicago Tribune, and Houston Chronicle enter in 1985, and other papers enter the dataset through the later 1980s and 1990s. We have 16 newspapers by 1994 and the full set of 17 by 1999, and we then have access to all papers through the end of 2018. Information and Democracy considers all available data for all sources. Here, to make results more directly comparable, we consider all available data from FY1995 to FY2018. (This means that all newspapers are considered over the same time period, except the Arizona Republic, which is available only from FY1999.)
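The sample window can be illustrated with a short Python sketch. It assumes, for illustration only, that each article or transcript is assigned to the US federal fiscal year containing its publication date (federal fiscal years run from October 1 to September 30 and are named for the year in which they end). The function names and the `first_available_fy` parameter are hypothetical, not part of the project's code.

```python
from datetime import date

# Comparison window stated in the text: FY1995 through FY2018.
FIRST_FY, LAST_FY = 1995, 2018

def federal_fiscal_year(d: date) -> int:
    """Return the US federal fiscal year containing date d.
    Oct 1 - Dec 31 belong to the fiscal year named for the next calendar year."""
    return d.year + 1 if d.month >= 10 else d.year

def in_comparison_window(d: date, first_available_fy: int = FIRST_FY) -> bool:
    """True if d falls inside the common window. `first_available_fy` is a
    hypothetical parameter for outlets that enter the data after FY1995
    (e.g. 1999 for the Arizona Republic)."""
    fy = federal_fiscal_year(d)
    return max(FIRST_FY, first_available_fy) <= fy <= LAST_FY
```

For example, an article dated October 1, 1994 falls in FY1995, while an Arizona Republic article from May 1998 falls outside that paper's window.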

We do not collect these newspapers in their entirety, but rather focus on content related to each of our five policy domains. We do so using a Lexis-Nexis search that combines assigned subject codes and full-text keywords. (Search details are included in Information and Democracy.) Our searches were intentionally broad: they capture some irrelevant content but also the vast majority of relevant content. We did this because, as we shall see, our focus is not on entire articles but on the relevant sentences we extract from this downloaded content.


Our corpus of television news broadcasts is also extracted from Lexis-Nexis, again using the WSK. Television transcripts are stored in Lexis-Nexis in a somewhat different format than newspaper articles. In some cases content is stored at the story level, like newspapers; in other cases it is stored at the show level, i.e., there is a single transcript for an entire half-hour program. This makes a subject-focused search across networks rather complex: for the broadcast networks we extract just parts of a show, and for the cable networks we extract the entire show. Because we eventually focus on relevant sentences, however, our approach to television transcripts can be broad: we can download all transcripts, story- or show-level, and extract relevant sentences afterwards.

For the three major broadcasters, ABC, CBS and NBC, we download all available content from 1990 onwards in any morning or evening news broadcast or major “newsmagazine” program.

The cable news networks, CNN, MSNBC and Fox, do not have feature news programs, so we cannot obtain exactly comparable programming from them. Instead, we download all available content, drop infrequent programs, and keep the major recurring programs.
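Because both newspaper content and transcripts are ultimately reduced to sentences, the unit in which a document is stored (article, story, or whole show) matters little once it is split. The hypothetical `split_sentences` function below is a minimal sketch of that step using a naive regular expression. It is not the project's tokenizer, and it will mis-split abbreviations such as “U.S.” that a production pipeline would need to handle.

```python
import re

# Naive boundary (an assumption): sentence-final punctuation, then whitespace,
# then an uppercase letter or an opening quote.
_SENTENCE_END = re.compile(r'(?<=[.!?])\s+(?=["A-Z])')

def split_sentences(text: str) -> list[str]:
    """Split one document (an article, a story segment, or a whole show
    transcript) into sentences, after collapsing runs of whitespace."""
    text = re.sub(r"\s+", " ", text).strip()
    if not text:
        return []
    return [s for s in _SENTENCE_END.split(text) if s]

story = "Congress boosted defense spending. Critics objected."
show = "ANCHOR: Good evening.\n\nLawmakers cut school funding today! More after the break."
sentences = [s for doc in (story, show) for s in split_sentences(doc)]
print(sentences)
```

Story-level and show-level documents go through the same function, which is why a broad download followed by sentence extraction sidesteps the storage differences.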

Government Spending

Government spending is based on appropriations (spending commitments) in each of the five policy domains, as reported in the Historical Tables produced by the Office of Management and Budget (OMB). In some cases our domain-level spending corresponds exactly to major OMB categories; in other cases we make small but straightforward adjustments, as in Wlezien (2004) and Soroka and Wlezien (2010).

  • Defense: “National Defense”
  • Welfare: “Income Security” excluding “General Retirement and Disability Insurance,” “Federal Employee Retirement and Disability,” and “Unemployment Compensation”
  • Health: “Health”
  • Education: “Education,” excluding “Training and Employment”
  • Environment: “Environment”
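The category adjustments above amount to simple subtraction. The sketch below assumes a per-fiscal-year mapping from OMB category names to amounts. The dictionary keys reuse the shorthand labels listed above and would need to match the exact labels in the Historical Tables, and the numbers in `example` are arbitrary illustrative values, not actual OMB figures.

```python
# For each domain: (base OMB category, subcategories to subtract).
DOMAIN_RULES = {
    "Defense": ("National Defense", []),
    "Welfare": ("Income Security", [
        "General Retirement and Disability Insurance",
        "Federal Employee Retirement and Disability",
        "Unemployment Compensation",
    ]),
    "Health": ("Health", []),
    "Education": ("Education", ["Training and Employment"]),
    "Environment": ("Environment", []),
}

def domain_spending(omb: dict[str, float]) -> dict[str, float]:
    """Domain-level spending for one fiscal year: base category minus exclusions."""
    return {
        domain: omb[base] - sum(omb[name] for name in excluded)
        for domain, (base, excluded) in DOMAIN_RULES.items()
    }

# Arbitrary illustrative amounts (NOT real OMB data).
example = {
    "National Defense": 500.0, "Income Security": 300.0,
    "General Retirement and Disability Insurance": 10.0,
    "Federal Employee Retirement and Disability": 90.0,
    "Unemployment Compensation": 50.0, "Health": 200.0,
    "Education": 100.0, "Training and Employment": 20.0, "Environment": 40.0,
}
print(domain_spending(example))
```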

For more information see the Historical Tables, Budget of the United States Government.



Volume is the average number of sentences about spending in each policy domain per year. We rely on a hierarchical dictionary approach to extract sentences about spending in each policy domain. As noted above, this approach is outlined in more detail in Information and Democracy, as well as in several project papers including “Tracking the Coverage of Policy in Mass Media,” “Dictionaries, Supervised Learning, and Media Coverage of Public Policy,” and “Mass Media as a Source of Public Responsiveness.”
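One way to read “hierarchical dictionary” is as a two-level filter: a sentence counts toward a domain only if it matches both a domain dictionary and a spending dictionary. The Python sketch below follows that reading. The term lists, function names, and the choice to average annual counts across fiscal years are illustrative assumptions; the project's actual approach is outlined in Information and Democracy.

```python
import re
from statistics import mean

# Hypothetical, heavily abbreviated dictionaries.
DOMAIN_TERMS = {
    "defense": ["defense", "military", "pentagon"],
    "health": ["health", "medicaid", "medicare"],
}
SPENDING_TERMS = ["spending", "budget", "funding", "appropriation"]

def _matches(sentence: str, terms: list[str]) -> bool:
    # A term matches at the start of a word, so "appropriation"
    # also matches "appropriations".
    pattern = r"\b(?:" + "|".join(map(re.escape, terms)) + r")"
    return re.search(pattern, sentence.lower()) is not None

def is_relevant(sentence: str, domain: str) -> bool:
    """Level 1: mentions the domain; level 2: mentions spending."""
    return _matches(sentence, DOMAIN_TERMS[domain]) and _matches(sentence, SPENDING_TERMS)

def volume(sentences_by_fy: dict[int, list[str]], domain: str) -> float:
    """Average number of relevant sentences per fiscal year."""
    counts = [sum(is_relevant(s, domain) for s in sents)
              for sents in sentences_by_fy.values()]
    return mean(counts)

data = {
    1995: ["Pentagon spending rose.", "Weather was mild."],
    1996: ["Defense budget cuts loom.", "Military funding grew.", "Health funding fell."],
}
print(volume(data, "defense"), volume(data, "health"))
```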


Starting with the same sentences extracted for the measure of Volume, we use a dictionary to code each sentence as indicating either upward change (scored as +1), downward change (scored as -1), or no change (scored as 0). We then take the sum of all of these codes, across all sentences, for each fiscal year, to produce a measure of the “media policy signal.” In years in which upward change sentences outnumber downward change sentences, the signal is positive. In years in which downward change sentences outnumber upward change sentences, the signal is negative.
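The coding-and-summing step can be sketched as follows. The direction dictionaries are hypothetical placeholders, and scoring a sentence that contains both upward and downward terms as 0 is our assumption, not a documented rule.

```python
import re
from collections import defaultdict

# Hypothetical direction dictionaries (prefix matches, e.g. "reduce" -> "reduced").
UP_TERMS = ["increase", "raise", "boost", "expand"]
DOWN_TERMS = ["cut", "decrease", "reduce", "slash"]

def _has_any(sentence: str, terms: list[str]) -> bool:
    pattern = r"\b(?:" + "|".join(map(re.escape, terms)) + r")"
    return re.search(pattern, sentence.lower()) is not None

def code_direction(sentence: str) -> int:
    """+1 for upward change, -1 for downward change, 0 otherwise."""
    up, down = _has_any(sentence, UP_TERMS), _has_any(sentence, DOWN_TERMS)
    if up and not down:
        return 1
    if down and not up:
        return -1
    return 0  # neither, or both (treated as ambiguous: an assumption)

def media_signal(sentences: list[tuple[int, str]]) -> dict[int, int]:
    """Sum direction codes by fiscal year; input pairs are (fiscal_year, sentence)."""
    signal: dict[int, int] = defaultdict(int)
    for fy, sentence in sentences:
        signal[fy] += code_direction(sentence)
    return dict(signal)

coded = [
    (2001, "Congress will increase defense spending."),
    (2001, "Lawmakers voted to expand military funding."),
    (2001, "The panel plans to cut the defense budget."),
    (2002, "Defense spending was reduced again."),
    (2002, "The defense budget debate continues."),
]
print(media_signal(coded))
```

Two upward sentences and one downward sentence give FY2001 a positive signal; FY2002 is negative.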

Accuracy is then based on a model regressing this media signal on lagged changes in spending (at t-1), current changes in spending (at t), and future changes in spending (at t+1). This modeling decision is based on analyses in Information and Democracy suggesting that media coverage reflects, to varying degrees, spending in the previous, current, and future fiscal years, keeping in mind that decisions about future spending are usually made in the current year.
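In equation form, the model can be written as below, where S_t is the media signal in fiscal year t and ΔA_t is the change in spending (appropriations) in that year. The notation and the intercept α are our illustrative choices rather than notation taken from Information and Democracy.

```latex
S_{t} = \alpha + \beta_{-1}\,\Delta A_{t-1} + \beta_{0}\,\Delta A_{t} + \beta_{+1}\,\Delta A_{t+1} + \varepsilon_{t}
```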

We standardize all measures of spending change by domain, and all media signals by domain and outlet, and we sum the coefficients for lagged, current, and future spending to arrive at our measure of accuracy. That measure captures the estimated impact of a one-standard-deviation change in each spending measure on the media signal, in standard-deviation units. This allows us to compare estimates across domains, media, and outlets, where the levels of and variation in spending and media coverage differ dramatically. A value of 0 indicates no impact of spending on media coverage, and thus inaccurate coverage. Positive values indicate correspondence between spending change and the media signal, and thus accuracy: the larger the value, the greater the accuracy. Negative values indicate that the media signal moves against, not with, spending change, which would of course be highly inaccurate coverage.
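A minimal Python sketch of the standardize-regress-sum procedure, using NumPy's least-squares solver. It assumes one domain/outlet series indexed by consecutive fiscal years, standardizes each series over its full length before aligning lags and leads, includes an intercept, and drops the first and last years (which lack a lagged or future spending value). These are illustrative choices, not documented details of the project's estimation, and the data are synthetic.

```python
import numpy as np

def zscore(x: np.ndarray) -> np.ndarray:
    """Standardize to mean 0 and sample standard deviation 1."""
    return (x - x.mean()) / x.std(ddof=1)

def accuracy(signal, spending_change) -> float:
    """Sum of OLS coefficients on standardized spending change at t-1, t, t+1."""
    s = zscore(np.asarray(signal, dtype=float))
    d = zscore(np.asarray(spending_change, dtype=float))
    y = s[1:-1]                    # signal for years that have both a lag and a lead
    X = np.column_stack([
        np.ones(len(y)),           # intercept (an assumption)
        d[:-2],                    # spending change at t-1
        d[1:-1],                   # spending change at t
        d[2:],                     # spending change at t+1
    ])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(coef[1:].sum())   # skip the intercept

rng = np.random.default_rng(0)
change = rng.normal(size=30)                 # synthetic spending changes, not real data
print(accuracy(change, change))              # signal mirrors current change
print(accuracy(10 * change + 3, change))     # same signal in different units
print(accuracy(-change, change))             # signal moves against spending
```

With a signal that simply mirrors current spending change, the function returns a value close to 1 regardless of the signal's units, and reversing the signal's sign reverses the score.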

Our measure of accuracy has no upper or lower bound, but the observed range in our analyses is roughly -1 to +4.5. In Information and Democracy, we use these summed coefficients as they are. For the purposes of this site, we divide these scores by 5 to produce a more easily understood index on which 0 is entirely inaccurate and +1 is highly accurate.
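Writing β̂_{-1}, β̂_0, and β̂_{+1} for the estimated coefficients on lagged, current, and future spending change, the rescaled index is their sum divided by 5. The worked values are pure arithmetic: an index of 0.86 corresponds to a raw sum of 0.86 × 5 = 4.3, and the lower end of the observed raw range, roughly -1, maps to -0.2.

```latex
\text{Accuracy} = \frac{\hat{\beta}_{-1} + \hat{\beta}_{0} + \hat{\beta}_{+1}}{5},
\qquad \frac{4.3}{5} = 0.86, \qquad \frac{-1}{5} = -0.2
```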

The value +1 on our Accuracy scale is purposely set a little higher than any value we observe in our current data (our highest observed value is roughly 0.86). This leaves room for higher accuracy scores in future estimates.

(Note that in diagnostic analyses we examine these accuracy estimates alongside the R-squared for each model, which indicates the proportion of variance in media coverage that is “explained” by the spending variables. Because R-squared does not distinguish between positive and negative correlations, it is of limited use as a measure of accuracy.)
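A small simulation illustrates why R-squared cannot stand in for accuracy: a signal that moves with spending and the same signal with its sign flipped produce identical R-squared values but slopes of opposite sign. The data are synthetic, generated only for this illustration, and the `fit` helper is hypothetical.

```python
import numpy as np

def fit(x: np.ndarray, y: np.ndarray) -> tuple[float, float]:
    """Hypothetical helper: OLS of y on x with an intercept.
    Returns (slope, R-squared)."""
    X = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    centered = y - y.mean()
    r2 = 1.0 - (resid @ resid) / (centered @ centered)
    return float(coef[1]), float(r2)

rng = np.random.default_rng(1)
spending = rng.normal(size=40)                        # synthetic spending change
tracking = spending + rng.normal(scale=0.5, size=40)  # signal moving with spending
opposing = -tracking                                  # same signal, moving against spending

b_pos, r2_pos = fit(spending, tracking)
b_neg, r2_neg = fit(spending, opposing)
print(f"tracking: slope {b_pos:+.2f}, R-squared {r2_pos:.2f}")
print(f"opposing: slope {b_neg:+.2f}, R-squared {r2_neg:.2f}")
```

The two R-squared values match while the slopes differ in sign, which is the information a signed accuracy measure preserves.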