Sunday, May 3, 2015

Annual Census Report: Errors & Recommendations

Click here to download a pdf version of this article (tagged for screen readers)

Eric L. Nelson, Ph.D.

The 2013 Annual Census of Employees in State Civil Service (aka Annual Census Report, or ACR) contains a number of mistakes. The purpose of this blog is to illuminate some of these problems, and to suggest how they could be corrected in future ACRs.

Problem #1: Improper Use of Trend Line Graphs

Figure 1.
Trend line graphs (aka figures) are used to show within-group changes over time. For example, over the last 10 years the state's hiring of people with disabilities has consistently been much lower than that of the four comparison groups, as seen in Figure 1 (left). Click here to see the data table used to construct this figure.

The essential point is that trend lines should only be used to compare measures of the same group taken at different times. Trend lines should never be used to join discrete (different) groups, because doing so improperly suggests that a relationship may exist between these groups.

Figure 2.  Reproduction of Figure 24, ACR 2013

An example of improper use of a trend line graph is found in Figure 24 of the 2013 ACR, reproduced in Figure 2. As can be seen, a trend line joins discrete groups together, i.e., males and females, people with disabilities, and many races. This linkage could cause readers to mistakenly conclude that relationships exist where there may actually be none.

Figure 3.  Reorganization of Order of Presentation, Figure 24, ACR 2013

A similar problem is created when discrete data are sorted into some type of order, such as smallest to largest, and then graphed using a trend line, as seen in Figure 3. This figure was created using the same data used to construct Figure 24. Notice the rising trend line, which seems to suggest a relationship between discrete groups where none may actually exist.

Figure 4.  Bar Graph Using Figure 24 Data

Bar graphs are one way discrete data can be presented, as demonstrated in Figure 4. Bar graphs display individual measures using bars that are not connected by a trend line. This figure was created using Figure 24 data.

Recommendation: The state should consider using bar graphs to display discrete data.

Problem #2: Failure To Provide Data Tables Behind Some Figures

It is helpful to provide the data used to construct a figure for at least two reasons. First, the figure may draw the attention of a reader who wants more detail, as provided by actual measures in a table. Second, a reader may not believe a graph has been created accurately. When tabled data are provided, the reader can either become convinced of the accuracy of the graph, or find a mistake and bring it to the attention of the graph's author.

Figure 24 (ACR 2013) is an example of a figure for which data are neither cited nor provided. Deduction shows it to have been created using data from Tables B and J of ACR 2013. Table 1 (above left) is an example of how these data could have been provided.

Recommendation: The state should consider citing data used to create figures in ACRs. If data from multiple sources are used, the state should consider providing a table which demonstrates how graphed values were calculated.

Problem #3: Reversal of Graphing Coordinates

Figure 5.  Reproduction of Figure 22, ACR 2013

Traditionally, categories of data are plotted along the abscissa (x axis) of a graph (aka figure). Categories can include groups such as disability status, race, and sex. The abscissa is the horizontal line running along the bottom of a graph. Outcomes are listed on the ordinate (y axis), which is the vertical line on the left side of the graph.

An example of the customary way to organize a graph can be seen in  Figure 1 (top of first page).  The data category is years, plotted along the abscissa, and the outcome category is percentages, plotted along the ordinate. 

In some ACR figures the graphing coordinates are reversed. For example, see Figure 22, reproduced as Figure 5 here. Notice the data categories (sex, disability status, veteran status, and race) are placed on the ordinate, and outcomes (amount earned per year) are placed on the abscissa.

Recommendation: The state should consider plotting data categories on the abscissa and outcomes on the ordinate.

Problem #4: Color Coding & Labeling Deficiencies

Using Figure 22 again, notice that 22 data categories are color coded. Women, Native American, and Hawaiian people are plotted with almost the identical shade of dark brown. Similarly, veterans, Guamanian, Japanese, Laotian people and Other Race or Ethnicity are plotted using similar shades of black or dark gray. Consequently, it is difficult to distinguish between many groups. This can lead to interpretive mistakes due to misidentification.

Recommendation: The state should consider adding labels to each plot line, and using varied patterns to create trend lines, e.g., small dashes, large dashes, stars, asterisks, and so forth. 
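As a rough illustration of the recommendation above, the sketch below gives each group its own combination of dash pattern and marker, so that no two plot lines depend on color alone. The style codes follow the format strings used by the matplotlib plotting library, which is an assumption about the tool; the function name `assign_styles` and the group labels are hypothetical.

```python
from itertools import product

# Dash patterns and markers, written as matplotlib-style codes (an assumption
# about the plotting tool; any tool with similar options would work).
LINE_STYLES = ["-", "--", "-.", ":"]        # solid, dashed, dash-dot, dotted
MARKERS = ["o", "s", "^", "v", "D", "*"]    # circle, square, triangles, diamond, star


def assign_styles(labels):
    """Map each label to a unique (line style, marker) pair."""
    combos = list(product(LINE_STYLES, MARKERS))
    if len(labels) > len(combos):
        raise ValueError("more groups than distinct style combinations")
    return dict(zip(labels, combos))


# Hypothetical labels standing in for the 22 categories in Figure 22.
labels = [f"Group {i}" for i in range(1, 23)]
styles = assign_styles(labels)
for label in labels[:3]:
    print(label, styles[label])
```

Four dash patterns and six markers give 24 combinations, which is enough for 22 categories. Even so, that many lines on one graph will still be crowded.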

Problem #5: Too Many Data Elements In A Graph

California's rich tapestry of people, by race, is reflected in its workforce. Prior to 2013 the state reported eight (8) race groups. Beginning in 2013 the state expanded to 22. These are compared in Table 2. Unfortunately, when such a large number of individual groups is graphed, an almost uninterpretable display results. Figure 22 is an example of this problem. To understand trends between the races, groups that can be aggregated must be joined together. Thus, it is necessary to re-aggregate the subcategories of Asian in order to perform meaningful trend analysis of the type demonstrated in Figure 1 (top of first page).

Recommendation: The state should continue to provide expanded race group data in tables; however, when graphing, these data should be aggregated. 
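The aggregation step recommended above can be sketched in a few lines. The mapping from detailed groups to broader groups shown here is hypothetical and is not the state's official classification. The counts are made-up placeholders, not ACR values.

```python
from collections import defaultdict

# Hypothetical mapping from detailed groups to broader groups for graphing.
# Groups not listed here are kept under their own name.
SUBGROUP_TO_GROUP = {
    "Japanese": "Asian",
    "Laotian": "Asian",
    "Guamanian": "Pacific Islander",
    "Hawaiian": "Pacific Islander",
}


def aggregate(counts, mapping):
    """Sum detailed counts into the broader groups named in the mapping."""
    totals = defaultdict(int)
    for subgroup, count in counts.items():
        totals[mapping.get(subgroup, subgroup)] += count
    return dict(totals)


# Placeholder counts for illustration only.
placeholder_counts = {
    "Japanese": 120,
    "Laotian": 30,
    "Guamanian": 10,
    "Hawaiian": 40,
    "Native American": 25,
}
print(aggregate(placeholder_counts, SUBGROUP_TO_GROUP))
```

The detailed table stays intact; only the values passed to the graph are combined.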

Problem #6: Manner of Data Presentation

Tables in the Annual Census Report traditionally provide counts and percentages in each cell. An example is seen below in Table 3.

Table 3.  Partial Reproduction of Table C, 2013 ACR

Presentation of two forms of data in a single cell may be a convenience for some readers, because it lets them evaluate actual counts and their corresponding percentages side by side. However, this method of data presentation leads to substantial difficulty when one attempts to convert these data tables, which are provided in pdf reports, into spreadsheets using programs capable of doing so, such as Adobe Acrobat Pro 11. When data are mixed, even powerful software is unable to parse them out. Adding to the difficulty of checking the state's work for sufficiency and accuracy, the state does not provide spreadsheets along with the ACR. Such spreadsheets are likely to exist, because the style of the figures presented is consistent with Microsoft Excel.
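To illustrate what separating the two forms of data involves, here is a minimal sketch that splits a mixed cell into a count and a percentage. It assumes a cell layout such as "1,234 (5.6%)"; the actual layout of ACR cells may differ, and the function name `split_cell` and the example cells are hypothetical.

```python
import re

# Assumed cell layout: a count, then a percentage in parentheses,
# for example "1,234 (5.6%)". Adjust the pattern if the real cells differ.
CELL_PATTERN = re.compile(r"^\s*(\d[\d,]*)\s*\(\s*(\d+(?:\.\d+)?)\s*%\s*\)\s*$")


def split_cell(cell):
    """Return (count, percent) from a mixed cell, or None if it does not match."""
    match = CELL_PATTERN.match(cell)
    if match is None:
        return None
    count = int(match.group(1).replace(",", ""))
    percent = float(match.group(2))
    return count, percent


# Made-up example cells, not values taken from the ACR.
for cell in ["1,234 (5.6%)", "87 (0.4%)", "n/a"]:
    print(repr(cell), "->", split_cell(cell))
```

Cells that do not fit the assumed layout return None rather than a guess, so they can be checked by hand.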

A further and very serious concern is that these data tables are complicated and may not be easily accessible to persons with low or no vision, who rely upon screen reader software to try to read state documents such as these.

Recommendation: The state should post spreadsheets containing data used to create the ACR. The state should provide one value per cell in independent tables that may be more easily accessed by persons using screen reader software. The inaccessibility of the Annual Census Report, and 5112 reports, and 5102 reports, etc., is a serious issue. Friends of mine who have blindness are unable to read them using screen reader software because they are not tagged, are not provided in an acceptable font at an appropriate size, and most often lack alt text. To their credit, CalHR added alt text to parts of the 2013 Annual Census. Overall, however, it has been 25 years since the Americans with Disabilities Act was made federal law; yet the state in general is still failing to make many of its documents accessible to persons with blindness or low vision. This is discussed further in another blog (click here TBA).

Problem #7: Unclear & Mistaken Table & Figure Descriptions

Table 4.  Reproduction of Figure 1 (sic) 2013 ACR

Some table and figure titles in the ACR are not clearly stated. The example in Table 4 is taken from the 2013 ACR. It contains two errors.

First, it is a table, not a figure.  Second, the title is ambiguous.  It fails to describe the contents of the table, or provide needed temporal (length of time) context.  A more appropriate title could be, "Table X: Changes in Workforce Race Representation, 2009 to 2013", or similar.

Recommendation: The state should consider properly distinguishing between tables and figures, and using accurate labels which describe tabled data and their relationships.

Please cite this blog as: Nelson, Eric L. (2015).  Errors in the 2013 Annual Census Report, With Recommendations For Future Reports.  Trends in State Work,