Interpretive Frameworks, Fragile Empiricisms, and Data
[a.k.a. I am the master of sexy titles]
[At the end of this essay, I claimed that the assertion that Finland outperforms the U.S. on international education performance tests, such as PISA, is false. Here’s why…]
We may first dispense with the dogma that Steve Sailer is somewhere between a stone in the shoe and a literal Hitler among the beacons of cultural decline.[1] With that ablution out of the way, Sailer wrote an interesting piece that makes the world’s most obvious point™, a point that is somehow invisible to so many working in data.[2]
In What’s the Matter With Wisconsin?, Sailer makes the observation that it’s odd that Wisconsin blacks perform below many national averages given that Wisconsin is, in many ways, a sort of bleeding heart white socialist utopia. (Milwaukee has, quite literally, been under socialist rule for many years and mayoral administrations.) A few of Sailer’s observations:
…Wisconsin blacks usually score the lowest in the nation on the federal National Assessment of Educational Progress tests of public school students. And the white-black gap on the NAEP is larger in Wisconsin than anywhere other than Washington, D.C.
…the black-white imprisonment ratio in Wisconsin is an extraordinarily high 11.5 to 1. That’s the second-highest of the 50 states, behind only New Jersey (12.2 to 1) and just ahead of Iowa (11.1), Minnesota (11.0), and Vermont (10.5). In contrast, the most equal black-white incarceration ratio is in Hawaii (2.4) … followed by Southern states such as Mississippi (3.0), Georgia (3.2), Alabama (3.3), and Kentucky (3.3).
And blacks in Wisconsin are 9.0 times more likely than the overall population to use welfare, the worst ratio in the country.
I’d like to think a decent data scientist understands the issue immediately, but I’m afraid that most don’t, particularly in education.
Generally, education data folks (and, most egregiously, reformistas) view the education landscape as:
Input(s) >> Process >> Output(s)
The inputs are usually teachers or students or some hardened bureaucracy or maybe curricula. Usually, though, curricula, pedagogy, etc., are part of the process. Outputs include test scores or graduation rates or, really, any other outcomes.
The analysis is, too often, thus: if there is variance in the outputs and the inputs are assumed constant, then the culprit is something in the process. Hence all the focus on curricula/content, pedagogy, professional development, and the widest assortment of panpharmacons.
So when international test scores (e.g. PISA) show the U.S. ranking poorly or dropping in the rankings, this surely means that something is wrong with the process (pedagogy, curricula, content, etc.) or, at least, something in the process could be improved.
Or when national test scores show some kind of stasis or decline, surely we must improve something to turn things around. Surely [increased funding / better ways to teach reading and math / more dynamic methods to connect to students / etc.] is the answer. Great migrations of ed reformers flock to output variance.
When a state falls behind other states or leaps ahead of other states, we make some conclusions about their standards, teaching, curricula, funding, etc. Some states do these things better than others, and the variable outputs are evidence of this, right? Right?
Fragile Empiricism
The school board of one of the whitest and wealthiest towns in the United States had all the evidence — test scores and whatnot — to demonstrate that their school district (and, by osmosis, the school board itself) was one of the finest in the nation. Nay, the world. Nay, the galaxy.
And so I told them that such vain empiricism was evidence of no such thing, and, in all likelihood, if you took these same students and parents and teleported them en masse into a notoriously bad school district, you’d probably get roughly the same good results. The confidence that such Übermensch outputs were largely the result of a genius process (itself the output of genius people, obviously) and not the result of certain inputs struck me as the sad desperation of fragile empiricism. Sure, it’s empirical in the sense that those outputs were observed, but it’s fragile in its selection of bias-confirming evidence.
I suppose it’s my fault that I said all that out loud, to the board, in English. Frankly, I’d never witnessed a fuller display of simian retardation than the reaction of that board. Theirs was not an endearing or entertaining idiocy. It was just idiocy. Such is the fragility of empiricism birthed in an echo chamber.
One needn’t subscribe to a fundamental incomparability of subpopulations, longitudinally or otherwise. Medical data across such populations tends to be comparable (though less comparable than people assume), whereas education data isn’t. So when an IBM SVP (such as, perhaps, John Kelly) asks why education data products can’t be scaled globally in the same way that health data products can, the answer lies in the general fungibility of health data.
So, of course, the problem with the assumption of process-variation is that we often reflexively deny input-variation. We must assume that cities, states and nations have similar-enough inputs to then focus on process. That whitest of school boards must believe that better teachers are the reason their test scores crushed those of a nearby city with a substantial number of students who just got off the boat from South America. It can’t possibly be because their own students were exposed to years of college-level English before they reached kindergarten. Nope.
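A toy sketch may make the inference problem concrete. Everything in it is an assumption made up for illustration: the additive model (score = what students bring + what schools add), the function name `district_score`, and every number. It estimates nothing about any real district; it only shows that an output gap, by itself, cannot tell you which side of the ledger produced it.

```python
# Toy model, assumed purely for illustration: a district's average score is
# what its students bring with them (input) plus what its schools add (process).
def district_score(input_level: float, process_effect: float) -> float:
    return input_level + process_effect

# Hypothetical numbers. Note that the city is given the BETTER process here.
wealthy_town = {"input_level": 80.0, "process_effect": 5.0}
nearby_city = {"input_level": 55.0, "process_effect": 8.0}

town_score = district_score(**wealthy_town)
city_score = district_score(**nearby_city)

# What the school board sees: a large gap in outputs, in its own favor.
observed_gap = town_score - city_score

# The thought experiment: move the town's students into the city's schools.
# Inputs stay the same; only the process changes.
teleported_score = district_score(
    wealthy_town["input_level"], nearby_city["process_effect"]
)

print(f"town: {town_score}, city: {city_score}, gap: {observed_gap}")
print(f"town students in city schools: {teleported_score}")
```

In this made-up case the town wins on outputs while losing on process, and its students do at least as well after the move. A board that reads the gap as proof of a superior process has things backwards, and nothing in the output data alone would tell it so.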
But the fact is that we cannot compare cities and states and nations, even with supposedly similar demographics. Why does the ignorance persist that all black males, for example, constitute a constant such that they are comparable across vast expanses of geography and time? Terms such as “black” (and “African”) are largely constructs of mid-18th-century slave economies; prior to that, Europeans referred to “blacks” the same way they tended to refer to Europeans: by the nation or ethnic group to which they belonged, be it Italians or Spaniards or Angolans. The idea that somehow any group of similarly-aged “blacks” across a continent constitutes a comparable constant is some kind of limp racism.[3]
Such is Sailer’s observation. The outputs in Wisconsin are below average, yet any reasonable person would conclude that the processes in Wisconsin have been above average for the last forty years, so maybe the inputs are variable. Maybe Wisconsin blacks and Texas blacks aren’t the same blacks. Because, you know, maybe all blacks aren’t the same. And maybe all Asians aren’t the same. And maybe all dogs aren’t the same. It’s even possible that birds are different. It begins to feel like a little kindergarten quiz that an awful lot of big data people failed.
But wait, it gets worse.
If the assumption that (sub)populations across continents are constant is repugnant, then the assumption that populations across continents are constant across time is repugnanter. Probably even repugnantest. To compare minority sub-population education trends over the last sixty years (really, as long as we’ve been doing it) is to assume, as one small part of the exercise, that Hispanics in high school in the 1960s are similar to those today, and had similar parents and households and similar experiences and expectations. Is there any education population or subpopulation today in the U.S. that one could reasonably assume has been constant since the first were born in the 1950s up through the most recently tested (that is, born around 2012)? Apart from uncontacted Amazonian tribes, is there any population that one could reasonably assume has held constant over sixty years?
Our assumptions about population groups are transparently inadequate. Time then (as time does) makes a mockery of such assumptions. Take a city, a district, a state or the entire United States, and trace its demographics over the last two generations. So if the outputs have changed, maybe the process has changed, but haven’t the inputs changed as well? And if the inputs have changed, you can’t draw conclusions about the processes. Unless, of course, you think all blacks and Hispanics and Asians and whites are the same.
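The mildest version of this point can be put in a few lines of arithmetic. The sketch below is hypothetical throughout: the group labels, the scores, the percentages, and the helper `aggregate_score` are invented for illustration. It freezes the process completely (every subgroup scores exactly the same in both periods) and changes only the mix of students.

```python
# Hypothetical illustration: subgroup scores are held fixed across time, so by
# construction the "process" has not changed. Only enrollment shares change.
def aggregate_score(share_pct: dict, subgroup_score: dict) -> float:
    # Shares are whole-number percentages and must cover the whole population.
    assert sum(share_pct.values()) == 100
    return sum(share_pct[g] * subgroup_score[g] for g in share_pct) / 100

# Made-up scores and made-up enrollment mixes for two points in time.
subgroup_score = {"group_a": 70.0, "group_b": 50.0}
share_then = {"group_a": 80, "group_b": 20}
share_now = {"group_a": 50, "group_b": 50}

score_then = aggregate_score(share_then, subgroup_score)
score_now = aggregate_score(share_now, subgroup_score)

print(f"then: {score_then}, now: {score_now}, change: {score_now - score_then}")
```

The district-wide number drops even though no subgroup's score moved at all, so the "decline" is entirely a change in inputs. And this sketch is generous to the usual analysis, because it grants that "group_a" in one period means the same people as "group_a" in the other, which is exactly the assumption in question.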
Of course, none of that can possibly be true or else the reformers would have to stop what they’re doing and think.
[1] I don’t believe it, but disarmament precedes didactics.
[2] https://www.takimag.com/article/whats_the_matter_with_wisconsin_steve_sailer/
[3] Favorite example is when (years ago) the Superintendent of Miami-Dade explained low test scores in reading by noting that many of the district’s students don’t really speak English. Of course, I produced Ft. Lee’s data — also a significant number of students who weren’t raised speaking English and don’t speak English at home, and yet produced reading scores that were about 2 standard deviations higher than Miami-Dade’s. That Superintendent was more correct than I … those two populations aren’t comparable. Of course, another set of populations that isn’t comparable is Miami-Dade to itself longitudinally. Even more interesting is up the road in Orlando, which has experienced intense population changes every few years for decades. Less interesting is my apparent affection for grammatically incorrect sentences.
About Nathan Allen
Founder of Xio Research (A.I.), Applied Magic (A.I.), and Andover (data). A.I. strategy and development leader at IBM. Academic training is in intellectual history; his most recent book, Weapon of Choice, examines the creation of American identity and modern Western power. Don’t get too excited, Weapon of Choice isn’t about wars but rather more about the seeming ex nihilo development of individual agency … which doesn’t really seem sexy until you consider that individual agency covers everything from voting rights to the cash in your wallet to the reason mass communication even makes sense…. Lectures on historical aspects of media, privacy/law, and power structures (mostly). Previous book: Arsonist.