Then I'd Have to Kill You
This is probably old hat for a bunch of the engineering types here, but I got asked to sign a non-disclosure agreement for the first time today. We had a meeting between some faculty and Deans from the college, and some people from a local company to discuss the possibility of forming some sort of cooperative partnership to weasel money out of the state of New York-- I mean, create new technologies, and good jobs for workers in the Schenectady area.
As I commented to one of the other academics, one of the nice things about being in an impractical field like physics is that nobody ever really expects to make money off anything I'm working on. Which means they don't much care who I talk to about it, which means that I don't generally need to be worried about what I say to people.
Of course, the papers we actually signed were sort of the baby non-disclosure agreement that they give to all visitors. The real, serious, swear-on-your-immortal-soul agreement is still being drafted. As a result, the whole thing was kind of a waste of time-- I can't say whether I could be any help to them without knowing some technical information about what they're doing, and without the real-and-for-true NDA, they wouldn't even tell us what's in their ten-year strategic "road map."
I can say that the word "synergy" was uttered far more frequently than it ever should be, but I doubt that violates any--
I sat down last night to watch the PBS television adaptation of Brian Greene's The Elegant Universe, with the vague plan that I would try to use it as the basis to say something about string theory here today.
Alas, I dozed off in the middle of the second hour. The production was slick, and there was some good stuff, but they devoted about one minute in every five to recapping what they just said. I realize that, when discussing complicated material, it's often useful to stop and remind the audience what has gone before, but eight times an hour seems a little excessive.
I did wake up before it ended, so I know that they never actually did talk about string theory in any detail ("Tune in next week..."), but I didn't see enough of what they did say to make any useful comments (other than noting that I liked the fact that they had a couple of people flatly state that without testable predictions, it's not science...).
If you'd like to read more about the show, though, there's always the Washington Post's chat with Sylvester James Gates, Jr., of my grad-school alma mater. He doesn't go into all that much detail, either, but he says a few interesting things, and deserves a medal for smoothly and politely deflecting a large number of kooks.
Fun With Data (Re-)Analysis
David Appell posts a link to a new study claiming to find errors aplenty in a major global warming study. David is fairly skeptical of the new report, mostly due to its source, while Kevin Drum calls it "a weird sort of echo of the Bellesiles/Lott gun disputes." I'm sort of torn, really-- on the one hand, the obvious biases of the journal where their work was published are a little suspicious, but on the other, they have posted their data, and an exhaustive list of the errors they're citing. The "nyah, nyah" tone of some of their comments probably shouldn't really count against them, but it is annoying.
Anyway, I'm not entirely sure what to make of it, but I thought it might be interesting to post a few comments about the presentation of the whole thing, with my earlier comments about data massage in mind. Here's a non-exhaustive list of things that jump out at me from the graphs on their page, and a quick look at the data.
1) First off, it's worth noting that all the "corrected" figures and comparisons between data sets have been shorn of their error bars (the yellow area in the first graph on their page). That's partly understandable, as error bars would likely produce enough visual clutter to make the graphs hard to read. It's a little deceptive, though, as a rough visual estimate of the errors suggests that the corrected data would all fall within the error bars of the original paper. The lack of a plot showing error estimates for the new work is even more deceptive, particularly since the corrections to the very old data (circa 1400) mostly seem to involve removing data points, which would increase the uncertainty.
(I'm also puzzled by the lack of error bars on post-1900 data in the original study. I would guess that this is due to a change in the source of the figures (from proxy estimates to actual data, perhaps), but I haven't found an explanation.)
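The claim that dropping data points should widen the uncertainty is just the usual 1/√N scaling of the standard error of a mean. A quick sketch, using made-up numbers rather than anything from either study:

```python
import math

def standard_error(sample_std, n):
    """Standard error of the mean: sigma / sqrt(n)."""
    return sample_std / math.sqrt(n)

# Hypothetical numbers, purely to show the scaling: the same 0.5-degree
# spread estimated from 20 proxy series vs. only 4 surviving series.
print(standard_error(0.5, 20))  # ~0.11
print(standard_error(0.5, 4))   # 0.25 -- more than twice as large
```

Cut the number of independent data sources by a factor of four and the error bars more than double, which is why corrections that mostly delete points ought to make the yellow band wider, not vanish.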
2) In their second graph, it's somewhat interesting to note that the corrected data set is wildly variable-- much more so than the original. I'm not sure what this indicates-- you would expect data on something like average global temperature to be fairly erratic, but the difference is striking. This probably has to do with the fact that the corrections generally remove data from the samples (they cite a number of "fills" where data points seem to have been interpolated), which again, would increase the variability of the data (and also the uncertainty estimates, which aren't shown...).
3) Something really strange is going on with the first plot. The first graph (from an IPCC report) ends with a huge upward spike (at or above +0.5 in the "anomaly" that they're plotting) which is absent from the raw data presented in the second figure (both sets). What's up with that?
4) In their third graph, comparing the two studies directly, they draw attention to the huge difference between original and corrected figures for temperatures in the fifteenth century. The reason for this is obvious, as the huge swing they found makes the 1400's warmer than the 1900's.
Of course, I find this a little dubious, as the data from those years is far and away the least reliable in the whole set. They're dealing with an extremely limited number of data sources down there (if you look at the raw data files, you see lots of "NA" points for years before 1500), so the error bars (which, again, are never shown) should be gigantic.
Far more interesting, to my mind, is the discrepancy in the 1800's. The corrected data has the nineteenth century being something like a tenth of a degree warmer than claimed in the original study. It's still within the error bars on the first graph, but this difference is pretty striking, and that's an area where you might think the data would be fairly reliable. Then again...
5) The differences between data sets are a whole lot less striking in the raw data than in the final figure. Why? Because the final figure shows a 20-year running average, which smoothes over a lot of the variability in the data. Now, both sets are smoothed in the same way, but this is a technique that I absolutely hate, because it gives the false impression that you've got a lot of data points showing a nice, clean trend from one year to the next, when that's not the case at all.
Twenty years is a pretty long interval to be smoothing over, especially given that that's about the time scale of the big increase in the early 20th century (an area, by the way, in which both studies show essentially the same thing). Fiddling with their data, and reducing the smoothing interval to ten years shows why they did this pretty clearly-- a ten-year interval makes the variability of the data more apparent, particularly after 1950. The nice, smooth plateaus seen in the 20-year smoothed data are replaced by a couple of oscillations between 0 and 0.2 degrees. (Of course, the first graph presented is even worse, using a smoothing interval of 40 (40!) years...)
What would I conclude from all this? It's a tough call, and given #5 (which, by the way, applies equally to both studies), I'm tempted to conclude that both versions are a bunch of crap.
Looking at their "audit trail," though, it seems clear that there are some seriously dodgy bits in the original data set. The "fills" in particular are disturbing-- the one-year shifts caused by cut-and-paste errors shouldn't skew things all that much, but interpolating data over five or ten-year stretches is pretty dubious. I doubt it was done maliciously (which would keep this out of the Lott/Bellesiles category)-- it looks like the sort of thing one might do to simplify the data analysis (getting rid of lots of N/A points). It's stupid, but comprehensibly so.
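What a "fill" does to the numbers is easy to demonstrate on toy data (again, nothing here comes from either study, and `fill_linear` is just a hypothetical helper): linear interpolation replaces missing years with points that sit smoothly between their neighbors, so the filled series typically looks less variable, and less uncertain, than the record actually warrants.

```python
import random
import statistics

def fill_linear(s):
    """Replace runs of None with straight-line interpolation between
    the nearest observed values on either side (interior gaps only)."""
    out = list(s)
    i = 0
    while i < len(out):
        if out[i] is None:
            j = i
            while out[j] is None:  # find the far edge of the gap
                j += 1
            left, right = out[i - 1], out[j]
            for k in range(i, j):
                out[k] = left + (right - left) * (k - i + 1) / (j - i + 1)
            i = j
        else:
            i += 1
    return out

random.seed(2)
# Toy noisy "anomaly" record with a few missing years.
series = [random.gauss(0.0, 0.2) for _ in range(30)]
for year in (5, 6, 7, 15, 16, 22):
    series[year] = None

filled = fill_linear(series)
observed = [v for v in series if v is not None]
# Compare the spread of the honestly-observed points with the filled series.
print(statistics.stdev(observed), statistics.stdev(filled))
```

The interpolated points are guaranteed to land between their neighboring observations, so they can only pull the series toward smoothness, never away from it.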
In the end, I'd say that the authors of the new study probably have a point, and it'd be worth taking another serious look at the data. They're not exactly faultless, though, particularly in the lack of error bars.
Looking at the way the data are presented, I'd say that both sets of authors are guilty of manipulating the presentation of the data to push a particular point of view. It's encouraging that the authors of the new paper have actually put their data out there for people to play with, though, and at least some of their complaints do appear to be on the level (based on a quick and casual survey of their results).
Is that enough equivocation for you?
Try, Try Again
A month or so back, I was scheduled to attend a physics conference in DC, which got rescheduled due to a scary apocalyptic hurricane. The new date is this weekend.
I realize that by posting this here, I've guaranteed that some new weather catastrophe will befall the DC area, to shut things down (a quarter-inch of snow should do it), for which I offer my sincere apologies to any and all DC blog types...
This Isn't SportsCenter
It's only a matter of time before the game really catches on here, especially with crystal-clear commentary like this:
[US Team Captain Dave] Hodges added that at times his squad tried to get a little fancy, which got them into trouble and left attack ball on the ground and not capitalized upon. Adding to those miscues, the kick defense of the United States was a high wire act that kept the Japanese with territory and often possession. And the Japanese took advantage with sparkling runs from their wings for two 5-point touchdowns.
After several weeks of zero-sum football (when my Giants played well, Kate's Patriots lost, and vice versa), we finally had a non-zero-sum weekend. Both the Pats and the Giants won on Sunday. Of course, this came at the cost of a Yankees loss in the World Series, but at least the Red Sox didn't win it...
I'm not really bothered by the Yankees losing for two reasons. One reason is that I grew up during the 80's, when the Yankees were a laughingstock, so their great run over the last several years has been a pleasant departure from what has been locked in my mind as the norm. Yeah, I know, they have the best history of any team in the majors, but that's cold comfort when you're getting mocked in middle school for daring to like an uncool team.
Primarily, though, I'm not bothered by it because I just don't like baseball all that much. For one thing, I'm terrible at the game, which means I start from a baseline of very little interest in playing it, and that's a key element of sports fandom (at least for me). I'm a basketball junkie in large part because I enjoy playing the game, and I didn't start really getting into football in a big way until college, when we used to play pick-up games fairly regularly. Baseball (and, really, any game involving hitting things with sticks) just doesn't do it for me, so I'm not naturally inclined to watch.
The main reason I dislike the game, though, can be summed up in four syllables: Tim McCarver. OK, not just him, but he's sort of the ideal demonstration of the problem, which is stat-wanking. I find McCarver nearly unwatchable, and baseball in general extremely difficult to watch, because it's a game that's obsessed with stupid statistics. Every new batter comes to the plate with another batch-- batting average versus lefthanders, righthanders, guys with middle names beginning with "Q." And every year, there's a new set of statistics-- "on-base percentage," "slugging percentage," average with runners in scoring position, ERA when the moon is in the seventh house.
It's a never-ending cavalcade of numbers, most of them drawing on sample sets so small (or stupid) that social scientists would think twice about citing them. For many baseball fans, and too many baseball broadcasters, the game is less about the people on the field playing it than the dance of numbers in spreadsheet programs.
Stat-wanking is one of the greatest evils facing sporting society today, and it's spreading beyond baseball. You now hear the mysterious "quarterback rating" cited time and time again in football, even though nobody can really explain what it means. And college football has taken the bold step of determining its "championship" contenders with a stats-based computer system that nobody really understands.
This creeping cancer on the sporting world is fueled by one thing above all else, which is why I heartily endorse Norman Chad's denunciation of fantasy football:
I couldn't take it anymore, watching them pile into my sports bar, clipboards and legal pads in hand, feverishly working their cell phones up until opening kickoff.
It was as if I were surrounded by the cult of Mel Kiper.
They scream at the TV screens, not caring which team wins or loses. They care only about touchdowns and interceptions and various statistical debris.
It pretty much perverts the whole sports-viewing experience more than if you were sitting next to Bill Walton at a bullfight.
Preach it, Brother Norman.
Down with "rotisserie" leagues, and "fantasy" sports. We need a major-party Presidential candidate with the moral clarity to see this threat for what it is, and put an end to it before the fantasy sport dorks infect games that really matter, like basketball. Eliminating stat-wanking is critical to national security, economic prosperity, and making the world safe for democracy.