Steve Kass

News

Archived Posts from this Category

26 Nov 2009 22:17

Localization (probably) strikes again

Yesterday, the Italian postal service misprocessed a bunch of ATM and credit card transactions. Specifically, the virgola was shifted two places, appending two zeros to the transaction amount. There’s no telling exactly how this happened, but it wouldn’t surprise me if it had something—if not everything—to do with localization in one way or another. In Italy, a comma (virgola), not a period, precedes a number’s decimal part, but software might see things otherwise.

Some software interprets number strings according to the operating system localization (unless overridden). Other software ignores the OS localization. SQL Server’s CAST operator, for example, only accepts a period as the decimal separator, and it disregards commas in strings intended to represent numbers.

At least it does this as of 2005; previous versions followed a complicated set of rules in an attempt to disallow numbers that weren’t valid in the U.S., India, or China. In India (ones, thousands, lakhs, crore, thousand crore, lakhs crore, etc.), digit groups bounce between two and three digits, and 1,234,56,70,000.0 is a valid number. In China (yi1, wan4, yi4, wan4 yi4, etc.), it would be 123,4567,0000.0. Interpreting human-readable representations of numbers is no simple task. Explaining the issue isn’t much easier.
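The grouping conventions above can be sketched in a few lines of Python. This is only an illustration: `group_digits` and its `group_sizes` parameter are names made up here, and the cyclic 3, 2, 2 pattern is simply read off the Indian example in the paragraph above, not taken from any standard.

```python
def group_digits(number, group_sizes, separator=","):
    """Group the digits of a non-negative integer, working from the right.

    group_sizes is cycled: [3] gives three-digit groups, [4] gives
    four-digit groups, and [3, 2, 2] reproduces the pattern of the
    Indian example in the text (an assumption read off that example).
    """
    digits = str(number)
    groups = []
    i = 0
    while digits:
        size = group_sizes[i % len(group_sizes)]
        groups.append(digits[-size:])   # peel the next group off the right end
        digits = digits[:-size]
        i += 1
    return separator.join(reversed(groups))


n = 12345670000
print(group_digits(n, [3]))        # 12,345,670,000
print(group_digits(n, [3, 2, 2]))  # 1,234,56,70,000
print(group_digits(n, [4]))        # 123,4567,0000
```

The same eleven digits yield three different comma placements, which is why a parser cannot use comma positions alone to decide whether a string is a valid number.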

In all versions of SQL Server, the following happens regardless of language or culture settings:

select cast('115,00' as money) as TooMuch;

TooMuch
---------------------
11500.00
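To make the hundredfold error concrete, here is a rough Python sketch of the two readings of the same string. The function names are hypothetical, and the first function only imitates the comma-dropping behavior described above; it is not SQL Server's actual parsing code.

```python
from decimal import Decimal


def parse_ignoring_commas(text):
    """Drop every comma, then read what is left with '.' as the decimal point."""
    return Decimal(text.replace(",", ""))


def parse_comma_decimal(text):
    """Treat '.' as a grouping mark and ',' as the decimal separator."""
    return Decimal(text.replace(".", "").replace(",", "."))


amount = "115,00"
print(parse_ignoring_commas(amount))  # 11500
print(parse_comma_decimal(amount))    # 115.00
```

Someone who types 115,00 and means one hundred fifteen gets a value one hundred times too large from the first reading.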

[From Slashdot, noting ilsole24ore.com]

One Response to “Localization (probably) strikes again”

Steve Says:
November 27th, 2009 at 11:53 am
Here’s a nice article about localization: “Does Your Code Pass the Turkey Test?” http://www.moserware.com/2008/02/does-your-code-pass-turkey-test.html

26 Nov 2009 0:56

9/11 pager intercepts on Wikileaks

Early this morning, Wikileaks began posting alphanumeric pager messages from four carriers (Arch, Metrocall, Skytel, and Weblink_B) that were intercepted during a 24-hour period beginning early on September 11, 2001. Alphanumeric pager messages are unencrypted, and, like communications over a public 802.11 wireless network, they’re skimmable with the right (and not exotic) software and hardware.

“Due to today’s tragic events, it makes sense to cut back wherever feasible on payroll. Expect a very light business day. Please call all stores and review payroll issues”
“RING ALL CHICAGO AIPORTS AND EVERY MAJOR BUILDING DOWNTOWN. BUSH IS DOING A SPEECH. THIS IS SERIOUS POOH..”
“Holy crap, are you watching the news.”
“I hope you have gone home by now. The BoA tower and space needle here are closed. I suspect tall buildings across the country will be closed. Take care my love.-cb”

This might be the most interesting public data mine since the AOL breach. The total volume is far less, but unlike the AOL data, this data hasn’t been anonymized. There are full names, phone numbers, and other identifying information in the mix.

5 Nov 2009 23:43

Totals equal 100% because of lying

Posted by Steve under News, Statistics, Vulpigeration

If you read my last post, you know I’ve been looking at some very fishy survey data from Strategic Vision, LLC. The data seems to stink no matter how anyone looks at it, and mathematicians, statisticians, and programmers have been looking hard and every which way. Instead of throwing yet another heavy mathematical brick at the poor numbers, let’s see how they stand up to a feather.

You don’t have to read many poll or survey results to be familiar with the phrase “totals may not equal 100% because of rounding.”

So guess what? The numbers in Strategic Vision’s results all add to exactly 100. Oops. That’s a huge red flag. Huge, like big enough to wrap a planet in.

Ok, I didn’t check all their polls. Just the most recent 73, which is how many I checked before stopping. For the record, I didn’t stop because poll #74 added to 99 or 101. That would have been statistical misconduct on my part. I didn’t look at poll #74 or any others, because I wanted to write this up. (If anyone checks further, let me know and I’ll post an update.)

If Strategic Vision were not lying (or being extremely sloppy in a systematic way, which is their only hope of explaining this; see below), the chance that all of the 73 most recent polls would add to 100 is less than 1 in 2,000. (That assumes intentional rounding to minimize 99s and 101s. For any pre-determined rounding rule, however, we’re talking 1 in 10,000,000 or worse. Maybe a real pollster can fill me in on what’s industry practice among those who aren’t lying.)

One-in-two-thousand-ish stuff happens all the time, but believe me, I didn’t find this on a data dredging excursion. I noticed that the first few poll results I saw added to exactly 100, I formulated a plausible hypothesis based on all the evidence at hand, I carried out an experiment, and I calculated the p-value.* Small enough to be incriminating in my home court, and I tend to be a benefit-of-the-doubt guy.

Just two more things. First, the possibility of systematic sloppiness:

The consistent adding-to-100 could be the result of a systematic error, as opposed to cheating. Of the possible excuses, this is the one I suggest SV choose if they decide not to come clean. Logically it can’t be distinguished from lying, and they can attribute it to a whipping boy like the web site designer. (This excuse doesn’t defend against the mountain of mathematical bricks I mentioned earlier, however.) They can say that they made a regrettable decision: to avoid the appearance of error, they calculated one percentage in each survey from the other percentages, not from the survey results. I won’t believe it for a minute (though if they show me their programs or confirm that some commercial product makes this error, I’ll reconsider), but it might get them out of hot water.

And second and last, an example and a bit of the math behind my calculations:

Most survey results are short lists of whole number percentages that express fractions to the nearest whole percentage point. Suppose 600 likely voters were polled in a tight race between Tintin and Dora the Explorer. Tintin had the support of 287 people, Dora was close behind with 286, and the rest of the 600 people surveyed (that would be 27 of them) said they weren’t sure. To the nearest whole percentages, that’s 48% for Tintin, 48% for Dora, and 5% undecided. The sum of the rounded percentages is 101%, and that’s due to honest mathematics, not fraud.
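The Tintin and Dora arithmetic is easy to check. The sketch below uses a helper named `percent_half_up`, invented for this example, which rounds exact halves upward (the rule the example above relies on). It uses integer math on purpose, because Python's built-in `round()` sends exact halves to the nearest even number and would turn 4.5 into 4.

```python
def percent_half_up(count, total):
    """Nearest whole percentage, with exact .5 rounded up (integer math only)."""
    return (200 * count + total) // (2 * total)


votes = {"Tintin": 287, "Dora": 286, "Undecided": 27}
total = sum(votes.values())  # 600 people surveyed

rounded = {name: percent_half_up(n, total) for name, n in votes.items()}
print(rounded)                # {'Tintin': 48, 'Dora': 48, 'Undecided': 5}
print(sum(rounded.values()))  # 101
```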

Let me skip some really fun mathematics and tell you that for survey questions that have three answers, the percentages add to 100 most of the time, but by no means all of the time. Exactly how often they don’t add to 100 depends on several factors, only two of which matter much at all in this case: the number of people surveyed, and how numbers ending in .5 are rounded to whole numbers. Strategic Vision, LLC’s usual survey sample size is 800, and even if ending-in-.5 numbers are rounded differently in each poll to avoid a sum of 99 or 101 when possible, three-choice result percentages should still add to 99 or 101 at least one time out of ten.
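For readers who want to poke at these numbers, here is a rough sketch, not the calculation behind the figures in this post. It makes two loud assumptions: first, that every way of splitting 800 respondents among three answers is equally likely (real polls are nothing like that), with a fixed round-half-up rule; second, that the 73 polls are independent three-answer questions, each with at most a 0.9 chance of adding to 100. All function names are invented for the example.

```python
def percent_half_up(count, total):
    """Nearest whole percentage, with exact .5 rounded up (integer math only)."""
    return (200 * count + total) // (2 * total)


def share_summing_to_100(sample_size):
    """Fraction of three-way splits whose rounded percentages total 100.

    Assumption: every split (a, b, c) of the sample is equally likely.
    """
    hits = 0
    cases = 0
    for a in range(sample_size + 1):
        for b in range(sample_size + 1 - a):
            c = sample_size - a - b
            total_pct = (percent_half_up(a, sample_size)
                         + percent_half_up(b, sample_size)
                         + percent_half_up(c, sample_size))
            cases += 1
            hits += total_pct == 100
    return hits / cases


def chance_all_polls_sum_to_100(per_poll_chance, polls):
    """Chance that every one of `polls` independent polls adds to 100."""
    return per_poll_chance ** polls


share = share_summing_to_100(800)
print(f"share of splits adding to 100 under a fixed rule: {share:.3f}")

streak = chance_all_polls_sum_to_100(0.9, 73)
print(f"73 in a row at 0.9 each: about 1 in {1 / streak:,.0f}")
```

The second figure comes out a little beyond 1 in 2,000, in line with the estimate given earlier in the post. The first depends heavily on the equal-likelihood assumption and should be read as a toy model only.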

* Not to be taken as evidence of a non-Bayesian persuasion on my part. The frequentist approach seemed to me pretty straightforward and justifiable here, that’s all.

5 Nov 2009 19:33

Frightening, but not for the obvious reason

Posted by Steve under News, Nonsense, Research Studies, Statistics, Vulpigeration

Today’s clicking (especially from fivethirtyeight.com) led me to two strikingly similar declamatory reports about high school students’ knowledge of civics, complete with chart-laden survey results.

“Arizona schools are failing at [a] core academic mission,” concludes this Goldwater Institute policy brief.

“Oklahoma schools are failing at a core academic mission,” announces this Oklahoma Council of Public Affairs article.

When asked to name the first president of the United States, only 26.5% of the Arizona high school students surveyed answered correctly. Only 49.6% could correctly name the two major political parties in the United States. An even smaller percentage of Oklahoma high school students gave correct answers to these and other questions from the U.S. citizenship test study guide. None of the thousands of students surveyed in either state answered all ten questions correctly.

The shocking thing is that these are garbage studies. Made-up numbers, probably. The acme of vulpigeration. Evil. Makes me sick. (Glad I coined the word, though.)

No way these are real studies. Danny Tarlow over at This Number Crunching Life has taken a mathematical hammer to the Oklahoma “study” quite effectively. (The blatant similarity of the Arizona “study” blows away any shred of possibility that the Oklahoma study is legit. I’d love to see Danny’s face when he sees the Arizona report.)

What’s frightening is that this kind of snake oil has far too good a chance of surviving as fact (which it isn’t) and influencing public policy.

The guilty parties? The Goldwater Institute, which as you might guess is a conservative “think” tank. The OCPA, which describes itself as “the flagship of the conservative movement in Oklahoma.” Matthew Ladner, the author of both reports, who is vice president of research for the Goldwater Institute. And last but not least, Strategic Vision, LLC, which Ladner says “conducted” the studies. In my opinion, the word is concocted. Read about them yourself.

[Updated with correct business name: Strategic Vision, LLC.]

29 Oct 2009 16:41

Ba ba ba ba ba ba!

Posted by Steve under Music, News

No, it’s not another Philip Glass premiere, but that was a good guess. The correct answer is The Kinks Choral Collection.

I’d love to see my friends come hear me in either or both Dessoff gigs I’m part of this month:

November 12: Our 2009-2010 concert series begins, with Ernest Bloch’s Avodath Hakodesh (Sacred Service) at Congregation Rodeph Sholom on the Upper West Side.
November 19 and 20: We back up Ray Davies at Town Hall.

If you can’t make it to one of our shows, you’re not off the hook:

November 18th: Dessoff appears on The Late Show with David Letterman. A small group of Dessoff singers (not including me) will back up Ray Davies, who’ll be Dave’s guest that day.

28 Oct 2009 22:56

Blah Blah F–k Blah Blah Math Prof.

Posted by Steve under News, Statistics

Kevin Underhill* emailed me today about the Schwarzenegger veto letter. Specifically, Kevin wondered whether “it might be possible to calculate the odds against this arrangement of letters being entirely random, as the Governor’s office has claimed.”

Kevin wasn’t the only one to wonder about the odds and contact a math professor.

Several hours after Kevin wrote me, SFweekly.com published “Odds Schwarzenegger’s ‘I F–k You’ Message Was Coincidental? About One in Two Billion, Says Math Prof.” SFweekly.com quoted Stephen Devlin, a mathematician at University of San Francisco, and Gregory McColm, a mathematician at the University of South Florida, and printed various tiny chances, including a) 1 in 10 million, b) 1 in 100 million, and c) 1 in 2 billion. These are (using rough estimates of initial-letter frequencies in English words) the chances a) that seven randomly selected lines of English text begin with f, u, c, k, y, o, and u, in that order, b) that eight randomly selected lines of English text begin with i, f, u, c, k, y, o, and u, in that order, and c) that the first letters of eight randomly selected lines of English text and two blank lines separating those lines into three groups appear in the sequence i, blank, f, u, c, k, blank, y, o, u. The chances of c) are 1 in 2.1 billion, by my calculation, but no matter—only the order of magnitude is of interest here.
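All of these figures come from multiplying one probability per line. The sketch below shows that structure only. It does not reproduce the quoted numbers, which rest on estimated initial-letter frequencies that aren't given here. As a stand-in it assumes all 26 letters are equally likely to start a line, an assumption made purely for illustration, and the function names are invented.

```python
from fractions import Fraction


def acrostic_chance(initials, letter_chance):
    """Chance that independent lines start with the given letters, in order."""
    chance = Fraction(1)
    for letter in initials:
        chance *= letter_chance(letter)
    return chance


def uniform(letter):
    # Assumption for illustration only: every letter is equally likely.
    return Fraction(1, 26)


seven = acrostic_chance("fuckyou", uniform)
eight = acrostic_chance("ifuckyou", uniform)
print(f"1 in {seven.denominator:,}")  # 1 in 8,031,810,176
print(f"1 in {eight.denominator:,}")  # 1 in 208,827,064,576
```

Passing a different `letter_chance` function, for instance one backed by a table of measured initial-letter frequencies, changes the result by orders of magnitude, which is why the published estimates vary so much.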

Whether many visitors to SFweekly.com (well, male ones of any persuasion) read the piece is questionable. It appeared next to two photo links with more click appeal than “Blah Blah F–k Blah Blah Math Prof.”: Exotic Erotic Ball (image of exotically costumed ladies kissing) and San Francisco’s Hottest Chefs (image of boyish chef wearing five o’clock shadow and a uniform, smiling seductively).

Similar calculations appear with less distraction elsewhere.

For the record, even before making any calculations, I was convinced beyond a reasonable doubt that the governor meant full well to say “fuck you.” In a later blog post, however, I’ll try to explain that the unlikelihood of “fuck you” or “i fuck you” appearing in a letter isn’t in and of itself a smoking gun.

*I’ve e-known Kevin for a couple of months now, ever since I introduced myself with an unceremonious, if politely worded, email to the effect of “your arithmetic is wrong.” More details here. It turns out to be a better way to meet people than you might think.

Kevin finds fortune at Shook, Hardy, and Bacon (shb.com, Alexa US traffic rank 292,302, with 132 sites linking in), and he finds fame at Lowering the Bar (www.loweringthebar.net, Alexa US traffic rank 59,588, with 145 sites linking in).

One Response to “Blah Blah F–k Blah Blah Math Prof.”

Dave Costa Says:
October 29th, 2009 at 8:25 am
Hi Steve!

Meanwhile, NPR’s All Things Considered did essentially the same thing. For some reason they chose a cryptologist (why?) from Goucher College (what?). His calculation of the odds was 5.5 in 1 TRILLION.

http://www.npr.org/templates/story/story.php?storyId=114253863

20 Oct 2009 17:30

Using flashcards is better than just reading them

Posted by Steve under News, Research Studies, Teaching

The following subheadline on the Scientific American website caught my eye today (and not only because of the missing period):

New research makes the case for hard tests, and suggests an unusual technique that anyone can use to learn

I may be a bit thick, because neither the article nor the research paper it mentioned suggested any unusual technique to me. But this was better than my last wild goose chase reading episode, when I vainly sought a footnote on a cereal box (there was a dagger: †, but no footnote. Can you believe that?).

Henry Roediger and Bridgid Finn, the Scientific American article’s authors, write that researchers Kornell, Hays, and Bjork found that “learning becomes better if conditions are arranged so that students make errors.” There’s that pesky word “better.” Better than what? The eternal unanswered question. My guess is that Scientific American is reporting that Kornell et al. have found that learning under a) conditions arranged so that students make errors is better than learning under b) conditions arranged so that students do not make errors. In other words, that the researchers found errorful learning to be better than errorless learning. Not that it’s a bad article, but it would be nice if Roediger and Finn had stated what they’re reporting a bit more clearly. (This is why I give writing assignments to my statistics students. By the end of the semester, they better learn not to use adjectives like better without answering “Better than what?”.)

Anyway, Kornell et al. do mention errorless learning in their paper, recently published in the Journal of Experimental Psychology: Learning, Memory and Cognition® (yes, the name of the journal is a registered trademark), but they don’t study it. The abstract notes that they examine the question of “what happens when one cannot answer a test question—does an unsuccessful retrieval attempt impede future learning or enhance it?” Kornell et al. didn’t exactly examine this question either, because they didn’t (and possibly couldn’t) isolate what part of the learning in their scenario was “future” learning. In addition, they only studied learning after wrong answers, so one must be careful not to assume their research sheds light on getting test questions wrong vs. getting them right. (Suppose a researcher reported that “Student learning among African-Americans is enhanced when they are given test questions they cannot answer.” If the researcher only studied African-Americans and made no comparison to other populations, the reported finding might easily be misinterpreted.)

What Kornell et al. did was compare two scenarios for learning previously unknown information. One scenario was unsuccessful retrieval attempts (the students were asked to provide the not-yet-learned information as answers to test questions, and they answered incorrectly). In this scenario, the retrieval attempt was followed by feedback that included a brief presentation of the new information (i.e., the correct test question answer). The second scenario was a longer-lasting presentation of the new information with no retrieval attempt (the students were not asked to answer a test question, and it’s unclear in some of the experiments whether the students knew what kind of test question they would later be asked). Not surprisingly, unsuccessful retrieval attempts enhanced learning (as measured by scores on a test containing questions like those in the retrieval attempt), when compared to presentation of new information with no retrieval attempts. Despite the Scientific American article’s subheadline, this research makes no case that “hard tests” are better for learning than non-hard tests. They may be, but this research doesn’t help us figure it out. The research does support the value of tests, hard or not-hard, so long as there’s feedback with the right answer.

One Response to “Using flashcards is better than just reading them”

Andrew Willett Says:
October 21st, 2009 at 10:01 am
That’s what happens when you let Björk work on your research project. Half your research budget gets spent on crazy outfits and acts of visionary musical weirdness, which means you have to cut corners elsewhere.

† (That would have driven me crazy as well.)

29 Sep 2009 20:44

N66534

Posted by Steve under Family, News

Mom, in front of a Resort Airlines C-46 Commando, probably on her honeymoon in 1950, but surely before September 28, 1953. The plane in the photo crashed at Louisville airport that day and was subsequently written off.

11 Sep 2009 0:59

Throw Nanny Under the Bus

Posted by Steve under News

Despite the vicious insinuations, Obama and his supporters aren’t cruel and thoughtless—they aren’t hiding a “throw Granny under the bus” clause in the health care reform bill. Keeping Granny alive is one of the few things Americans agree on for now. Obama unequivocally refutes the insinuation. At the same time, Obama staunchly defends against a loud challenge to his plan to throw nannies under the bus instead. Nannies, gardeners, food handlers, farm workers, janitors, if they’re undocumented immigrants, throw them all under the bus. No matter how many millions of them there are, no matter how hard they work for how little pay and benefits they get, no matter that they pay taxes, contribute to the local economy, no matter anything. They aren’t Americans. Throw them all under the bus.

The undocumented among America’s hard-working, poorest laborers: harvesting, packing, preparing, and serving our and our children’s food; caretakers responsible for our children and our homes; gardeners keeping our grounds beautiful. Them. Throw them under the bus. Don’t give them access to affordable health care.

America’s undocumented workers and their families deserve health care just like everyone else living in this country. Grievously, no one, no one, sees the value, the ethical imperative, or the public health benefits of treating undocumented workers fairly.

6 Sep 2009 18:07

The Hounding of Van Jones, or Mike Pence is an Asshole

Posted by Steve under News

Quoting Representative Mike Pence on Van Jones:

His extremist views and coarse rhetoric have no place in this administration or the public debate.

Mike, whose extremist views and coarse rhetoric do have a place in the public debate?

Pence is the House sponsor of the “Broadcaster Freedom Act,” which would prohibit the FCC from repromulgating the Fairness Doctrine. Not that the FCC is likely to do so any time soon, nor that they necessarily should, but Pence must really, really want no one to overrule him and his supporters about what should be kept out of the public debate.

Speaking about Pence’s supporters, click here to read Rebel Reports’ Jeremy Scahill on Pence, Blackwater/Xe, and other stuff.

This concludes my contribution to the public debate for today. Thank you for listening.
