9 Aug 2006 0:27
A Million Random Digits with 100,000 Normal Deviates
Posted by Steve under SQL ServerComment on this post
Groundbreaking when it was published in 1955, the classic book “A Million Random Digits with 100,000 Normal Deviates” has been republished electronically by the RAND corporation with permission “to duplicate this electronic document for personal use only, as long as it is unaltered and complete.” Books like these were a staple of statistical research in the mid-20th century, and this particular one was highly revered.
Nowadays, there are better sources of random numbers, such as HotBits, and there are many ways to generate pseudorandom numbers, which are not random, but have many of the properties of random number and are useful for many purposes.
I hope it’s not a violation of the copyright for me to provide instructions on how to use SQL to load the book’s content in its published format (or any identically-formatted list) into a SQL table that can be queried for random (not pseudorandom) sequences of numbers. The script uses a few of SQL Server 2005’s new features, including the BULK rowset provider for text files, some of the new analytic functions, and TOP with a variable. You’ll also need a table-valued function called Numbers(), like the one in my previous SQL post.
The RAND book is available here, and my script works for the support file “Datafile: A Million Random Digits,” available for download here. The SQL Server 2005 script below assumes you’ve downloaded this file and unzipped it to C:\\RAND\\MillionDigits.txt.
The beginning of the file looks like this
00000 10097 32533 76520 13586 34673 54876 80959 09117 39292 74945
00001 37542 04805 64894 74296 24805 24037 20636 10402 00822 91665
00002 08422 68953 19645 09303 23209 02560 15953 34764 35080 33606
00003 99019 02529 09376 70715 38311 31165 88676 74397 04436 27659
00004 12807 99970 80157 36147 64032 36653 98951 16877 12171 76833
00005 66065 74717 34072 76850 36697 36170 65813 39885 11199 29170
00006 31060 10805 45571 82406 35303 42614 86799 07439 23403 09732
00007 85269 77602 02051 65692 68665 74818 73053 85247 18623 88579
00008 63573 32135 05325 47048 90553 57548 28468 28709 83491 25624
00009 73796 45753 03529 64778 35808 34282 60935 20344 35273 88435
Unix-style newlines (0x0A) are used, and the million digits are organized into 20,000 five-digit integers with leading zeroes, so the script will import the file into a table of 20,000 five-digit numbers (as char(5) data with leading zeroes). Here’s the script:
create database MillionDigits
go
use MillionDigits
go
create table MillionDigitsFile (
c varchar(max)
)
go
insert into MillionDigitsFile
select BulkColumn
from openrowset(bulk 'C:\\RAND\\MillionDigits.txt\\', SINGLE_CLOB) as D
go
create table NumbersFromTable(
position int primary key,
number char(5) not null
)
create index NumbersFromTable_number on NumbersFromTable(number)
go
-- The first of the five groups of two numbers each
-- begins at position 9 of each line. Each of the other
-- four groups on a line begins 13 characters after the
-- previous one. The second number in each group
-- begins 6 characters after the first.
insert into NumbersFromTable
select
row_number() over (order by N.n,A.n,B.n) as rk,
substring(c,9+72*N.n+13*A.n+6*B.n,5) as n
from
Numbers(0,19999) as N,
Numbers(0,4) as A,
Numbers(0,1) as B,
MillionDigitsFile
go
-- How random does it look? (and a sneaky way to
-- aggregate over an aggregate)
select top 1
min(count(*)) over (),
max(count(*)) over (),
avg(1.00000*count(*)) over (),
stdev(count(*)) over ()
from NumbersFromTable
group by number
go
/* Selects a @length-long sequence of numbers from
the table, where the place to start is found as
follows. Given a random integer, use % to turn
it into a number's position between 1 and 200000.
Reduce that position % 20000 to find a starting
line of the book, and reduce the following
number % 10 to find the starting number on
that line.
*/
create function RandomSequence(
@seed int,
@length int
) returns table as return (
select top (@length)
row_number() over (order by position) as i,
number
from NumbersFromTable
where position >= (
select number%20000
from NumbersFromTable
where 1+@seed%200000 = position
) + (
select number%10
from NumbersFromTable
where 1+(@seed+1)%200000 = position
)
order by position
)
go
-- Generate a few random sequences. You'll get different ones
-- each time you run this.
declare @seed int
set @seed = abs(binary_checksum(newid()))%200000
select * from RandomSequence(@seed,50)
set @seed = abs(binary_checksum(newid()))%200000
select * from RandomSequence(@seed,123)
-- Uncomment to clean up
-- use master
-- go
-- drop database MillionDigits