## Message Posted: Mon, 08 Jul 2002 @ 18:50:15 GMT

 < Last>>

 Subj: Re: Uniform Distributions from RANDOM number function From: Geoffrey Rommel

Doug has raised some interesting points here.

First, if you would like an overview of how "random number" algorithms work, along with plenty of warnings about the possible pitfalls, I recommend Chapter 3 of Donald Knuth's Art of Computer Programming, Vol. 2 (3rd edition, Addison-Wesley, 1998). There's a lot of higher math that I don't understand, but the main concepts are clear enough.

 The documentation seems to imply that a different seed to the algorithm is used for each call which should produce another totally random number sequence independent of any prior sequences. However, this does not seem to be the case.

Yes, a new seed is used each time, but you have no control over it. The algorithm (lrand48, I believe) is described in the SQL Reference, Vol. 5 (V2R4.1, B035-1101-061A, June 2001), p. 7-43. Now, is the sequence "independent" of the prior sequences? Strictly speaking, no: the algorithm is deterministic; it's just designed to produce results that look random. Generally speaking, though, each sequence should look different enough from previous ones that a casual observer would never notice the pattern.

Since you are trying to produce unique values, I guess you are no longer a casual observer!

 The trouble arises when I attempt to split this process into multiple passes inserting a batch of rows (100K) at a time. On the second or subsequent pass of the same SQL I begin to see some number ranges that have a high rate of duplication in each pass.... I am looking for ideas on how to work around this issue so that I can generate a set of random numbers without duplicates with some level of confidence.

In the words of the Wizard, you are a victim of disorganized thinking! If you pick two separate random sequences, uh, at random, they may very well overlap. For instance, let's say you generate two randomly selected groups of 10 integers from 0 to 99. There is a chance that they will contain one or more identical integers. (I'm not savvy enough to compute what that chance is, but it's greater than 0.) The same applies when you generate two groups of 100K random numbers from a universe of 2**31, although obviously the chances are smaller. And, of course, the algorithm isn't really random, so perhaps the chances are greater than they would be if the numbers were truly random. Demoralizing, isn't it?

I believe the only way to accomplish what you wish is to generate a single set of random numbers and then somehow insert them 100,000 at a time into your final table.

Hope this helps,

wgr

 < Last>>

Attachments

Library

Quick Reference

Archives

Sample Index

 2016 2007 2015 2006 2014 2005 2013 2004 2012 2003 2011 2002 2010 2001 2009 2000 2008 1999

2002 Indexes

 Jan Jul Feb Aug Mar Sep Apr Oct May Nov Jun Dec

 Top Home Privacy Feedback