Re: Uniform Distributions from RANDOM number function [From Doug Drake: Tue, 09 Jul 2002 @ 14:02 GMT]

Message Posted: Tue, 09 Jul 2002 @ 14:02:08 GMT

Subj:

Re: Uniform Distributions from RANDOM number function

From:

Doug Drake

John,

No, the purpose of the random number is simply to generate a unique number that can be used as the surrogate key in place of the multi- column natural key.

Your comments strike at the core of the issue. The random number approach is being considered due to the complexities and performance limitations of generating a large volume of sequential numbers in parallel. Yes, one could use and maintain a separate sequential number range for each VAMP (or node) but this introduces additional maintenance of the process should the configuration change which might be acceptable but is not desirable.

The sequential numbering approach may be the only viable alternative that enables us to generate a unique single column key across billions of rows. Maybe someone has some clever ways to do this to maximize parallelism (and hence throughput) and minimize maintenance issues.

Just to put some scope around the problem we are looking at inserting up to 100 million rows each day into a table potentially containing billions of rows. The random or sequential number is to be used as a surrogate key in place of the multi-column key. The surrogate key must be unique just as all of the columns of the natural key would guarantee uniqueness. This maintenance process is assumed to complete within a couple of hours each day. These are the base requirements.

We have looked at variations of hashing the columns of the natural key which generates duplicates. The next logical approach involved the use of a sequential numbering scheme. It seems the only way to obtain acceptable performance would be to split up the process into multiple parallel streams each generating their own range of sequential numbers. Not only would this introduce additional complexity but also it would preclude the use of Fastload for the initial loading process.

A Fastload Inmod routine to generate sequential numbers may be an alternative to consider here. I just don't have any metrics of the performance degradation of using a simple Inmod with Fastload. Maybe someone has some ideas on the performance hit we might expect.

Any other ideas would be appreciated.

Regards,

Doug Drake

Attachments

Library

Quick Reference

Archives of the TeradataForum

Message Posted: Tue, 09 Jul 2002 @ 14:02:08 GMT