Archives of the TeradataForum

Message Posted: Thu, 26 Jun 2003 @ 18:28:50 GMT


	<Prev	Next>		<<First	<Prev	Next>	Last>>

Subj:		Re: Question about HASHROW

From:		Michael McIntire

Actually, 50K hash values from 3m rows in a call detail table does not surprise me.

If you look at the columns, I think we can assume that the date types are byteint, date, int, and a char or large decimal. I would bet most of the domain distribution is concentrated in the last word of the integer.

The ACCESS_METHOD_ID is likely a small number of values - 10 maybe, and I'll bet that 95% of them are the same value.

The CALL_START_DT, being an underlying integer in a call detail table, is going to also be very poorly distributed over the domain of values - likely only several hundred values.

The CALL_START_TM could have fair distribution, but still, all the change values would be concentrated in the last word of the numeric value.

Phone numbers such as CALL_ORIGINATING_NUM actually make very poor distributions, because they are typically concentrated in a small number of NPA/NXX values according to the geography in which the carrier typically operates - even when it's the originating number (because the same people call the same people...)

In all, my guess is that 95% of the change in values, occurs in the same two bytes when you stack and fold the values.

Try:

1) flipping the values of your ACCESS_METHOD_ID to the other end of the domain or distributing them across the domain like a bitmask. For example, if the values are 1 to 10, change them to corresponding bit value locations 1 to 10: 1,2,4,8,16...

2) change CALL_START_TM to a decimal.

3) invert the change space of the phone number - something like reorder it with the right most digit to the left, or rotate them.

Michael McIntire
Certified Master


	<Prev	Next>		<<First	<Prev	Next>	Last>>

Attachments

Library

Quick Reference