|
Archives of the TeradataForumMessage Posted: Thu, 26 Jun 2003 @ 18:28:50 GMT
Actually, 50K hash values from 3m rows in a call detail table does not surprise me. If you look at the columns, I think we can assume that the date types are byteint, date, int, and a char or large decimal. I would bet most of the domain distribution is concentrated in the last word of the integer. The ACCESS_METHOD_ID is likely a small number of values - 10 maybe, and I'll bet that 95% of them are the same value. The CALL_START_DT, being an underlying integer in a call detail table, is going to also be very poorly distributed over the domain of values - likely only several hundred values. The CALL_START_TM could have fair distribution, but still, all the change values would be concentrated in the last word of the numeric value. Phone numbers such as CALL_ORIGINATING_NUM actually make very poor distributions, because they are typically concentrated in a small number of NPA/NXX values according to the geography in which the carrier typically operates - even when it's the originating number (because the same people call the same people...) In all, my guess is that 95% of the change in values, occurs in the same two bytes when you stack and fold the values. Try: 1) flipping the values of your ACCESS_METHOD_ID to the other end of the domain or distributing them across the domain like a bitmask. For example, if the values are 1 to 10, change them to corresponding bit value locations 1 to 10: 1,2,4,8,16... 2) change CALL_START_TM to a decimal. 3) invert the change space of the phone number - something like reorder it with the right most digit to the left, or rotate them. Michael McIntire
| ||||||||||||||||||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||||||||||||||||||
Copyright 2016 - All Rights Reserved | ||||||||||||||||||||||||||||||||||||||||||||||||
Last Modified: 15 Jun 2023 | ||||||||||||||||||||||||||||||||||||||||||||||||