Re: Count of Distinct Rows [From Grenwelge, Bill A.: Wed, 05 Nov 2003 @ 17:30 GMT]

Message Posted: Wed, 05 Nov 2003 @ 17:30:01 GMT

Subj:

Re: Count of Distinct Rows

From:

Grenwelge, Bill A.

for a table with 3.5 million rows, why would you have a primary index that has that many nulls in it? Seems to me that with that many nulls, the table would be a bad performer for joins anyway. Basically what that table is showing is that a null value exists in the primary index for every row in the table. It would seem to me this would qualify as a candidate for a new primary index, just basing off the fact of nulls in the PI. This very issue is probably why there are warnings about using nulls in the primary index.

After looking at the issue a little closer, I would have to agree with Dieter that this seems to be working as designed (the unique values piece anyway).

Here is my take on what is happening (by no means correct but just an observation on my part)

Nulls seem to override any value they are associated with. While the hash algorithm is showing to hash the values to different locations in the system, it is probably doing so based on the values that it does have in the index. However, when the statistics are being collected the null value is overriding the rest of the values almost like it is concatenating the values in the PI to see what values are unique (very loose anology here). Nulls when used in a concatenation will make the entire row null. In the case of these statistics, the null in the index seems to be making that particular index row look like a null to the process that checks the uniqueness of the index. Since every row has a null (or nulls) in the PI, the statistics collection process thinks that every row is a Null and thus there is only one unique value. Again, I am basing this on what I see happening and from a few tests I did. If you take the same test that Ruth did, and modify it so that only one or two of the rows have nulls, you see the number of unique rows change to match the number of rows that have no nulls + 1 row representing all the rows that have a null in the PI.

Insert into dwwork.Null_Check Values (1,1,null); Insert into dwwork.Null_Check Values (1,2,3); Insert into dwwork.Null_Check Values (1,3,4); Insert into dwwork.Null_Check Values (1,4,null); Insert into dwwork.Null_Check Values (1,5,6); Collect Stats on dwwork.Null_Check Index (ColA, ColB, ColC); Help stats dwwork.Null_Check; Date Time Unique Values Column Names 03/11/05 11:11:23 4 cola,colb,colc

So based on what I am seeing here, the process is working as designed based on how Teradata treats null values in the other areas of proessing. Now, is that processing right or wrong, I have no clue. Besides, my observations could be totally off base. Can any one from NCR shed better light on how this might work?

Bill Grenwelge
Data Warehouse
Union Planters Bank

Attachments

Library

Quick Reference

Archives of the TeradataForum

Message Posted: Wed, 05 Nov 2003 @ 17:30:01 GMT