Archives of the TeradataForum

Message Posted: Fri, 13 Aug 2010 @ 16:25:57 GMT
Hi Naveen,

It sounds like you've got some very strange data! Using NOPI, if you load the data from an external file then the data rows will be "round-robin'd" across the AMPs in the config. With 300K rows, that should not give you a distribution problem. As someone else has said, if you're doing an INSERT/SELECT from an existing table then different rules come into play.

However, if you've got 20-30 columns in the table definition (like the one that you showed before) then I doubt that you need to go that far. Unless (as I said before) you've got some very strange data (all/mainly nulls or zeros?), I would expect to find a number of column combinations which provide good distribution. Obviously, the final choice of which of those to use will depend on the normal PI criteria (joining, access, duplicate count, etc.).

If you've got a table that is loaded on your Teradata system, then you can model the distribution of any candidate PI without having to actually build the table. Do this using the hash functions as shown below.

Example 1: choose two columns from a table.

   SELECT (100.00 * AVG(rowcount) / MAX(rowcount)) AS ParEff
         ,AVG(RowCount)
         ,MAX(RowCount)
   FROM (SEL HASHAMP(HASHBUCKET(HASHROW(thedate,thetime))) AS AmpNo
            ,CAST(COUNT(*) AS FLOAT) AS RowCount
         FROM dbc.resusagespma
         GROUP BY 1
        ) AS dt1;

   ParEff   Average(RowCount)   Maximum(RowCount)
    82.62              135.50              164.00

Example 2: choose a single column from a table.

   SELECT (100.00 * AVG(rowcount) / MAX(rowcount)) AS ParEff
         ,AVG(RowCount)
         ,MAX(RowCount)
   FROM (SEL HASHAMP(HASHBUCKET(HASHROW(thedate))) AS AmpNo
            ,CAST(COUNT(*) AS FLOAT) AS RowCount
         FROM dbc.resusagespma
         GROUP BY 1
        ) AS dt1;

   ParEff   Average(RowCount)   Maximum(RowCount)
    51.36              295.82              576.00

Note that the only things you have to change from my examples above are the name of your table (dbc.resusagespma in the examples) and the column name(s): either "thedate" or "thedate,thetime" in my examples.
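The ParEff arithmetic in those queries is just 100 * AVG(rows per AMP) / MAX(rows per AMP), e.g. 100 * 135.50 / 164.00 = 82.62 in Example 1. You can model the idea outside Teradata too; this is only a rough sketch, not Teradata's real hashing: md5 stands in for the HASHROW/HASHBUCKET/HASHAMP chain, and the 10-AMP config and sample data are invented for illustration.

```python
# Rough model of the ParEff queries above. Assumptions: md5 is a stand-in
# for Teradata's HASHROW/HASHBUCKET/HASHAMP chain, and the AMP count and
# sample rows are made up for illustration.
import hashlib
from collections import Counter

def amp_for(value, num_amps):
    # Hash the candidate-PI value and map it to an AMP number.
    digest = hashlib.md5(repr(value).encode()).hexdigest()
    return int(digest, 16) % num_amps

def parallel_efficiency(rows, key, num_amps):
    # Count rows landing on each AMP, then ParEff = 100 * avg / max.
    # Like the GROUP BY in the SQL, AMPs that receive no rows drop out
    # of the average.
    counts = Counter(amp_for(key(r), num_amps) for r in rows)
    avg = sum(counts.values()) / len(counts)
    return 100.0 * avg / max(counts.values())

num_amps = 10
rows = [{"thedate": d, "thetime": t} for d in range(100) for t in range(300)]

# A well-spread composite key: every (thedate, thetime) pair is distinct.
good = parallel_efficiency(rows, lambda r: (r["thedate"], r["thetime"]), num_amps)

# A skewed key: every date from 10 onwards collapses to one value.
bad = parallel_efficiency(rows, lambda r: r["thedate"] if r["thedate"] < 10 else 0, num_amps)

print(f"well-spread composite key ParEff: {good:.2f}")
print(f"skewed key ParEff: {bad:.2f}")
```

The well-spread key should report a ParEff in the high nineties, while the skewed key comes out far lower, which is the same signal the SQL gives you.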
Note also that this query reports "ParEff" (Parallel Efficiency) rather than skew. For Parallel Efficiency you want higher values (ideally 100%, but certainly 95%+). Run those queries a few times, choosing different combinations of columns, and see what that tells you.

Cheers,
Dave

Ward Analytics Ltd - Information in motion ( www.ward-analytics.com )
Copyright 2016 - All Rights Reserved
Last Modified: 15 Jun 2023