Archives of the TeradataForum
Message Posted: Tue, 30 Oct 2007 @ 12:02:16 GMT
| Subj: || || Re: Generate unique id |
| From: || || Dieter Noeth |
Soumya Smita wrote:
| ||I have a table with fields acct_id, fd1, fd2, fd3.|| |
| ||I want a unique id generated for each unique combination of fd1, fd2, fd3. Acct_id is not considered for generating the unique id. I have
millions of rows in my table, hence when I was using sum() over ... , I was getting spool space error.(May be due to sorting on the huge no. of
rows involved.)|| |
It's sorting plus 2 spools for 1 OLAP step, you can't get rid of that as long as using OLAP.
| ||Plz let me know the best approach to get a unique_id , need not be based on any sorting criterion, considering that there are
millions of rows in the table.|| |
| ||Desired unique_id's somethng like this:|| |
> acct_id fd1 fd2 fd3 uniq_id
> 100 a b c 1
> 200 a b c 1
> 300 a b b 2
> 400 a a b 3
It's not what i call unique, it's a DENSE_RANK :-) So you can't use ROW_NUMBER.
| ||Just that uniq_id should not take acct_id in consideration... and query should avoid sort on the rows.(Neways there is no desired
sequence for the unique_ids)|| |
No sequence without ordering in Teradata...
If uniq_id must be ascending without gaps, then IDENTITY can't be used, if not simply use RANK.
This is probably the approach you described on Sep. 27 in "Check for existing attributes in a table". You could try too reduce the intermediate
size by using NOT EXISTS to filter already existing rows. If there are lots of rows with more than one row per uniq_id you could calculate a RANK
on a distinct set and then join back to that table. If the PI of that stage table is on (fd1, fd2, fd3) this would be quite fast:
from stage_table s
(select fd1, fd2, fd3, rank() over (order by ...) as rnk
from stage_table group by 1,2,3) dt
on s.fd1 = dt.fd and ...
Of course the easiest way is to check if there's a skewed spool because of a bad ordering and correct that, if not increase spool for
that load user.