Archives of the TeradataForum
Message Posted: Thu, 21 Aug 2003 @ 07:30:37 GMT
Subj: Re: Deleting dupes from multiset tables without rowid
From: Dieter Noeth
Christian Schiefer wrote:
| Since V2.0.0.24 there is a new internal parameter "Rowid_enabled", which is FALSE by default.
Interesting... Probably because of the 10 bytes compared to 8 bytes before V2R5?
| This means: "select rowid, t01.* from t01" does not work anymore. The construction of a join index using rowid is still
| possible and will work.
| If you set "Rowid_enabled" to TRUE, you can still "select rowid ..."
So I would enable it ;-)
| NCR plans not to allow a "select rowid" from V2R5.1 upwards for compatibility reasons... One should use the new
| identity column, which is introduced in V2R5.
| Many large tables are Multiset (performance reasons, etc.). Alter table to add an identity column is a.) not possible and b.)
| would suck for performance reasons anyway.
Why? I'd expect it not to be much slower than adding a new NOT NULL column.
| I think the only way to add an identity column is an "insert into select *", which may not be possible because of lack of free
| diskspace.
| With rowid not possible anymore and no identity column, the only way to get rid of unwanted dupes would be:
| 1.) Create table large_table_dupe with an identity column plus all the columns of large_table.
| 2.) Get the dupe records into large_table_dupe (the identity column is generated automatically):
|
|     insert into large_table_dupe (all_cols)
|     select all_cols
|     from ( select all_cols, count(*)
|            from large_table
|            group by all_cols
|            having count(*) > 1 ) t01;
| 3.) Delete all dupes in large_table_dupe:
|
|     delete from large_table_dupe
|     where identity_col not in (
|         select max_id
|         from ( select max(identity_col) max_id
|                , all_cols
|                from large_table_dupe
|                group by all_cols ) t01 );
?? There's only one copy per row in that table (hint: you used GROUP BY). And it would matter if there were dupes for the next
step...
| 4.) Delete all dupes from large_table:
|
|     delete from large_table
|     where (all_cols) in ( select all_cols from large_table_dupe );
| 5.) Get some rows in again:
|
|     insert into large_table select all_cols from large_table_dupe;
| My question: What do you think of it? This solution is not very straightforward.
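The identity-column delete pattern in step 3 (keep the row with the highest generated id in each duplicate group) can be traced outside Teradata. Here is a minimal sketch with Python's sqlite3, using SQLite's implicit rowid as a stand-in for the identity column; the table and data are made up for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# A multiset-style table: duplicate rows allowed, no unique key.
cur.execute("create table large_table_dupe (a integer, b text)")
cur.executemany("insert into large_table_dupe values (?, ?)",
                [(1, "x"), (1, "x"), (1, "x"), (2, "y"), (2, "y")])

# Keep only the row with the highest rowid in each duplicate group,
# mirroring "delete ... where identity_col not in (select max(identity_col) ...)".
cur.execute("""
    delete from large_table_dupe
    where rowid not in (
        select max(rowid)
        from large_table_dupe
        group by a, b )
""")

print(cur.execute("select a, b from large_table_dupe order by a").fetchall())
# -> [(1, 'x'), (2, 'y')]
```

The GROUP BY yields one surviving rowid per distinct value combination, so every other copy is deleted.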
Why don't you use:

create table newtable as oldtable with no data;

insert into newtable
select all_cols from oldtable
group by all_cols
having count(*) > 1;

delete from oldtable
where (all_cols) in
  (select all_cols from newtable);

insert into oldtable
select * from newtable;
If there's only a small number of dupes, this is quite fast.
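This recipe (park one copy of each duplicated row, delete all its copies from the original table, then re-insert the parked copies) can be sketched with Python's sqlite3. This is an illustrative stand-in, not Teradata syntax; it assumes SQLite 3.15+ for the row-value IN, and the names and data are invented:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

cur.execute("create table oldtable (a integer, b text)")
cur.executemany("insert into oldtable values (?, ?)",
                [(1, "x"), (1, "x"), (2, "y"), (3, "z"), (3, "z"), (3, "z")])

# 1. Park exactly one copy of each duplicated row in a side table.
cur.execute("create table newtable (a integer, b text)")
cur.execute("""
    insert into newtable
    select a, b from oldtable
    group by a, b
    having count(*) > 1
""")

# 2. Delete every copy of those rows from the original table.
cur.execute("""
    delete from oldtable
    where (a, b) in (select a, b from newtable)
""")

# 3. Re-insert the single parked copies.
cur.execute("insert into oldtable select * from newtable")

print(sorted(cur.execute("select * from oldtable").fetchall()))
# -> [(1, 'x'), (2, 'y'), (3, 'z')]
```

Note that rows without duplicates, like (2, 'y') here, never leave the original table, which is why this is cheap when only a small fraction of rows are duplicated.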
And if there's enough space available:

create table newtable as oldtable with no data;

insert into newtable
select distinct * from oldtable;

drop table oldtable;
rename table newtable to oldtable;
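The copy-distinct-and-swap variant can be sketched the same way with sqlite3 (again an illustrative stand-in for the Teradata statements, with invented names and data):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

cur.execute("create table oldtable (a integer, b text)")
cur.executemany("insert into oldtable values (?, ?)",
                [(1, "x"), (1, "x"), (2, "y"), (2, "y"), (3, "z")])

# Copy the distinct rows into a fresh table, then swap it in
# for the original: drop the old table and rename the new one.
cur.execute("create table newtable (a integer, b text)")
cur.execute("insert into newtable select distinct * from oldtable")
cur.execute("drop table oldtable")
cur.execute("alter table newtable rename to oldtable")

print(sorted(cur.execute("select * from oldtable").fetchall()))
# -> [(1, 'x'), (2, 'y'), (3, 'z')]
```

The trade-off matches the post: this needs room for a second full copy of the table, but it touches each row once regardless of how many duplicates there are.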
Dieter