Archives of the TeradataForum
Message Posted: Thu, 21 Aug 2003 @ 07:30:37 GMT
Subj: Re: Deleting dupes from multiset tables without rowid
From: Dieter Noeth
Christian Schiefer wrote:
| Since V2.0.0.24 there is a new internal parameter "Rowid_enabled", which is FALSE by default.
Interesting... Probably because of the 10 bytes compared to 8 bytes before V2R5?
| This means: "select rowid, t01.* from t01" does not work anymore. The construction of a join index using rowid is still
| possible and will work.
| If you set "Rowid_enabled" to TRUE, you can still "select rowid ..."
So I would enable it ;-)
| NCR plans not to allow a "select rowid" from V2R5.1 upwards for compatibility reasons... One should use the new
| identity column, which is introduced in V2R5.
| Many large tables are Multiset (performance reasons, etc.). Alter table to add an identity column is a.) not possible and b.)
| would suck for performance reasons anyway.
Why? I'd expect it not to be much slower than adding a new NOT NULL column.
| I think the only way to add an identity column is an "insert into select *", which may not be possible because of lack of free
| diskspace.
| With rowid not possible anymore and no identity column, the only way to get rid of unwanted dupes would be:
| 1.) Create table large_table_dupe with an identity column plus all the columns of large_table.
| 2.) Get the dupe records into large_table_dupe (the identity column is generated automatically):
|
|     insert into large_table_dupe (all_cols)
|     select all_cols
|     from ( select all_cols, count(*)
|            from large_table
|            group by all_cols
|            having count(*) > 1 ) t01;
| 3.) Delete all dupes in large_table_dupe:
|
|     delete from large_table_dupe
|     where identity_col not in (
|         select max_id
|         from ( select max(identity_col) max_id
|                , all_cols
|                from large_table_dupe
|                group by all_cols ) t01 );
?? There's only one copy per row in that table (hint: you used GROUP BY). And it would matter if there were dupes for the next
step...
| 4.) Delete all dupes from large_table:
|
|     delete from large_table
|     where (all_cols) in ( select all_cols from large_table_dupe );
| 5.) Get some rows in again:
|
|     insert into large_table select all_cols from large_table_dupe;
| My question: What do you think of it? This solution is not very straightforward.
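The identity-column delete pattern in step 3 (keep the row with the highest generated id in each duplicate group) can be traced outside Teradata. Here is a minimal sketch with Python's sqlite3, using SQLite's implicit rowid as a stand-in for the identity column; the table and data are made up for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# A multiset-style table: duplicate rows allowed, no unique key.
cur.execute("create table large_table_dupe (a integer, b text)")
cur.executemany("insert into large_table_dupe values (?, ?)",
                [(1, "x"), (1, "x"), (1, "x"), (2, "y"), (2, "y")])

# Keep only the row with the highest rowid in each duplicate group,
# mirroring "delete ... where identity_col not in (select max(identity_col) ...)".
cur.execute("""
    delete from large_table_dupe
    where rowid not in (
        select max(rowid)
        from large_table_dupe
        group by a, b )
""")

print(cur.execute("select a, b from large_table_dupe order by a").fetchall())
# -> [(1, 'x'), (2, 'y')]
```

The GROUP BY yields one surviving rowid per distinct value combination, so every other copy is deleted.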
Why don't you use:

create table newtable as oldtable with no data;

insert into newtable
select all_cols from oldtable
group by all_cols
having count(*) > 1;

delete from oldtable
where (all_cols) in
  (select all_cols from newtable);

insert into oldtable
select * from newtable;
If there's only a small number of dupes, this is quite fast.
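This recipe (park one copy of each duplicated row, delete all its copies from the original table, then re-insert the parked copies) can be sketched with Python's sqlite3. This is an illustrative stand-in, not Teradata syntax; it assumes SQLite 3.15+ for the row-value IN, and the names and data are invented:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

cur.execute("create table oldtable (a integer, b text)")
cur.executemany("insert into oldtable values (?, ?)",
                [(1, "x"), (1, "x"), (2, "y"), (3, "z"), (3, "z"), (3, "z")])

# 1. Park exactly one copy of each duplicated row in a side table.
cur.execute("create table newtable (a integer, b text)")
cur.execute("""
    insert into newtable
    select a, b from oldtable
    group by a, b
    having count(*) > 1
""")

# 2. Delete every copy of those rows from the original table.
cur.execute("""
    delete from oldtable
    where (a, b) in (select a, b from newtable)
""")

# 3. Re-insert the single parked copies.
cur.execute("insert into oldtable select * from newtable")

print(sorted(cur.execute("select * from oldtable").fetchall()))
# -> [(1, 'x'), (2, 'y'), (3, 'z')]
```

Note that rows without duplicates, like (2, 'y') here, never leave the original table, which is why this is cheap when only a small fraction of rows are duplicated.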
And if there's enough space available:

create table newtable as oldtable with no data;

insert into newtable
select distinct * from oldtable;

drop table oldtable;
rename table newtable to oldtable;
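The copy-distinct-and-swap variant can be sketched the same way with sqlite3 (again an illustrative stand-in for the Teradata statements, with invented names and data):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

cur.execute("create table oldtable (a integer, b text)")
cur.executemany("insert into oldtable values (?, ?)",
                [(1, "x"), (1, "x"), (2, "y"), (2, "y"), (3, "z")])

# Copy the distinct rows into a fresh table, then swap it in
# for the original: drop the old table and rename the new one.
cur.execute("create table newtable (a integer, b text)")
cur.execute("insert into newtable select distinct * from oldtable")
cur.execute("drop table oldtable")
cur.execute("alter table newtable rename to oldtable")

print(sorted(cur.execute("select * from oldtable").fetchall()))
# -> [(1, 'x'), (2, 'y'), (3, 'z')]
```

The trade-off matches the post: this needs room for a second full copy of the table, but it touches each row once regardless of how many duplicates there are.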
Dieter