Archives of the TeradataForum

Message Posted: Thu, 21 Aug 2003 @ 07:30:37 GMT

Subj: Re: Deleting dupes from multiset tables without rowid
From: Dieter Noeth
Christian Schiefer wrote:

> Since V2.0.0.24 there is a new internal parameter "Rowid_enabled",
> which is FALSE by default.

Interesting... Probably because of the 10-byte rowid in V2R5 compared to 8 bytes before V2R5?
> This means: "select rowid, t01.* from t01" does not work anymore.
> The construction of a join index using rowid is still possible and
> will work.
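
For illustration, a sketch of the difference with a hypothetical table t01 (columns col1, col2 are made up):

select rowid, t01.* from t01;     -- rejected when Rowid_enabled = FALSE

create join index t01_ji as       -- rowid inside a join index
   select col1, rowid             -- definition is still accepted
   from t01;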
 
 
> If you set "Rowid_enabled" to TRUE, you can still "select rowid ..."
 
 
So I would enable it ;-)
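
With it enabled, the rowid-based dupe removal the subject line alludes to would look something like this sketch (all_cols stands for the full column list; I'm assuming rowid is accepted in the where clause the same way it is in a select):

delete from large_table
where rowid not in (
   select min(rowid)
   from large_table
   group by all_cols );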
> NCR plans are not to allow a "select rowid" from V2R5.1 and upwards
> for compatibility reasons ..... One should use the new identity
> column, which is introduced in V2R5.
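
For reference, a minimal identity-column sketch in V2R5 syntax (table and column names are made up):

create multiset table demo_tab (
   id integer generated always as identity
      (start with 1 increment by 1),
   col1 integer,
   col2 varchar(10)
) primary index (col1);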
 
 
> Many large tables are Multiset (performance reasons, etc ...).
> Alter table to add an identity column is a.) not possible and
> b.) would suck for performance reasons anyway.
 
 
Why? I'd expect it not to be much slower than adding a new NOT NULL column.
> I think the only way to add an identity column is an
> "insert into ... select *", which may not be possible because of
> lack of free disk space.
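
That copy would look roughly like this (names are hypothetical; with "generated always" the identity column is simply left out of the insert list, and the copy needs space for a second full set of rows):

create multiset table large_table_v2 (
   id integer generated always as identity,
   col1 integer,
   col2 varchar(10)
) primary index (col1);

insert into large_table_v2 (col1, col2)
select col1, col2 from large_table;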
 
 
> With rowid not possible anymore and no identity column, the only
> way to get rid of unwanted dupes would be:
 
 
> 1.) create table large_table_dupe with identity_col plus all
> columns of large_table.
 
 
> 2.) Get the dupe records into large_table_dupe:
 
 
> insert into large_table_dupe (all_cols)
> select all_cols from (
>    select all_cols, count(*) as cnt
>    from large_table
>    group by all_cols
>    having count(*) > 1 ) t01;
 
 
> 3.) delete all dupes in large_table_dupe:
 
 
> delete from large_table_dupe
> where identity_col not in (
>    select max_id from (
>       select max(identity_col) as max_id, all_cols
>       from large_table_dupe
>       group by all_cols ) t01 );
 
 
?? There's only one copy per row in that table (hint: you used group by).
And it would matter for the next step if there were dupes...
> 4.) delete all dupes from large_table:
 
 
> delete from large_table
> where (all_cols) in ( select all_cols from large_table_dupe );
 
 
> 5.) Get some rows in again:
 
 
> insert into large_table select all_cols from large_table_dupe;
 
 
> My question: What do you think of it? This solution is not very
> straightforward.
 
 
 Why don't you use: 
create table newtable as oldtable with no data;
insert into newtable
   select all_cols from oldtable
   group by all_cols
   having count(*) > 1;
delete from oldtable
where (all_cols) in
   (select all_cols from newtable);
insert into oldtable
   sel * from newtable;
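
With a concrete (made-up) two-column table, the all_cols placeholder expands like this:

create table newtable as oldtable with no data;
insert into newtable
   select col1, col2 from oldtable
   group by col1, col2
   having count(*) > 1;
delete from oldtable
where (col1, col2) in
   (select col1, col2 from newtable);
insert into oldtable
   sel * from newtable;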
 
If there's only a small number of dupes, this is quite fast. And if there's enough space available:
create table newtable as oldtable with no data;
insert into newtable
   select distinct * from oldtable;
drop table oldtable;
rename table newtable to oldtable;
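
An end-to-end run with throwaway data (all names made up) to show the effect:

create multiset table oldtable (
   col1 integer,
   col2 varchar(10)
) primary index (col1);

insert into oldtable values (1, 'a');
insert into oldtable values (1, 'a');   -- duplicate row
insert into oldtable values (2, 'b');

create table newtable as oldtable with no data;
insert into newtable
   select distinct * from oldtable;     -- one copy of every row
drop table oldtable;
rename table newtable to oldtable;      -- oldtable now holds 2 rows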
 
 Dieter 
 