Archives of the TeradataForum

Message Posted: Fri, 16 Sep 2005 @ 14:12:50 GMT


	<Prev	Next>		<<First	<Prev	Next>	Last>>

Subj:		Re: Compare rows of tableA to rows of tableB - HOW?

From:		Dieter Noeth

Al MacGowan wrote:

Good Morning Everyone..

It's 3:50 PM here :-)

I have a question that may not have the answer I'm looking for. The age-old question - I have TableA and TableB, they have an identical set of columns (C1, C2, C3). I want to identify the rows in TableA that are different than the rows in TableB.

I had an experienced Teradata programmer once tell me "oh that's easy, i'll just join the tables and do one-row minus the other row, and the changes will just come through the query". What does that mean? Has anyone else heard of such a concept?

A Full Outer Join will do the trick - as long as all columns are NOT NULL, else it's more complicated:

     drop table test_old;
     drop table test_new;

     create table test_old(
        pk int not null primary key,
        c1 char(5) not null,
        c2 char(5) not null
     );

     create table test_new(
        pk int not null primary key,
        c1 char(5) not null,
        c2 char(5) not null
     );

     insert into test_old(1, 'oldc1', 'oldc2');
     insert into test_old(2, 'oldc1', 'oldc2');
     insert into test_old(3, 'oldc1', 'oldc2');
     insert into test_old(4, 'oldc1', 'oldc2');

     insert into test_new(2, 'oldc1', 'oldc2');
     insert into test_new(3, 'oldc1', 'newc2');
     insert into test_new(4, 'newc1', 'oldc2');
     insert into test_new(5, 'newc1', 'newc2');

     select
     /*** Status ***/
        case
          when o.pk is null then 'inserted'
          when n.pk is null then 'deleted'
          else 'updated'
        end,

     /*** "new" values ***/
        coalesce(n.pk, o.pk),
        coalesce(n.c1, o.c1),
        coalesce(n.c2, o.c2),

     /*** "new values", another option: show only modified columns ***/
        case
          when o.pk is null then n.c1
          when n.pk is null then o.c1
          when o.c1 <> n.c1 then n.c1
          else NULL
        end,

        case
          when o.pk is null then n.c2
          when n.pk is null then o.c2
          when o.c2 <> n.c2 then n.c2
          else NULL
        end
     from test_old as o full outer join test_new as n
     on o.pk = n.pk
     where
        o.pk is null
     or
        n.pk is null
     or
        o.c1 <> n.c1
     or
        o.c2 <> n.c2
     order by coalesce(o.pk,n.pk)
     ;

In my experience, I would do a two-step compare. I'd Left Join TableA to TableB, joining on all columns one-at-a-time, with a WHERE stating that the ID of TableB IS NULL. Then the opposite, Right Join TableA to TableB where the ID of TableA IS NULL. That way, I can provide two reports (because chances are they will have to be treated separately to work on the data anyway).

NOT EXISTS is prefereable if there's any NULLable column...

Is there a more efficient way to do this? I've already made it a little more efficient by querying dbc.tables/columns joined to itself to generate my "join conditions" dynamically for me to not have to type. Is there something else that can be done?

- there are NULLs

- there may be duplicate rows

- you don't need to know if a row was updated/inserted

then use EXCEPT:

     select * from test_old
     except all
     select * from test_new;

     select * from test_new
     except all
     select * from test_old;

Dieter

If you don't want to take care of NULLs then use EXCEPT


	<Prev	Next>		<<First	<Prev	Next>	Last>>

Attachments

Library

Quick Reference