Archives of the TeradataForum

Message Posted: Thu, 28 Jul 2005 @ 17:58:45 GMT


     
Subj:   Re: Backup and Recovery of Large Multi-Terabyte tables?
 
From:   Dodgson, Mark

Geoffrey,

Sorry that I haven't explained it very well. I've just re-read my posting, and it is a bit vague. I'll have another try.....

This was a while ago now, but essentially we encountered a problem when we ran BTEQ operations directly against the rollforward target table, instead of always applying data changes via the rollforward mechanism. (I've forgotten the actual scenario that forced us to do this, but we needed to delete a number of rows for a specific transaction date on the rollforward DR table, and then reapply them using fresh data taken from the production system - it was a one-off fix.)
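
For illustration, the fix was something along these lines - just a sketch from memory, and the table/column names and date here are made up, not our real ones:

  .LOGON tdpid/dr_user

  /* One-off fix, run directly against the DR (rollforward target) table. */
  /* First delete the rows for the transaction date being reworked...     */
  DELETE FROM dr_db.txn_history
  WHERE  txn_date = DATE '2003-06-30';

  /* ...then re-insert the corrected rows, previously staged on the DR    */
  /* system from a fresh extract of the production table.                 */
  INSERT INTO dr_db.txn_history
  SELECT *
  FROM   dr_db.txn_history_fix
  WHERE  txn_date = DATE '2003-06-30';

  .LOGOFF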

The BTEQ deletes and inserts against the target table worked okay - no immediate problems identified, and on the face of it, all was well. However, we'd unwittingly changed the internal row IDs of at least some of the rows involved in the BTEQ operations. The problem only manifests itself when you next roll forward a permanent journal - note that the job will not fail, but the data is corrupted.

The table affected had a highly non-unique primary index, i.e. a relatively large number of rows with the same rowhash (but different uniqueness values (UVs)).
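
If you want to see how non-unique a PI is in practice, something along these lines (hypothetical names again) shows how many rows share each rowhash, using the standard HASHROW function on the PI column:

  SELECT HASHROW(txn_acct_id) AS row_hash,
         COUNT(*)             AS rows_with_hash
  FROM   dr_db.txn_history
  GROUP  BY 1
  ORDER  BY 2 DESC;

Each of those rows then carries its own uniqueness value, and it's the UVs that get out of step, as below.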

Say we have a PI value with 10 rows - all have the same rowhash, but possess distinct UV numbers from 1 to 10. When we ran the BTEQ delete, we would have deleted one of these rows based upon the transaction date we were trying to rework - say, for example, UV = 7 - which leaves us with 9 rows with the same rowhash, assigned UV numbers 1, 2, 3, 4, 5, 6, 8, 9, 10. When the correct version of this row was re-inserted into the target rollforward table, the insert succeeded, but the *missing* UV number of 7 was not reused.

Instead, Teradata assigns the current highest UV value for the rowhash, plus 1 (i.e. 11).

So, post-BTEQ update, UV values on the DR version of the table for this particular rowhash are 1, 2, 3, 4, 5, 6, 8, 9, 10 and 11, whereas the same block of rows on the primary production system ran with UV values 1 to 10 inclusive. You can see where this is going now, can't you.......

Back on the production system, a new transaction arrives for the same PI value the next day, and therefore a new UV value of 11 is assigned to the rowhash, to differentiate the new row from the existing 10 that are already there. When this is checkpointed and captured in the permanent journal, the UV value of 11 is also captured and carried forward to the DR system via archive and subsequent journal rollforward. However, when Arcmain attempts to roll this row into the DR version of the table, it encounters a row with the same rowhash and UV = 11, so it *replaces* it with the version from the journal archive. The result - you've now *lost* data from your DR system (the production version has 11 rows for the PI value, DR has 10). The row that was carefully slotted in via the BTEQ process has now been overwritten.
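
If you suspect you've been bitten by this, one rough way to check (hypothetical names again) is to run a row count per PI value on both the production and DR copies of the table and compare the output - any PI value where the counts differ has diverged:

  SELECT txn_acct_id,
         COUNT(*) AS row_cnt
  FROM   txn_history
  GROUP  BY 1
  ORDER  BY 1;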

Ultimately, when using PJs, the target table that is subject to rollforward is essentially read-only. You can only update it accurately (and consistently) via journal rollforward.

[ Many thanks to Jon Christie from NCR who helped us unearth this little gem a couple of years ago (cheers Jon) - in hindsight, it's fairly obvious what's going on, but at the time, it had us scratching our heads for quite a while...... my hair still hasn't grown back !! ]

Hope this helps,


Cheers,

Mark



     