Archives of the TeradataForum

Message Posted: Mon, 14 Mar 2005 @ 17:58:25 GMT


     


Subj:   Re: fast-export vertical & horizontal data distribution
 
From:   David Wellman

Hi Mirjam,

I'm not aware of any white paper or document about this, but here's my understanding.

A Fastexport is essentially a SELECT statement with a fast data transfer step at the end.

Think of normal (?!?!?!) SELECT processing...

- Teradata goes through whatever steps are necessary to build the final answer set which is generally distributed across all the AMPs.

- the rows on each AMP are typically not sorted by data value; they are often in rowhash sequence.

- if the SELECT contains an ORDER BY clause then each AMP does an AMP-local sort on its own portion of the result set to honour that ORDER BY.

- (logically) each AMP returns its first row to the bynet, which then does a 'bynet merge' to ensure that the true first row is returned to the client application. This is repeated until all of the data is returned. This process is a slow data transfer because data is (essentially) being returned one row at a time.
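To make the sort-then-merge steps above concrete, here is a small Python sketch. It is only a toy model of the logic as described in this post, not Teradata's actual implementation: the "AMPs" are plain lists, the sample values are made up, and the names amp_local_sort and merge_rows are hypothetical.

```python
import heapq

# Toy model (assumption): each inner list is the portion of the answer set
# held by one "AMP", in no particular data order.
amp_rows = [
    [42, 7, 19],
    [3, 88, 15],
    [61, 1, 27],
]

def amp_local_sort(rows_per_amp):
    """Each AMP sorts only its own portion of the result set."""
    return [sorted(rows) for rows in rows_per_amp]

def merge_rows(sorted_per_amp):
    """Repeatedly take the smallest 'first row' offered by any AMP.

    heapq.merge does a k-way merge of already-sorted inputs, which stands in
    here for the row-at-a-time merge described above.
    """
    return list(heapq.merge(*sorted_per_amp))

result = merge_rows(amp_local_sort(amp_rows))
print(result)
```

The point to notice is that the merge hands back one row per step, which is why this path is slow for large answer sets.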


Where Fastexport gets its speed is in the data transfer stage, and it gets this speed by transferring blocks of data at a time - one block from each AMP. But in order to transfer a block of data at a time and spread the load across all of the AMPs, the final answer set needs to be evenly distributed. This is the first distribution (I think 'horizontal') and it always happens when you use Fastexport.
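As a rough illustration of why an evenly distributed answer set helps, the Python sketch below deals rows out evenly and then moves them a block at a time. It is a simplified model under my own assumptions: round-robin dealing, a fixed block size, and the hypothetical names distribute_evenly and transfer_in_blocks. It says nothing about how Teradata actually sizes or ships its blocks.

```python
def distribute_evenly(rows, amp_count):
    """Deal rows round-robin so every AMP holds a near-equal share."""
    amps = [[] for _ in range(amp_count)]
    for i, row in enumerate(rows):
        amps[i % amp_count].append(row)
    return amps

def transfer_in_blocks(amps, block_size):
    """Yield one block per AMP per round, instead of one row at a time."""
    offset = 0
    while any(offset < len(rows) for rows in amps):
        for amp_id, rows in enumerate(amps):
            block = rows[offset:offset + block_size]
            if block:
                yield amp_id, block
        offset += block_size

answer_set = list(range(20))          # made-up answer set of 20 rows
amps = distribute_evenly(answer_set, amp_count=4)
blocks = list(transfer_in_blocks(amps, block_size=2))
print(len(answer_set), "rows moved in", len(blocks), "transfers")
```

Because every AMP holds the same number of rows, each round keeps all of them busy, and the number of transfers drops as the block size grows.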

If you add an ORDER BY clause to the Fastexport SELECT you get the second distribution. In this case, the final answer set needs to be broken into blocks such that one block contains the first 'n' rows of the answer set, another block contains the next 'n' rows and so on. Now the first distribution results in the answer set rows being distributed in data value groups (to make up the blocks for the data transfer stage) and the second one ensures that those blocks are evenly distributed. This means that the final data transfer stage is quick - because it's transferring blocks of data at a time - and the answer set gets back to the client in the correct ('ORDER BY') order.
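Here is a small Python sketch of the ORDER BY case as I read it: group the rows into blocks of 'n' consecutive sorted rows, spread those blocks evenly over the AMPs, then read the blocks back in turn. Everything in it is an assumption made for illustration (the round-robin placement, the sample data, and the hypothetical names build_ordered_blocks, spread_blocks and read_in_order); it is not a description of Teradata internals.

```python
def build_ordered_blocks(rows, block_size):
    """Group rows by data value: block 0 holds the first n rows, and so on."""
    ordered = sorted(rows)
    return [ordered[i:i + block_size] for i in range(0, len(ordered), block_size)]

def spread_blocks(blocks, amp_count):
    """Place whole blocks round-robin so each AMP holds a similar number."""
    amps = [[] for _ in range(amp_count)]
    for block_no, block in enumerate(blocks):
        amps[block_no % amp_count].append(block)
    return amps

def read_in_order(amps):
    """Take one block from each AMP in turn; the rows arrive already ordered."""
    out = []
    round_no = 0
    while any(round_no < len(amp) for amp in amps):
        for amp in amps:
            if round_no < len(amp):
                out.extend(amp[round_no])
        round_no += 1
    return out

rows = [17, 4, 23, 8, 15, 42, 16, 1, 30, 11]   # made-up answer set
amps = spread_blocks(build_ordered_blocks(rows, block_size=3), amp_count=2)
exported = read_in_order(amps)
print(exported)
```

No row-by-row merge is needed at the end: the client only has to take the blocks in sequence.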

As I said, that's my understanding (which may not be perfect!) of this process.


Cheers,

Dave



     
 
 
Copyright for the TeradataForum (TDATA-L), Manta BlueSky    
Copyright 2016 - All Rights Reserved    
Last Modified: 15 Jun 2023