Home Page for the TeradataForum
 

Archives of the TeradataForum

Message Posted: Thu, 05 Feb 2004 @ 16:35:49 GMT


     
  <Prev Next>   <<First <Prev
Next>
Last>>
 


Subj:   Re: Difference between Hard and Soft RI
 
From:   Ballinger, Carrie

Just to add some clarity about what the actual difference is between hard and soft RI, I thought a couple of exmples might be useful.

Below are two explains from a V2R5 system with soft RI applied to the MyOrders table. Below the explains I have included the show table after I applied the soft RI. Only the MyOrders table had to be altered for soft RI to work. Notice in both explains that the customer table is never accessed.

explain
select m_custkey,m_orderdate,m_totalprice
from MyOrders, customer
where m_custkey = c_custkey
and c_custkey between 14 and 24;

 *** Help information returned. 16 rows.
 *** Total elapsed time was 1 second.
Explanation
--------------------------------------------------
 
  1)First, we lock a distinct CAB."pseudo table" for read on a RowHash to prevent global deadlock for CAB.MyOrders.  
  2)Next, we lock CAB.MyOrders for read.  
  3)We do an all-AMPs RETRIEVE step from CAB.MyOrders by way of an all-rows scan with a condition of ("(CAB.MyOrders.M_CUSTKEY >= 14) AND ((CAB.MyOrders.M_CUSTKEY <= 24) AND ((CAB.MyOrders.M_CUSTKEY <= 24) AND (CAB.MyOrders.M_CUSTKEY >= 14 )))") into Spool 1 (group_amps), which is built locally on the AMPs. The input table will not be cached in memory, but it is eligible for synchronized scanning. The size of Spool 1 is estimated with low confidence to be 90 rows. The estimated time for this step is 12.32 seconds.  
  4)Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.  
  ->The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 12.32 seconds.  


explain
select c_custkey, sum(m_totalprice)
from customer, MyOrders
where m_custkey = c_custkey
and c_custkey between 14 and 24
group by c_custkey;

 *** Help information returned. 21 rows.
 *** Total elapsed time was 1 second.
Explanation
--------------------------------------------------
 
  1)First, we lock a distinct CAB."pseudo table" for read on a RowHash to prevent global deadlock for CAB.MyOrders.  
  2)Next, we lock CAB.MyOrders for read.  
  3)We do a SUM step to aggregate from CAB.MyOrders by way of an all-rows scan with a condition of ("(CAB.MyOrders.M_CUSTKEY >= 14) AND ((CAB.MyOrders.M_CUSTKEY <= 24) AND ((CAB.MyOrders.M_CUSTKEY <= 24) AND (CAB.MyOrders.M_CUSTKEY >= 14 )))"), and the grouping identifier in field 1026. Aggregate Intermediate Results are computed globally, then placed in Spool 3. The input table will not be cached in memory, but it is eligible for synchronized scanning. The size of Spool 3 is estimated with low confidence to be 6 rows. The estimated time for this step is 12.52 seconds.  
  4)We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of an all-rows scan into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with low confidence to be 6 rows. The estimated time for this step is 0.83 seconds.  
  5)Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.  


With soft RI we don't have to access the customer table because we have essentially told the optimizer when we established soft RI that for every m_custkey value in the MyOrders table there will be a c_custkey value in the customer table. So the optimizer knows that "c_custkey between 14 and 24" is equivalent to "m_custkey between 14 and 24". Thus the join is not needed because the selection criteria can be taken from the customer table and applied to the order table. Of course, if there were other selection criteria on customer, we would have to perform the join.

The reason there is no overhead with soft RI is that we never attempt to validate the relationships when data is updated. Soft RI means you trust the ETL end to guarentee that the relationships always have integrity. The optimizer simply accepts, trusts, that the integrity is there, but the database never enforces it.

This is only something useful to sites that have an extremely high level of data integrity and can guarantee that relationships between tables are always kept in sync.

Here is how the MyOrders looks with soft RI defined.

show table MyOrders;

 *** Text of DDL statement returned.
 *** Total elapsed time was 1 second.

---------------------------------------------------------------------------
CREATE MULTISET TABLE CAB.MyOrders ,NO FALLBACK ,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL
     (
      M_ORDERKEY INTEGER NOT NULL,
      M_CUSTKEY INTEGER NOT NULL,
      M_ORDERSTATUS CHAR(1) CHARACTER SET LATIN CASESPECIFIC NOT NULL,
      M_TOTALPRICE DECIMAL(15,2) NOT NULL,
      M_ORDERDATE DATE FORMAT 'yyyy-mm-dd' NOT NULL,
      M_ORDERPRIORITY CHAR(15) CHARACTER SET LATIN CASESPECIFIC NOT NULL,
      M_CLERK CHAR(15) CHARACTER SET LATIN CASESPECIFIC NOT NULL,
      M_SHIPPRIORITY INTEGER NOT NULL,
      M_COMMENT VARCHAR(79) CHARACTER SET LATIN CASESPECIFIC NOT NULL,
FOREIGN KEY ( M_CUSTKEY ) REFERENCES WITH NO CHECK OPTION CAB.CUSTOMER ( C_CUSTKEY ))
UNIQUE PRIMARY INDEX ( M_ORDERKEY );

Thanks, --Carrie



     
  <Prev Next>   <<First <Prev
Next>
Last>>
 
 
 
 
 
 
 
 
 
  
  Top Home Privacy Feedback  
 
 
Copyright for the TeradataForum (TDATA-L), Manta BlueSky    
Copyright 2016 - All Rights Reserved    
Last Modified: 15 Jun 2023