Archives of the TeradataForum

Message Posted: Thu, 16 Nov 2006 @ 14:45:51 GMT


     


Subj:   Re: UTF-8, Fastload and linux
 
From:   Geoffrey Rommel

  It will load the records which do not contain UTF-8 characters but, as soon as it encounters a field with UTF-8 characters, the record length as seen by Teradata does not match our Define statement.  


UTF-8 is a variable-length encoding. Characters between U+0000 and U+007F occupy one byte, those between U+0080 and U+07FF occupy two bytes, those between U+0800 and U+FFFF occupy three bytes, and those above U+FFFF occupy four bytes. For details, see the Unicode Standard, sec. 3.9.
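
To see why the record length varies, here is a minimal Python sketch that prints the UTF-8 byte length of a few sample characters. The helper name utf8_len and the sample characters are only illustrative; they are not part of FastLoad or Teradata.

```python
# Illustrative helper (hypothetical name): byte length of one character in UTF-8.
def utf8_len(ch: str) -> int:
    return len(ch.encode("utf-8"))


# Sample characters chosen to fall in different UTF-8 length ranges.
samples = [
    "A",           # U+0041, in U+0000..U+007F
    "\u00e9",      # U+00E9, in U+0080..U+07FF
    "\u20ac",      # U+20AC, in U+0800..U+FFFF
    "\U0001F600",  # U+1F600, above U+FFFF
]

for ch in samples:
    print(f"U+{ord(ch):04X} -> {utf8_len(ch)} byte(s)")
```

A record whose fields contain any character outside the one-byte range is therefore longer in bytes than the same record counted in characters.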

www.unicode.org/versions/Unicode4.0.0/ch03.pdf


You can still use fixed-length character specifications, but each UTF-8 column has to be padded to the maximum byte length for that field. For instance, if your source field is char(20), it would probably occupy no more than 60 bytes in UTF-8 form (at most three bytes per character, as long as no character lies above U+FFFF), so just pad that field to 60 bytes in every record and specify it as char(60). The pad bytes won't be loaded, so they can be blanks or any other value that is convenient.
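
The padding described above could be done in a preprocessing step before the load. The following Python sketch assumes a 20-character field, at most three UTF-8 bytes per character, and blank pad bytes. The function name pad_utf8_field and its parameters are hypothetical, and it only prepares the bytes; it does not invoke FastLoad.

```python
# Hypothetical preprocessing helper: encode a field as UTF-8 and pad it
# to a fixed byte width so every record has the same length.
def pad_utf8_field(value: str, char_len: int = 20,
                   max_bytes_per_char: int = 3) -> bytes:
    width = char_len * max_bytes_per_char  # e.g. 20 * 3 = 60 bytes
    if len(value) > char_len:
        raise ValueError(f"value has more than {char_len} characters")
    encoded = value.encode("utf-8")
    if len(encoded) > width:
        # Only possible if the value contains characters above U+FFFF,
        # which need four bytes each.
        raise ValueError(f"encoded value exceeds {width} bytes")
    # Pad on the right with blanks (an assumption; any pad byte would do
    # according to the text above).
    return encoded.ljust(width, b" ")


plain = pad_utf8_field("Zurich")
accented = pad_utf8_field("Z\u00fcrich")  # contains one two-byte character

# Both fields occupy the same number of bytes in the record.
print(len(plain), len(accented))
```

With every UTF-8 field padded this way, the record length in bytes is constant, so a fixed-length definition such as char(60) matches each record.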

You could also use VARTEXT, varchar() fields, and a delimiter, but that would slow down the load.



     
 
 
Copyright for the TeradataForum (TDATA-L), Manta BlueSky    
Copyright 2016 - All Rights Reserved    
Last Modified: 15 Jun 2023