 

Archives of the TeradataForum

Message Posted: Wed, 15 Nov 2006 @ 18:28:43 GMT




Subj:   Re: UTF-8, Fastload and linux
 
From:   Victor Sokovin

  0003 set session charset "UTF8";  


Dave, I have no time to test anything, but I'll refer you to the other thread where we discussed the differences between running the utilities in batch mode and interactively:

www.teradataforum.com/teradata/20060623_145628.htm


I thought of that thread when I saw you using the SET SESSION command.

If you consult the FastLoad manual (I have the Sep 2006 edition open), then, strangely enough, it lists only three character sets: ASCII, KANJIEUC_0U and KANJISJIS_0S.

Of course, it could just be a problem with the manual, but if you read it literally, you cannot expect UTF-8 support at this level, can you?

I would try running the script in batch mode with the "-c UTF8" option to check whether it makes any difference.
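In batch mode the character set goes on the command line instead of into the script. A minimal Python sketch of such a run, assuming a `fastload` executable on PATH and a hypothetical script file `load_utf8.fl` fed on standard input (the helper names are illustrative only):

```python
import shlex
import subprocess
from pathlib import Path


def build_fastload_argv(charset: str = "UTF8") -> list:
    """Command line for a batch FastLoad run, passing the session character
    set with -c instead of SET SESSION CHARSET inside the script."""
    return ["fastload", "-c", charset]


def run_fastload_batch(script: Path, charset: str = "UTF8") -> int:
    """Feed the script to FastLoad on standard input and return its exit code.

    Assumes a `fastload` executable is on PATH.
    """
    with script.open("rb") as stdin:
        return subprocess.run(build_fastload_argv(charset), stdin=stdin).returncode


if __name__ == "__main__":
    # Dry run: show the equivalent shell command without executing anything.
    print(shlex.join(build_fastload_argv()) + " < load_utf8.fl")
    # prints: fastload -c UTF8 < load_utf8.fl
```

Comparing the result of this run with the interactive run that uses SET SESSION should show whether the way the charset is set matters.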

If you read further in the manual, you find the following restrictions on data types:

Data type                       Length                          Description
CHAR, CHARS(n), CHARACTERS(n)   n bytes                         n ASCII characters
VARCHAR(n)                      m + 2 bytes, where m ≤ 32000    16-bit integer count m, followed by m ASCII characters

Did you notice that ASCII? Again, it could be a problem with the manual: did somebody forget to update it? If not, then both CHAR(n) and VARCHAR(n) are excluded from the UTF-8 context, leaving only VARBYTE and similar "graphical" data types, and those are quite difficult to use: your extraction process must provide all the technical bits and pieces in the file, such as lengths, indicators, BOMs, etc.
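To make "lengths, indicators, BOMs" concrete, here is a minimal Python sketch of what an extraction process might have to write for one record. It assumes a little-endian 16-bit length prefix (the VARCHAR layout quoted from the manual, applied to raw bytes), the usual indicator-bit convention (one bit per field, most significant bit first, 1 = NULL) and a UTF-8 BOM that must be stripped. The helper names are hypothetical, not part of any Teradata API, and every layout detail should be checked against the FastLoad manual.

```python
import codecs
import struct
from typing import Optional, Sequence

MAX_FIELD_BYTES = 32000  # the "m <= 32000" limit quoted from the manual


def length_prefixed(data: bytes) -> bytes:
    """VARCHAR/VARBYTE-style field: 16-bit count m followed by m bytes.

    Assumption: the count is little-endian (native order on x86 Linux)
    and counts bytes, so UTF-8 text can need more bytes than characters.
    """
    if len(data) > MAX_FIELD_BYTES:
        raise ValueError(f"field is {len(data)} bytes, over {MAX_FIELD_BYTES}")
    return struct.pack("<H", len(data)) + data


def indicator_bytes(values: Sequence[Optional[bytes]]) -> bytes:
    """One bit per field, most significant bit first; 1 marks a NULL field.

    Assumed indicator-mode layout; check it against your utility's manual.
    """
    bits = bytearray((len(values) + 7) // 8)
    for i, value in enumerate(values):
        if value is None:
            bits[i // 8] |= 0x80 >> (i % 8)
    return bytes(bits)


def strip_utf8_bom(data: bytes) -> bytes:
    """Drop a leading UTF-8 byte order mark (EF BB BF) if the extract wrote one."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def build_record(values: Sequence[Optional[bytes]]) -> bytes:
    """Hypothetical helper: indicator bytes, then every field in order.

    A NULL field is still written, here as a zero-length field (assumption).
    """
    fields = b"".join(length_prefixed(v if v is not None else b"") for v in values)
    return indicator_bytes(values) + fields


if __name__ == "__main__":
    text = "café"  # 4 characters, but 5 bytes in UTF-8
    raw = strip_utf8_bom(codecs.BOM_UTF8 + text.encode("utf-8"))
    print(len(text), len(raw))  # 4 5
    print(build_record([raw, None]).hex(" "))  # 40 05 00 63 61 66 c3 a9 00 00
```

For "café" the count m is 5, not 4: the manual's "m ASCII characters" and a UTF-8 byte count agree only for pure ASCII data.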

So there is a lot to be clarified and tested here.

I personally think that MultiLoad should be a better option for Unicode data.


Regards,

Victor



Copyright for the TeradataForum (TDATA-L), Manta BlueSky    
Copyright 2016 - All Rights Reserved    
Last Modified: 15 Jun 2023