Archives of the TeradataForum

Message Posted: Wed, 15 Nov 2006 @ 18:28:43 GMT



Subj:		Re: UTF-8, Fastload and linux

From:		Victor Sokovin

0003 set session charset "UTF8";

Dave, I have no time to test anything, but I'll refer you to another thread where we discussed the differences between running the utilities in batch mode and interactively:

www.teradataforum.com/teradata/20060623_145628.htm

I thought of that thread when I saw you using SET SESSION.

If you consult the FastLoad manual (I have the Sep 2006 edition open), you will find that, strangely enough, it lists only three options: ASCII, KANJIEUC_0U and KANJISJIS_0S.

Of course, it could just be a problem with the manual, but if you read the above literally, you cannot expect UTF-8 support at this level, can you?

I would try running the script in batch mode with the "-c UTF8" option to see whether it makes any difference.
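For reference, here is a minimal Python sketch of such a batch run. It assumes the fastload executable is on PATH and accepts the character set via "-c" as described above; in batch mode the script is fed on standard input instead of being typed interactively. The function names and the script file name are hypothetical.

```python
import subprocess
from pathlib import Path


def fastload_batch_command(charset: str = "UTF8") -> list[str]:
    """Build the argv for a batch-mode FastLoad run.

    Assumption: 'fastload' is on PATH and takes the session
    character set as '-c <charset>'.
    """
    return ["fastload", "-c", charset]


def run_fastload_batch(script: Path, charset: str = "UTF8") -> int:
    """Feed the script to FastLoad on stdin (batch mode); return the exit code."""
    with script.open("rb") as stdin:
        result = subprocess.run(fastload_batch_command(charset), stdin=stdin)
    return result.returncode
```

Something like run_fastload_batch(Path("load_utf8.fl")) is then the scripted equivalent of typing "fastload -c UTF8 < load_utf8.fl" at the shell, which makes it easy to compare the batch result against the interactive one.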

If you read the manual further, you will find the following restrictions on data types:

Data type                            Length                         Description
CHAR, CHARS(n), and CHARACTERS(n)    n bytes                        n ASCII characters
VARCHAR(n)                           m + 2 bytes where m = 32000    16-bit integer, count m, followed by m ASCII characters

Did you notice the word ASCII there? Again, it could be a problem with the manual; perhaps somebody forgot to update it. If not, then CHAR(n) and VARCHAR(n) are both excluded from the UTF-8 context, leaving only VARBYTE as "graphical" data types, and those are quite difficult to use: your extraction process must provide all the technical bits and pieces in the file, such as lengths, indicators, BOMs, etc.
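To make those "bits and pieces" concrete, here is a minimal Python sketch of what an extract would have to write. It assumes a little-endian (x86) client, fields laid out as in the VARCHAR row of the table above (16-bit count m followed by the data), indicator bytes with one bit per field (most significant bit first, 1 meaning NULL), and a UTF-8 BOM that some extract tools prepend. All names are hypothetical, and the layout should be checked against your own FastLoad manual. Note that the count is in bytes, so for UTF-8 text it no longer matches the "m ASCII characters" wording.

```python
import struct

UTF8_BOM = b"\xef\xbb\xbf"


def varchar_field(text: str, encoding: str = "utf-8") -> bytes:
    """16-bit count m followed by m bytes (assumed little-endian client).

    m counts BYTES; it equals the character count only for ASCII text.
    """
    data = text.encode(encoding)
    if len(data) > 0xFFFF:
        raise ValueError("field too long for a 16-bit length prefix")
    return struct.pack("<H", len(data)) + data


def indicator_bytes(fields: list) -> bytes:
    """One bit per field, most significant bit first; 1 marks a NULL field."""
    bits = bytearray((len(fields) + 7) // 8)
    for i, value in enumerate(fields):
        if value is None:
            bits[i // 8] |= 0x80 >> (i % 8)
    return bytes(bits)


def record(fields: list) -> bytes:
    """Indicator bytes, then each field.

    Assumption: a NULL field is still present in the data, here as a
    zero-length field.
    """
    body = b"".join(varchar_field("" if f is None else f) for f in fields)
    return indicator_bytes(fields) + body


def strip_bom(raw: bytes) -> bytes:
    """Drop a leading UTF-8 byte order mark if the extract wrote one."""
    return raw[len(UTF8_BOM):] if raw.startswith(UTF8_BOM) else raw
```

For example, varchar_field("café") carries a count of 5 for a 4-character string, which is exactly where a loader that expects "m ASCII characters" would go wrong.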

So there is a lot to be clarified and tested here.

I personally think MultiLoad would be a better option for Unicode data.

Regards,

Victor

