Archives of the TeradataForum

Message Posted: Thu, 16 Nov 2006 @ 14:45:51 GMT



Subj:		Re: UTF-8, Fastload and linux

From:		Geoffrey Rommel

It will load the records which do not contain UTF-8 characters but, as soon as it encounters a field with UTF-8 characters, the record length as seen by Teradata does not match our Define statement.

UTF-8 is a variable-length encoding. Characters between U+0000 and U+007F occupy one byte, those between U+0080 and U+07FF occupy two bytes, those between U+0800 and U+FFFF occupy three bytes, and those above U+FFFF occupy four. For details, see the Unicode Standard, sec. 3.9.
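
The byte counts can be checked directly. Here is a minimal Python sketch; the sample characters and the helper name utf8_len are chosen only for illustration and have nothing to do with Fastload itself:

```python
def utf8_len(ch: str) -> int:
    """Return the number of bytes a single character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


# One sample character from each UTF-8 length range (illustrative choices).
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]
lengths = [utf8_len(ch) for ch in samples]

for ch, n in zip(samples, lengths):
    print(f"U+{ord(ch):04X} -> {n} byte(s)")
```

A record containing any of the multi-byte characters is therefore longer in bytes than in characters, which is why a fixed-length definition sized in characters stops matching.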

www.unicode.org/versions/Unicode4.0.0/ch03.pdf

You can still use fixed-length character specifications, but each UTF-8 column has to be padded to the maximum byte length for that field. For instance, if your source field is char(20), it would occupy no more than 60 bytes in UTF-8 form as long as every character falls between U+0000 and U+FFFF (three bytes at most per character), so just pad that field to 60 bytes in every record and specify it as char(60). The pad bytes won't be loaded, so they can be blanks or any other value that is convenient.
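
As a rough sketch of that padding step, assuming the input file is produced by a Python script before the load: the helper names pad_utf8 and build_record, the sample values, and the choice of a blank as pad byte are all assumptions for illustration. Any record terminator your Fastload record format expects is not handled here.

```python
def pad_utf8(value: str, width: int, pad: bytes = b" ") -> bytes:
    """Encode value as UTF-8 and right-pad it to exactly `width` bytes."""
    encoded = value.encode("utf-8")
    if len(encoded) > width:
        raise ValueError(f"{value!r} needs {len(encoded)} bytes, limit is {width}")
    return encoded + pad * (width - len(encoded))


def build_record(fields, widths):
    """Concatenate the padded fields into one fixed-length record."""
    return b"".join(pad_utf8(v, w) for v, w in zip(fields, widths))


# Two char(20) source fields, each padded to 60 bytes (3 bytes per character).
widths = [60, 60]
rec1 = build_record(["Smith", "Zürich"], widths)
rec2 = build_record(["Müller", "東京"], widths)

# Both records have the same byte length regardless of their content.
print(len(rec1), len(rec2))
```

Padding is counted in bytes after encoding, not in characters before it; padding the string to 60 characters first and encoding afterwards would bring the variable record length back.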

You could also use VARTEXT, varchar() fields, and a delimiter, but that would slow down the load.
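
For comparison, a delimited input file could be produced along these lines. This is only a sketch of the file-writing side: the "|" delimiter, the newline terminator, and the helper name vartext_line are assumptions, and the Fastload script itself is not shown.

```python
def vartext_line(fields, delimiter: str = "|") -> bytes:
    """Join the fields with a delimiter and encode the line as UTF-8."""
    for f in fields:
        # A delimiter or newline inside a field would corrupt the record layout.
        if delimiter in f or "\n" in f:
            raise ValueError(f"field {f!r} contains the delimiter or a newline")
    return (delimiter.join(fields) + "\n").encode("utf-8")


line = vartext_line(["Müller", "東京"])
print(len(line))  # record length now varies with the content
```

No padding is needed here because the delimiter, not a byte count, marks the end of each field.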



Last Modified: 15 Jun 2023