Archives of the TeradataForum
Message Posted: Thu, 16 Nov 2006 @ 14:45:51 GMT
Subj: Re: UTF-8, Fastload and linux

From: Geoffrey Rommel
| It will load the records which do not contain UTF-8 characters but, as soon as it encounters a field with UTF-8 characters, the record
| length as seen by Teradata does not match our Define statement.
UTF-8 is a variable-length encoding. Characters between U+0000 and U+007F occupy one byte, those between U+0080 and U+07FF occupy two bytes,
those between U+0800 and U+FFFF occupy three bytes, and those from U+10000 up occupy four bytes. For details, see the Unicode Standard, sec. 3.9.
www.unicode.org/versions/Unicode4.0.0/ch03.pdf
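To see the variable length for yourself, here is a small Python sketch (Python is my choice for illustration, not anything FastLoad needs). The helper name utf8_len and the sample characters are made up for the example.

```python
def utf8_len(ch: str) -> int:
    """Return the number of bytes one character occupies in UTF-8."""
    return len(ch.encode("utf-8"))

# One sample character from each range: U+0041, U+00E9, U+20AC, U+1F600.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]
lengths = [utf8_len(c) for c in samples]

for c, n in zip(samples, lengths):
    print(f"U+{ord(c):04X} -> {n} byte(s)")
```

A record holding any of the last three characters is therefore longer in bytes than in characters, which is why the length no longer matches the Define statement.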
You can still use fixed-length character specifications, but each UTF-8 column has to be padded to the maximum byte length for that field. For
instance, if your source field is char(20), it would probably occupy no more than 60 bytes in UTF-8 form (three bytes per character), so just
pad that field to 60 bytes in every record and specify it as char(60). The pad bytes won't be loaded, so they can be blanks or any other value
that is convenient.
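A rough sketch of the padding step, assuming the input file is prepared with a Python script before the load. The function names, the two-field layout and the 10-byte width of the second field are hypothetical; only the 60-byte width comes from the example above.

```python
def pad_utf8(value: str, width: int, pad: bytes = b" ") -> bytes:
    """Encode value as UTF-8 and right-pad it to exactly width bytes."""
    raw = value.encode("utf-8")
    if len(raw) > width:
        # Do not truncate silently: cutting here could split a multi-byte character.
        raise ValueError(f"{value!r} needs {len(raw)} bytes, field is {width}")
    return raw + pad * (width - len(raw))

def build_record(name: str, code: str) -> bytes:
    """Hypothetical layout: a 60-byte name field followed by a 10-byte code field."""
    return pad_utf8(name, 60) + pad_utf8(code, 10)

record = build_record("Z\u00fcrich", "CH01")
print(len(record))  # every record has the same byte length
```

The point is to pad after encoding, counting bytes rather than characters, so that every record comes out the same length whatever the data contains.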
You could also use VARTEXT, varchar() fields, and a delimiter, but that would slow down the load.
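For comparison, a delimited record could be written along these lines. This is only a sketch of the file-preparation side: the function name and the choice of "|" as delimiter are assumptions, and the FastLoad script itself is not shown.

```python
def build_delimited(fields: list[str], delimiter: str = "|") -> bytes:
    """Join fields with a delimiter and encode the line as UTF-8, no padding."""
    for f in fields:
        if delimiter in f or "\n" in f:
            # A delimiter inside the data would shift every following column.
            raise ValueError(f"field {f!r} contains the delimiter or a newline")
    return (delimiter.join(fields) + "\n").encode("utf-8")

line = build_delimited(["Z\u00fcrich", "CH01"])
print(line)
```

Here the record length varies with the data, so no byte counting is needed, but the delimiter must never occur inside a field.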