Archives of the TeradataForum
Message Posted: Thu, 04 Mar 2004 @ 13:22:20 GMT
Subj: Re: Storing Multiple Languages in Teradata
From: Victor Sokovin
| ||Does Teradata support this "UTF16" format on a table-by-table or even column-by-column basis, or is this a "server level" setting? I think
that is the biggest area of concern. Since Greek is a mix of 1- and 2-byte characters, the concern is: will Teradata always store Unicode-defined
data in two bytes per character, or does it use "smart storage" based on the characters actually being stored?|| |
Internally (on the server side), Teradata uses a fixed 16-bit-per-character format for Unicode; there is no smart storage. I guess it is done
this way to reduce CPU usage for Unicode string manipulation, and variable-length internal storage might also be tricky to implement. TD does
support the popular UTF-8 on the client side.
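To make the storage question concrete, here is a small Python sketch comparing byte counts for a mixed Greek/ASCII string. It is only an illustration: the `utf-16-be` codec stands in for a fixed 16-bit-per-character server format (an assumption on my part, not Teradata's actual storage code), and `storage_bytes` is a made-up helper name.

```python
# Illustration only: "utf-16-be" is used as a stand-in for a fixed
# 16-bit-per-character server format. This is an assumption, not
# Teradata's actual storage routine.

def storage_bytes(text: str) -> dict:
    """Return the character count and the byte length in UTF-8 and UTF-16."""
    return {
        "chars": len(text),
        "utf8": len(text.encode("utf-8")),       # variable width: 1 or 2 bytes here
        "utf16": len(text.encode("utf-16-be")),  # 2 bytes per character, no BOM
    }

# "Αθήνα 2004": five Greek letters, one space, four digits.
sample = "\u0391\u03b8\u03ae\u03bd\u03b1 2004"
sizes = storage_bytes(sample)
print(sizes)
```

For this sample the variable-width form is smaller because half of the characters are ASCII. For text made up only of basic Greek letters the two sizes would be the same, and for pure ASCII the 16-bit form is twice as large.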
On the server side, Unicode can be specified on a column-by-column basis. You can have both Latin and Unicode columns in the same table.
| ||I definitely agree with the "monitor the whole data chain" comment. We're taking this step by step - and the last step is
I'd like to add one more comment on Greek and Russian. Both languages have at least five different code pages in active use (and many more
historical ones!). It is convenient to agree with all data providers on one common code page for all communications. UTF-8 seems a good candidate.
Agreeing on one common format will help reduce the number of code page conversions on the DWH side. There will be enough of them during the ETL
and reporting phases.
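The code page point can be sketched in Python as well. The five Russian code pages listed below are just examples of encodings that Python's standard codecs happen to support, and `to_utf8` is a hypothetical helper, not part of any Teradata tool. The same word arrives as different byte sequences depending on the provider's code page, but comes out identical once everything is normalized to UTF-8.

```python
# Hypothetical helper: convert bytes from a known source code page to UTF-8.
def to_utf8(raw: bytes, source_codepage: str) -> bytes:
    # decode() is strict by default, so bytes that are invalid for the
    # declared code page raise UnicodeDecodeError instead of passing silently.
    return raw.decode(source_codepage).encode("utf-8")

# Example code pages only (an assumption about what providers might send).
RUSSIAN_CODEPAGES = ["cp1251", "koi8_r", "cp866", "iso8859_5", "mac_cyrillic"]

word = "\u041f\u0440\u0438\u0432\u0435\u0442"  # "Привет"

# What each provider would send for the same word.
raw_variants = {cp: word.encode(cp) for cp in RUSSIAN_CODEPAGES}

# What the DWH side sees after normalizing to the agreed format.
normalized = {cp: to_utf8(raw, cp) for cp, raw in raw_variants.items()}

for cp in RUSSIAN_CODEPAGES:
    print(cp, raw_variants[cp].hex(), "->", normalized[cp].hex())
```

Note that the conversion only works if the source code page is known for every feed, which is one more reason to settle on a single format with the data providers up front.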