Archives of the TeradataForum

Message Posted: Thu, 04 Mar 2004 @ 13:22:20 GMT



Subj:		Re: Storing Multiple Languages in Teradata

From:		Victor Sokovin

Does Teradata support this "UTF16" format on a table by table or even column by column basis, or is this a "server level" setting? I think that is the biggest area of concern. Since Greek is a mix of 1- and 2-byte characters, the concern is: will Teradata always store Unicode-defined data in two bytes per character, or does it use "smart storage" based on the characters actually being stored?

Internally (on the server side), Teradata uses a fixed 16-bit-per-character format; there is no smart storage. I guess it was done this way to reduce CPU usage for Unicode string manipulation. Variable-length internal storage might be tricky to implement. Teradata does support the popular UTF-8 on the client side.
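To put rough numbers on the storage question, here is a small Python sketch. It is only an illustration, not Teradata code: the function name storage_bytes is made up for this example, and Python's "utf-16-le" codec is used as a stand-in for a fixed 16-bit-per-character format (the two agree for Greek, Cyrillic and Latin characters). The sample strings are arbitrary.

```python
def storage_bytes(text):
    """Return (utf8_bytes, fixed16_bytes) for the given string.

    "utf-16-le" is an assumed stand-in for a fixed 16-bit format:
    every character used here takes exactly two bytes in it.
    """
    utf8 = len(text.encode("utf-8"))
    fixed16 = len(text.encode("utf-16-le"))
    return utf8, fixed16


if __name__ == "__main__":
    # Pure Latin, pure Greek, and a Greek/ASCII mix (sample values only).
    for sample in ["Athens", "Ελλάδα", "Αθήνα 2004"]:
        utf8, fixed16 = storage_bytes(sample)
        print(f"{sample!r}: UTF-8 = {utf8} bytes, fixed 16-bit = {fixed16} bytes")
```

Pure Greek text comes out the same size either way, since each Greek letter needs two bytes in UTF-8 as well. The fixed format only costs extra for the ASCII part (digits, spaces, Latin letters), which doubles in size.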

On the server side, Unicode can be specified on a column-by-column basis. You can have both Latin and Unicode columns in the same table.

I definitely agree with the "monitor the whole data chain" comment. We're taking this step by step - and the last step is Teradata.

I'd like to add one more comment on Greek and Russian. Both languages have at least five different code pages in active use (and many more historical ones!). It is convenient to agree with all data providers on one common code page for all communications. UTF-8 seems a good candidate. Agreeing on one common format will help reduce the number of code page conversions on the DWH side. There will be enough of them during the ETL and reporting phases.
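As a rough illustration of why a single agreed format helps, here is a Python sketch of the conversion a DWH would otherwise have to do per provider. The helper name to_utf8 and the sample word are made up for this example; the codec names (koi8_r, cp1251, cp866, iso8859_5) are standard Python codecs for four of the Russian code pages, and the provider must still tell you which one they used.

```python
def to_utf8(raw, source_codepage):
    """Transcode bytes from a known legacy code page to UTF-8.

    The source code page has to be known in advance; it cannot be
    reliably guessed from the bytes alone.
    """
    return raw.decode(source_codepage).encode("utf-8")


# Four Russian code pages, named as Python spells them.
RUSSIAN_CODEPAGES = ["koi8_r", "cp1251", "cp866", "iso8859_5"]

if __name__ == "__main__":
    word = "Привет"  # sample value only
    for codepage in RUSSIAN_CODEPAGES:
        raw = word.encode(codepage)  # what one provider might send
        print(codepage, raw.hex(), "->", to_utf8(raw, codepage).hex())
```

The same word arrives as four different byte sequences, and decoding with the wrong code page usually produces garbage without raising any error. If every provider sends UTF-8 to begin with, this step and its guesswork disappear.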

Regards,

Victor

