Archives of the TeradataForum

Message Posted: Tue, 07 Oct 2008 @ 13:49:29 GMT


	<Prev	Next>		<<First	<Prev	Next>	Last>>

Subj:		Re: Control Char Issue

From:		Victor Sokovin

I want to clarify that if a character takes 1 char in sql server that should also only take 1 char in teradata?

If you literally mean "char" then the answer is yes. Teradata would not do things like converting � to ae or � to oe etc.

If, however, you intended to ask

"I want to clarify that if a character takes 1 BYTE in sql server that should also only take 1 BYTE in teradata?

then the answer depends on the character sets chosen in SQL Server and Teradata. If you use Unicode or Asian character sets in Teradata then the 1-byte char in SQL Server would cost you two bytes in Teradata. Internally, Unicode chars are always stored as 16 bits = 2 bytes.

For one of the column /column_name/, we have a control character issue. On the SQL side we have the value column_name='1234567890- =~!@$�()qwertyuiop��\QWERTYUIOP{} * asdf*. On the Teradata side we have the value column_name= '1234567890-=~!@$ÿ()qwertyuiopÿÿ\QWERTYUIOP{} *a*. This column is of type varchar(50). The control character �� in SQL side has length �1� while the corresponding control character �ÿ� in Teradata side has length �2�. Due to this column_name value exceeds column length and results in truncation of data.

You did not give us any hints as to what character set is used in the definition of this column. I'd guess the above seems to suggest that this is not caused by Unicode. Even if the character set is what is called "Latin" this character should not be a problem. Its scientific name is LATIN SMALL LETTER Y WITH DIAERESIS and it is part of the Latin repertoire.

Yet another guess is that the process transferring data from SQL Server to Teradata performs an implicit translation to ASCII and then � gets converted to whatever the process is capable of coming up with. Apparently, it comes up with ÿ, which increases the overall length of the string. So, it's not that Teradata requires two bytes to store that particular character but the character itself gets lost in translation. The latter is bad not only because the string length increases but it is bad because you don't even know which character it is.

So, the recommendations would be:

1. Check that the char set is indeed Latin.

2. Inspect and correct problems in the data transfer process: avoid all implicit translations leading to data corruption.

Victor


	<Prev	Next>		<<First	<Prev	Next>	Last>>

Attachments

Library

Quick Reference