Archives of the TeradataForum

Message Posted: Fri, 20 Jul 2007 @ 11:49:12 GMT


	<Prev	Next>		<<First	<Prev	Next>	Last>>

Subj:		Eternally Foolproof Casting from text to number

From:		Anomy Anom

<-- Anonymously Posted: Thursday, July 19, 2007 22:32 -->

I know this topic has been raised quite a few times but it was only recently that I've had it occur as a maintenance issue when I was asked to diagnose a failure in an overnight batch process.

A production script failed because a tiny number of rows of alphabetic characters had been added to a 100 million row table. Suddenly a whole script failed because Teradata and/or SQL offers no safe way to convert text to numbers.

Needing to react quickly and knowing I was dealing with a CHAR(2) column I wrote the following to replace the original simple CAST. I wanted something that could NOT produce a script halting error, i.e. no matter what other minor muck gets put into the column in future.

     CAST(
      ( CASE
         WHEN
          SUBSTRING( DBN.TBN.CBN FROM 1 FOR 1)
           IN (' ', '1', '2', '3', '4', '5', '6', '7', '8', '9', '0' )
          AND
          SUBSTRING( DBN.TBN.CBN FROM 2 FOR 1)
           IN (' ', '1', '2', '3', '4', '5', '6', '7', '8', '9', '0' )
         THEN
          SUBSTRING( DBN.TBN.CBN FROM 1 FOR 2)
         END
      ) AS BYTEINT )

I used the IN list rather than an ANSI/ASCII range just because I had it in my head to not presume anything about character encodings.

My question then is: what would be a better way to write this?

My idea of "better" here is:

- more generically SQL rather than Teradata specific

- impervious to changes in the CHARACTER type of the source column

- applicable to more than just two character length strings

(It was assuming a length of two let me avoid testing for spaces *inside* as there is no inside of a length 2 string.)

I'm happy to presume only integer numbers. I'm not sure if I'd want to be proof against decimal points being in the string - obviously covering that would be better than not.

UDFs are not an option! I need to supply SQL for use buried deep in INSERT statements inside long batch scheduled BTEQ scripts.

btw, some may wonder that I didn't put efficiency in the definitions of "better". The answer is for a scheduled blind process I really don't care. Unless it will blow out INSERT time by a factor of 10 then I say let the machine crunch. I rather not ever ever have to debug this fault ever again.


	<Prev	Next>		<<First	<Prev	Next>	Last>>

Attachments

Library

Quick Reference