Archives of the TeradataForum

Message Posted: Wed, 09 Nov 2005 @ 18:50:31 GMT


     


Subj:   Re: ARCMAIN throughput - limiting factors
 
From:   Bob Hahn

A couple of comments.

First, the ESCON/FICON connections between the mainframe and Teradata are scalable--as you add connections you add bandwidth, up to one connection per Teradata node. For ESCON we have measured closer to 40GB/hour than 30GB/hour per channel--assuming, obviously, no other effective bottleneck. TDP load balances across all available connections at the request level (subsequent requests on a session may be sent over different connections than the one used for the logon or for prior requests).
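To put rough numbers on the scaling, here's a back-of-the-envelope sketch in Python (the 40GB/hour per-channel rate is the ESCON measurement above; the channel counts are just illustrative, and it assumes no other bottleneck):

    # Aggregate bandwidth across ESCON channels, using the ~40GB/hour
    # per-channel measurement quoted above. Channel counts illustrative.
    GB_PER_HOUR_PER_ESCON = 40.0

    def aggregate_mb_per_sec(n_channels, gb_per_hour=GB_PER_HOUR_PER_ESCON):
        """Aggregate throughput in MB/second, assuming no other bottleneck."""
        return n_channels * gb_per_hour * 1024.0 / 3600.0

    # One channel is roughly 11 MB/second; four channels roughly 45.
    for n in (1, 4):
        print(n, round(aggregate_mb_per_sec(n), 1))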

FICON connections have been measured at >4x ESCON.

The throughput is limited by the effective bottleneck, which is typically one of: Teradata capacity/contention, channel bandwidth, mainframe cpu/contention, or I/O capacity.

FICON won't necessarily increase throughput/bandwidth unless ESCON is the bottleneck and you configure more FICON capacity than the ESCON capacity it replaces. Most customers are NOT bottlenecked on the channels, so a primary benefit of FICON is expected to be reducing the number of channels/adapters required to configure sufficient or equivalent bandwidth relative to ESCON--but you still need to consider redundancy and balance (e.g. routing everything through one FICON risks skewing that node). In round numbers on the 5400 we've seen about 1% node cpu busy for every 3MB/second.
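As a quick sanity check on that rule of thumb (the 1% per 3MB/second figure is the 5400 number above; the 45MB/second rate is just an example):

    # Node cpu cost of channel traffic, per the ~1% node cpu busy
    # per 3MB/second rule of thumb for a 5400 quoted above.
    def node_cpu_busy_pct(mb_per_sec_through_node):
        """Estimated node cpu busy (%) attributable to channel traffic."""
        return mb_per_sec_through_node / 3.0

    # Funneling ~45MB/second through one FICON on a single node would
    # cost that node roughly 15% cpu busy--a reason to balance channels.
    print(round(node_cpu_busy_pct(45.0), 1))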

The mainframe cpu consumption of ARC is rather predictable and proportional to throughput. It used to be about 1.5 - 2.5 MI/MB (million instructions per megabyte transferred). To put it another way, at 50MB/second this would be 75-125 million instructions per second.

Depending on your processor model, from this you can predict cpu busy--e.g. on a mainframe with 250 MIPS engines this would be 30%-50% busy on one cpu; for overall cpu busy, divide by the number of cpus (there is more than 1 cpu second per elapsed second on a multi-cpu mainframe). Although ARC uses only 1 cpu for its mainline processing, the data transfer from TDP to ARC is done asynchronously via an SRB, so ARC can actually use more than 1 cpu second per elapsed second.
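Putting those rules of thumb together, a minimal sketch of the prediction (the 1.5-2.5 MI/MB cost and the 250 MIPS engine are the example figures above; the 4-way cpu count is just illustrative):

    # Predicted mainframe cpu busy for ARC, from the MI/MB rule of
    # thumb and an engine speed. All figures are the examples above.
    def arc_cpu_busy_pct(mb_per_sec, mi_per_mb, engine_mips, n_cpus=1):
        """Percent busy, averaged over n_cpus engines."""
        mi_per_sec = mb_per_sec * mi_per_mb      # million instructions/second
        return 100.0 * mi_per_sec / (engine_mips * n_cpus)

    # 50MB/second at 1.5-2.5 MI/MB on 250 MIPS engines: 30%-50% of one
    # cpu, or 7.5%-12.5% overall busy on a 4-way machine.
    for cost in (1.5, 2.5):
        print(arc_cpu_busy_pct(50.0, cost, 250.0),
              arc_cpu_busy_pct(50.0, cost, 250.0, n_cpus=4))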

You can get your exact numbers from the ARC job output: compute the bytes transferred and the cpu consumed, and from those you can predict, e.g., the total cpu consumption to dump x bytes and how long it will take at particular levels of cpu utilization (until and unless some other bottleneck is hit, of course).
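For example, a sketch of that projection, assuming you've already pulled bytes transferred and cpu seconds from your own ARC job output (the observed input figures below are placeholders, not measurements):

    # Project total cpu and elapsed time for a future dump from figures
    # observed in ARC job output. The observed inputs are placeholders.
    observed_bytes   = 120 * 2**30   # bytes transferred by a measured job
    observed_cpu_sec = 900.0         # cpu seconds that job consumed

    cpu_sec_per_byte = observed_cpu_sec / observed_bytes

    # Total cpu to dump x bytes--proportional to data volume, per above.
    x_bytes       = 500 * 2**30
    total_cpu_sec = x_bytes * cpu_sec_per_byte

    # Elapsed time if the job sustains a given cpu utilization (e.g. 40%
    # of one engine), until and unless some other bottleneck is hit.
    utilization = 0.40
    print(round(total_cpu_sec), round(total_cpu_sec / utilization))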

This is another reason two jobs might do better than one: if one job is saturating a cpu (considering contention, etc.), a second job can use another cpu. TDP itself is multithreaded and uses multiple cpus.

If mainframe cpu contention is limiting throughput, MVS tuning can be used to increase the priority of the ARC job, letting it absorb more cpu per elapsed second until some other bottleneck is hit. It's just the standard tradeoff: the total cpu consumption is proportional to the total amount of data to be dumped, so if the elapsed time is halved, the cpu percent busy will double.

It sounds to me like the effective bottleneck here is how fast Teradata can provide data to a single ARC job. Additional simultaneous ARC jobs (by cluster, different tables/databases, whatever) deliver more data per second from Teradata to the mainframe than one job does.



     