Re: Number of ACTIVE users [From John Hall: Sat, 29 Sep 2001 @ 15:32 GMT]

Message Posted: Sat, 29 Sep 2001 @ 15:32:20 GMT

Subj:

Re: Number of ACTIVE users

From:

John Hall

I was thinking some more about this note. The question is: What's this approach actually doing for you? My point is that if this has been running for the last couple of days, what information does it give you that's actionable today?

I guess you could use the results of your routine and try to match it up with ResUsage, but I don't really know what to do with that information. Let's say that you match it up with ResUsage and find that there's some bottlenecking, all you can do is identify the users who were using the system at that time - not what they were doing (somebody with high CPU or I/O is not necessarily a problem). Depending on how you do things, knowing who your users are may be nearly worthless.

For example, I have been at several sites where all production work is done using the same UserID. If all your production jobs had the same Account String (which seems to be the norm), then all production work goes into one bucket of CPU and I/O. You can't differentiate the impact of one job from another.

If you think that it's a production job that is giving you grief, do you really need to manually go through your production jobs to identify which ones were actually running? Being snide: If you had a production job that was causing you grief, you probably already knew it from the phone call and don't need all this work just to prove it.

When I've seen this approach used in the past, the gathering process itself seems to become one of the big hitters when it comes to system utilization (remember how often you're running it). If it becomes one of the big hitters (eg- noticeable to management), then how are you going to justify the value it brings versus the resource consumption?

In one case where I was involved, the job was poorly designed and caused a hot-AMP situation every 10 minutes for 6 months without the system administrator noticing its impact - they were talking about fixing their slow system with an upgrade. Kind of reminds me of that old joke where I tell my doctor that it hurts every time I hit myself with a hammer. He tells me to stop hitting myself with the hammer...

My stance is that if you can't turn it into actionable items on a daily basis, then why collect this information? Admittedly, I can make some really pretty charts that will look great on my wall, but what is it doing for my system administration, workload tuning and trying to delay that next system upgrade? As system administrator, I want to identify work that is impacting my system and see that it gets fixed. After all, I want to make sure that the only reason for that next upgrade is future needs.

So before you go ahead with this, instead of wondering whether you might be missing something, you might ask what is it going to do for you.

Ok, that aside. You might think about making your PI's for NOW_ and PAST_AMPUSE non-unique and change your tables to MULTISET, you'll get better performance and reduce system impact. After all, DBC.ACCTG has a UPI already and you're emptying NOW_ and PAST_AMPUSE before you do your INS/SEL.

I also noticed that your table AMPUSE has a column called TAG (without a default value). When you do your INS/SEL into AMPUSE, the column list of the SEL doesn't match-up with AMPUSE - there's no risk of duplicate PI's or duplicate rows.

Finally, you don't carry the processor (VProc) number from DBC.ACCTG. What you're losing is the skew of the workload (eg- busiest VProc vs. most idle VProc). In general, my own experience is that skew and hash synonyms have the greatest impact on the efficiency of my system while being the thing that I probably have the best ability to fix. You really don't want to lose the skew information.

Attachments

Library

Quick Reference

Archives of the TeradataForum

Message Posted: Sat, 29 Sep 2001 @ 15:32:20 GMT