Archives of the TeradataForum

Message Posted: Wed, 02 Oct 2002 @ 19:10:16 GMT



Subj:		Veritas backup for Teradata: The elusive Access Module Error #34

From:		Frank C. Martinez IV

Hola, amigos de Teradata,

Well, I just passed a note about the following problem to my local NCR Veritas support lady and to the Goodyear guy who knows Veritas very well, and I thought it might be worth posting here too in case anybody else out there is using Veritas for backup. Perhaps you might know what's going on. The message is a little long, but I was trying to be thorough.

---------------------------------

Frank Martinez
10/02/2002 02:54 PM

To: DL129004, Chris Mahovich
cc: Dave Krauter
Subject: The elusive Access Module Error #34

Hi Dee & Chris,

This morning was a good morning. I successfully restored the Production tables to another database (copy) for training coming up, then "restored" some tables from the DEV box into that same database, because they want to train on something that isn't in production yet. This all worked very well, and I was very happy.

So then I attempted to restore a very old copy (8/27) of some tables that we need simply to maintain some history. I knew that these were off-site, and the last time I'd tried this, I ended up with the notorious Access Module Error #34. So I started the restore, called for the off-site tape, and sat down to watch. Guess what? It failed. GOOD NEWS! Since I have more familiarity with Veritas now, and how it works (that sure helps, huh?), I noticed something significant. So here's the data on what I think is happening:

In the detailed log (in the directory /usr/openv/netbackup/logs/user_ops/Teradata/logs) on the PROD2 node, from which we run backups and restores, the following appears in the file akr-95108818-akr-95108818-20021002125406-0221:

12:53:17 (464952.001) INF - Use shared memory = 0
12:53:17 (464952.001) INF - Media id 001658 is not in a robotic library;
administrative interaction may be required to satisfy a mount request.
12:53:17 (464952.001) INF - Restore id = 464952.001
12:53:17 (464952.001) INF - Backup time = 1030426002
12:53:17 (464952.001) INF - Encrypt = 0
12:53:17 (464952.001) INF - Client read timeout = 900
12:53:17 (464952.001) INF - Media mount timeout = 0
12:53:17 (464952.001) INF - client = ahqtpc2
12:53:17 (464952.001) INF - requesting_client = ahqtpc2
12:53:17 (464952.001) INF - browse_client = ahqtpc2
13:08:28 <16> [0,ahqtpc2] NetBackup has been notified by Arcmain of an
abnormal shutdown condition.
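
As a side check on which backup image is being restored, the "Backup time" value in that log looks like a Unix epoch timestamp. That is an assumption, since the log itself doesn't say what the units are. A minimal Python sketch to decode it under that assumption:

```python
from datetime import datetime, timezone

# "Backup time" value copied from the NetBackup log above.
backup_time = 1030426002

# Assumption: the value is seconds since 1970-01-01 00:00:00 UTC.
as_utc = datetime.fromtimestamp(backup_time, tz=timezone.utc)
print(as_utc.isoformat())  # 2002-08-27T05:26:42+00:00
```

If the assumption holds, the image dates from 27 August 2002, which matches the 8/27 copy requested for this restore.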

I'd like to point out a few things. The second line, with the Media id on it, confirms that this particular tape isn't in the library, so the mount is going to take a while. A few lines further down, Veritas prints out the value of 900 for Client read timeout. Assuming this is in seconds, that works out to 15 minutes. Immediately after that line, Veritas shows that the Media mount timeout is 0. I assume that means no timeout is enforced for mounting a tape. All of this is logged at 12:53:17. Almost exactly 15 minutes later (13:08:28, or 911 seconds later), NetBackup gets notified by Arcmain of an abnormal shutdown condition. When we look at the Arc log (directory of /usr/openv/netbackup/ext/db_ext/NCR_Teradata/arclogs), we see the following (file name is akr-95108818.20021002125406.0221.0000.log) output:

10/02/2002 12:51:12                  (GDYR_WRK.STG_SCHED_LINE_0823),
10/02/2002 12:51:12                  (GDYR_WRK.STG_SCHED_LINE_EU072402),
10/02/2002 12:51:12                  (GDYR_WRK.STG_SCHED_LINE_EU072502),
10/02/2002 12:51:12  RELEASE LOCK,
10/02/2002 12:51:12  FILE = X;
10/02/2002 12:51:14  LOGGED ON   16 SESSIONS
10/02/2002 13:08:27  *** Failure ARC0805:Access Module returned error code 34: .
10/02/2002 13:08:28  LOGGED OFF  18 SESSIONS
10/02/2002 13:08:28  ARCMAIN TERMINATED WITH SEVERITY 12
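
The timing in the two logs can be checked with a few lines of Python. This is only a sketch of the arithmetic: the timestamps are copied from the logs above, the helper name `seconds_between` is made up for illustration, and the 900 is assumed to be in seconds.

```python
from datetime import datetime

FMT = "%H:%M:%S"

def seconds_between(start: str, end: str) -> int:
    """Elapsed whole seconds between two same-day HH:MM:SS stamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds())

# "Client read timeout" from the NetBackup log (assumed to be seconds).
client_read_timeout = 900

# Restore parameters logged at 12:53:17; abnormal shutdown notice at 13:08:28.
elapsed = seconds_between("12:53:17", "13:08:28")
print(elapsed, elapsed - client_read_timeout)  # 911 11

# ARC0805 failure in the Arc log at 13:08:27 vs. the NetBackup notice.
arc_gap = seconds_between("13:08:27", "13:08:28")
print(arc_gap)  # 1
```

So the failure lands 11 seconds past the 900-second mark, and the Arc log records the error one second before NetBackup logs the shutdown notice.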

Notice that the failure happened just before the last log records in the other log were written. It's also interesting to me that Arc never got to assign an event number, as it did in this other restore that worked:

10/02/2002 05:38:04                  (TRN_EDW)    (FROM (GDYR_EDW)),
10/02/2002 05:38:04                  (TRN_VWS)    (FROM (GDYR_VWS)),
10/02/2002 05:38:04  RELEASE LOCK,
10/02/2002 05:38:04  FILE = X;
10/02/2002 05:38:06  LOGGED ON   16 SESSIONS
10/02/2002 05:40:49  UTILITY EVENT NUMBER  - 98
10/02/2002 05:40:49  "TRN_BI_VWS"."AKRON_DEPT_NAME" CREATED
10/02/2002 05:40:49  "TRN_BI_VWS"."CO" CREATED

This is very interesting to me, because we sometimes have internal delays when we do backups, and they occur AFTER the utility event number is assigned. We've assumed that these delays are from Veritas waiting for a mount, which would make sense, SINCE THEY DON'T TIME OUT (remember Media mount timeout is set to 0). Meanwhile, on my NT box on my desk, which acts as the controlling backup server to run backups and restores, the log akr-95108818-20021002125406-0221.log contains the following, with no indication that anything has happened to Veritas:

12:53:17 (464952.001) INF - Use shared memory = 0
12:53:17 (464952.001) INF - Media id 001658 is not in a robotic library;
administrative interaction may be required to satisfy a mount request.
12:53:17 (464952.001) INF - Restore id = 464952.001
12:53:17 (464952.001) INF - Backup time = 1030426002
12:53:17 (464952.001) INF - Encrypt = 0
12:53:17 (464952.001) INF - Client read timeout = 900
12:53:17 (464952.001) INF - Media mount timeout = 0
12:53:17 (464952.001) INF - client = ahqtpc2
12:53:17 (464952.001) INF - requesting_client = ahqtpc2
12:53:17 (464952.001) INF - browse_client = ahqtpc2
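
For anyone who wants to pull the two timeout values out of logs like these without eyeballing them, here is a rough Python sketch. The line layout is inferred only from the excerpts in this message, so the regular expression and the `parse_settings` helper are assumptions for illustration, not anything supplied by Veritas.

```python
import re

# Matches lines such as:
#   12:53:17 (464952.001) INF - Client read timeout = 900
SETTING = re.compile(
    r"^\d{2}:\d{2}:\d{2} \(\S+\) INF - (?P<key>[^=]+?) = (?P<value>.+)$"
)

def parse_settings(lines):
    """Collect 'key = value' pairs from INF lines; other lines are skipped."""
    settings = {}
    for line in lines:
        match = SETTING.match(line.strip())
        if match:
            settings[match.group("key")] = match.group("value")
    return settings

sample = [
    "12:53:17 (464952.001) INF - Media id 001658 is not in a robotic library;",
    "12:53:17 (464952.001) INF - Client read timeout = 900",
    "12:53:17 (464952.001) INF - Media mount timeout = 0",
    "12:53:17 (464952.001) INF - client = ahqtpc2",
]
settings = parse_settings(sample)
print(settings["Client read timeout"], settings["Media mount timeout"])  # 900 0
```

Running the same function over the PROD2 copy and the NT copy of the log should give identical settings, since the only difference between the two excerpts is the abnormal shutdown notice at the end of the PROD2 file.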

What is really interesting is that on the local copy of Veritas on my NT box, the Veritas server that's controlling this restore, the job is still showing up with a status of In Progress.

That confirms Chris (Mahovich)'s comment from the last time I tried this, that the job still appeared to be running. So it's still "running" on my server and on the Unix box as well. The Veritas part of this will sit around waiting for the tape to be mounted, and so will the administrator program.

So my assumption about the Client read timeout is probably right. It's causing any job that has to wait 15 minutes or more before it really starts to time out. So how do you change that one little line (Client read timeout)? I am hopeful that there's just a command or some other way to change that option. Any ideas, amigos?

PS: I'm not touching anything, and leaving it in its "suspended" state.

