Friday, August 20, 2010

Oracle Block Corruptions


There are different layers in which a corruption can occur:

1. OS layer
2. DB layer
3. SAP Pool / Cluster table layer
4. Application layer

The best way to begin the investigation is with a file system check at the OS level.
The SAP pool / cluster table layer can be checked with the tool R3check (see the separate section below).

DB layer corruptions
A corrupted block can occur in data files or in redo log files.

A consistency check can be performed with the following tools:

1. dbverify - checks whether a database block fulfills the predefined rules
2. export - reads the data stored in table blocks
3. analyze - reads table and index data and performs cross-checks

Note: If you use RMAN to back up the database, an implicit consistency check is performed for every block saved, so an explicit dbverify check is not needed.
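
For example, with RMAN you can validate the whole database without creating a backup; any corrupt blocks found are recorded in v$database_block_corruption. A minimal sketch:

RMAN> BACKUP VALIDATE CHECK LOGICAL DATABASE;

SQL> SELECT * FROM v$database_block_corruption;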

The above three methods can only be used for data files.
Corruptions in redo log files are only found when the redo logs are applied during a recovery. If you dump a redo log file and do not get an error, this is not sufficient to prove that the redo log is OK.

We basically dump the contents of a redo log file to a trace file and then read the trace file to understand the content. Below are some useful commands.

The following ways of dumping a redo log file are covered:

1. To dump records based on DBA (Data Block Address)
2. To dump records based on RBA (Redo Block Address)
3. To dump records based on SCN
4. To dump records based on time
5. Dump the file header information
6. Dump an entire log file

1. To dump records based on DBA  (Data Block Address)

Connect to the database as SYSDBA and execute the following command:
ALTER SYSTEM DUMP LOGFILE 'filename' DBA MIN fileno . blockno DBA MAX fileno . blockno;

Example:

ALTER SYSTEM DUMP LOGFILE '/u01/oracle/V7323/dbs/arch1_76.dbf' DBA MIN 5 . 31125 DBA MAX 5 . 31150;

This causes all changes to the specified range of data blocks to be dumped to the trace file. In the example given, all redo records for file #5, blocks 31125 through 31150, are dumped.
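
If you start from a suspicious block rather than from the redo, a query such as the following shows which segment the file and block numbers belong to, so you know whose redo you are dumping:

SELECT owner, segment_name, segment_type
FROM dba_extents
WHERE file_id = 5
AND 31125 BETWEEN block_id AND block_id + blocks - 1;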

2. To dump records based on RBA (Redo Block Address)

This dumps all redo records within the specified range of redo addresses, where each address is given as a log sequence number and a block number within that log.

Syntax:
ALTER SYSTEM DUMP LOGFILE 'filename' RBA MIN seqno blockno RBA MAX seqno blockno;

Example:
ALTER SYSTEM DUMP LOGFILE '/u01/oracle/V7323/dbs/arch1_76.dbf' RBA MIN 2050 13255 RBA MAX 2255 15555;
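
The log sequence numbers to use here can be taken, for example, from v$log_history (or from v$log for the current online redo logs):

SELECT sequence#, first_change#, first_time
FROM v$log_history
ORDER BY sequence#;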

3. To dump records based on SCN

Using this option will cause redo records owning changes within the SCN range
specified to be dumped to the trace file.

ALTER SYSTEM DUMP LOGFILE 'filename' SCN MIN minscn SCN MAX maxscn;

Example:
ALTER SYSTEM DUMP LOGFILE '/u01/oracle/V7323/dbs/arch1_76.dbf' SCN MIN 103243 SCN MAX 103294;
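
A suitable SCN window for a given archived log can be determined, for example, from v$archived_log:

SELECT sequence#, first_change#, next_change#
FROM v$archived_log
WHERE sequence# = 76;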

4. To dump records based on time

Using this option will cause redo records created within the time range specified to be dumped to the trace file.
ALTER SYSTEM DUMP LOGFILE 'filename' TIME MIN value TIME MAX value;

Example:
ALTER SYSTEM DUMP LOGFILE '/u01/oracle/V7323/dbs/arch1_76.dbf' TIME MIN 299425687 TIME MAX 299458800;

Please note: the time value is given in REDO DUMP TIME format, not as a calendar date.
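
According to Oracle support notes, the redo dump time is derived from a calendar date as (((((yyyy - 1988) * 12 + mm - 1) * 31 + dd - 1) * 24 + hh) * 60 + mi) * 60 + ss. Assuming that formula, the TIME MIN value 299425687 used above corresponds to 1997-04-25 13:48:07, and the value for any date can be computed in SQL, for example:

SELECT (((((TO_NUMBER(TO_CHAR(d, 'YYYY')) - 1988) * 12
       + TO_NUMBER(TO_CHAR(d, 'MM')) - 1) * 31
       + TO_NUMBER(TO_CHAR(d, 'DD')) - 1) * 24
       + TO_NUMBER(TO_CHAR(d, 'HH24'))) * 60
       + TO_NUMBER(TO_CHAR(d, 'MI'))) * 60
       + TO_NUMBER(TO_CHAR(d, 'SS')) AS redo_dump_time
FROM (SELECT TO_DATE('1997-04-25 13:48:07', 'YYYY-MM-DD HH24:MI:SS') AS d
      FROM dual);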

5. Dump the file header information

This will dump file header information for every online redo log file.

alter session set events 'immediate trace name redohdr level 10';

6. Dump an entire log file:

ALTER SYSTEM DUMP LOGFILE 'filename';

Please note: Fully qualify the filename, and include the single quotes.

Example:
ALTER SYSTEM DUMP LOGFILE '/u01/oracle/V7323/dbs/arch1_76.dbf';
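
All of these dumps are written to a trace file of the current session in the user dump destination. To locate the directory:

SELECT value FROM v$parameter WHERE name = 'user_dump_dest';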

Note: brconnect -f check is not a consistency check. It only checks things such as remaining free space, parameter settings, the last successful backup, and so on.

A proper consistency check can prevent data loss caused by a corruption. If you check for corruptions sufficiently often, you can restore the file containing the corrupted blocks from a backup that does not yet contain the corruption and recover up to the current point in time. It is quite unlikely that the same corruption is also in the archive logs.

Recommendation: Check the database for consistency at least once a week.

Analyze and export read the blocks to be checked into the SGA, which is why the buffer quality of the DB block buffer is adversely affected for a short while.

Analyze – checks both tables and indexes and may run for a long time; it is better to run it when the system load is low. Buffer quality is adversely affected.
Export – checks tables only. The export processes cause some performance loss. Certain kinds of corruption are exported without an error; if you later try to reorganize such a table (export/import), you will run into problems during the import. Buffer quality is adversely affected.
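
As a sketch (the schema and table name, and the export options, are only illustrative), an analyze check of a single table including its indexes, and a full export to /dev/null that simply reads all table blocks, could look like this:

ANALYZE TABLE sapr3.mytable VALIDATE STRUCTURE CASCADE;

exp system/<password> full=y file=/dev/null log=exp_check.log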

Dbverify – can be used while the database is running. It checks table blocks, index blocks, and also empty DB blocks, but does not check cross-references. It reads the blocks without loading them into the SGA, so buffer quality is not affected. Dbverify can also be run on data files restored from a backup (without these files having to belong to a database), and it checks the data in LOB columns as well.
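
A typical dbverify call for a single data file (the path below is only an example from an SAP-style directory layout) could look like this:

dbv file=/oracle/SID/sapdata1/system_1/system.data1 blocksize=8192 logfile=dbv_system.log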

To be continued...

Tuesday, August 17, 2010


Cluster table diagnosis using the R3check utility

Use this utility to check whether a cluster table contains incorrect cluster records.

If a termination occurs while a cluster table is being processed in production operation, you can find information about the cluster table in which the problem occurs in the corresponding system log and short dumps.

For the diagnosis, execute the program R3check, which generates an output line with the key of the affected record for every defective cluster record. First create a control file at the operating system level with the following contents:

export file='/dev/null' dumping=yes client=<client>
select * from <logical cluster table>

and start the check as the <sid>adm user:

R3check <control file> > <output file>

The lines in the output have the following structure:

<physical cluster table>: key1*key2*key3... (error number)

To check a physical cluster table, call R3check for a logical cluster table that belongs to the corresponding physical cluster table.

If, after consulting the SAP hotline on how to proceed with the error, you need to export the data belonging to a certain cluster key, generate an additional control file with the following contents:

export file='<data file>' client=<client>
select * from <physical cluster table> where KEY1 = key1 and KEY2 = key2 and ...

and start the export as the <sid>adm user:

R3trans <control file>

Here, <data file> stands for the name of the file into which the exported data is to be written.
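
As an illustrative example (the client number and table names are assumptions), to check the logical cluster table BSEG, which belongs to the physical cluster table RFBLG, you could create a control file check_bseg.ctl:

export file='/dev/null' dumping=yes client=100
select * from BSEG

and run:

R3check check_bseg.ctl > check_bseg.out

Every defective cluster record of RFBLG then appears in check_bseg.out as one line with its key and an error number.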

Monday, August 16, 2010

JMS Message Monitoring in CE 7.1 system

In this article we review the Telnet commands available for the SAP JMS Provider and how you can
use them, in combination, to investigate a particular problem or simply to monitor the runtime
status of the JMS Provider.

Telnet Commands Related to the SAP JMS Provider
The SAP JMS Provider service provides Telnet commands for administering and monitoring the JMS
resources in a server. To use these commands, you have to connect to the AS Java using Telnet and enable
the JMS command group with add jms. To show the help information about the available commands
in the jms group, type one of the following at the Telnet command prompt:

jms -h
jms -?
man jms

There are several subgroups of JMS Telnet commands, for example:

jms_list – provides details about the JMS runtime environment
list_temp_destinations – displays all currently active temporary destinations

Example Scenario
All JMS-related Telnet commands let you monitor the runtime state of the SAP JMS Provider. To
illustrate their usage, let us consider one simple scenario. Imagine we have just discovered that some
persistent messages are not delivered to our application, and we want to investigate what might have
happened to them. For the purposes of this example, we will use the sapDemoQueue destination and
the default JMS virtual provider.

The following procedure describes one possible path of investigation and the respective sequence of
commands.

First, open a Telnet connection to the Java server and log on with an administrator user, then enable the JMS command group:

telnet localhost 50008
(log on, for example, as Administrator with the corresponding password)
add jms

jms_list msg sapDemoQueue default

This command lists all messages sent to the sapDemoQueue destination that are present in the database. If there are no messages in this list, we know that there are currently no messages pending for delivery: either no messages have been produced, or all that have been produced have already been consumed and acknowledged. We can then try to determine which producer was supposed to send them.

jms_list producers default
This command lists all producers registered to destinations belonging to the default JMS virtual provider. Note that this is the same virtual provider to which our destination belongs. From this list we can determine the producer ID, the destination to which the producer sends messages, its session ID, and its client ID. By the client ID, we can later find the consumer that is supposed to receive the messages. In this case, we look for producers registered to the sapDemoQueue destination; this is a way to determine whether there is a currently active producer registered to our destination.

If there are messages pending to be delivered, we have to continue our investigation with the consumers that are expected to receive them. We can check the status of the JMS connection: how many bytes have been sent and received through it and when it was last accessed.

jms_list connections default
We use the client ID to check whether there are any active connections and when a particular connection was last accessed. The JMS virtual provider again has to be the same.
Note: To find the connection corresponding to your consumer, look for the connection whose client ID equals that of the already found consumer.

We can also check the status of the consumers registered to the sapDemoQueue destination.

jms_list consumers default
This command lists all currently active consumers registered to destinations belonging to the default JMS virtual provider. From this list we can determine the consumer ID, the destination to which the consumer is registered, the session ID, and the client ID. If there is no consumer registered to sapDemoQueue, we know that our application does not receive messages because for some reason it failed to create the respective consumer, and we can continue the investigation in this direction, for example by checking the server traces for relevant exceptions.

If there is an active consumer but it still does not receive any of the pending messages, it is possible that there is an issue in the application's message processing logic which causes the messages to be redelivered again and again. By default, message delivery attempts are limited, and once they are exhausted for a particular message, the message is considered undeliverable (dead): it is skipped by the consumer and moved to the configured error destination of the original destination. To determine the error destination of the sapDemoQueue destination, we have to use the configuration editor. In the Display configuration tab, expand Configurations -> jms_provider -> default -> queues -> sapDemoQueue -> Propertysheet data. In the property sheet you can find the error destination of a particular destination. In our case, the error destination of sapDemoQueue is sapDefaultErrorQueue.

Then, we can check if there are any messages in the error destination.

jms_list msg sapDefaultErrorQueue default
With this command we can check if the missing messages are present in the error destination.

If our application is unable to consume some of the messages, we have to check why, and then we may want to do something with the undelivered messages. Since error destinations are just ordinary JMS destinations, you can access dead messages using the standard JMS API. For example, your application (or a dedicated tool) can consume and process the messages from the error destination; it can even return them to the original destination, if that is the error handling logic of the application.
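
As a minimal sketch of this approach (the JNDI lookup names are assumptions; check how the connection factory and the queue are actually bound in your system), a standalone JMS 1.1 client that drains and prints the dead messages could look like this:

import javax.jms.*;
import javax.naming.InitialContext;

public class ErrorQueueDrainer {
    public static void main(String[] args) throws Exception {
        InitialContext ctx = new InitialContext();
        // Assumed JNDI bindings of the default JMS virtual provider.
        QueueConnectionFactory factory = (QueueConnectionFactory)
            ctx.lookup("jmsfactory/default/QueueConnectionFactory");
        Queue errorQueue = (Queue)
            ctx.lookup("jmsqueues/default/sapDefaultErrorQueue");

        QueueConnection connection = factory.createQueueConnection();
        try {
            QueueSession session =
                connection.createQueueSession(false, Session.AUTO_ACKNOWLEDGE);
            QueueReceiver receiver = session.createReceiver(errorQueue);
            connection.start();

            // receive(1000) waits up to one second for the next message;
            // null means the error destination is empty.
            Message message;
            while ((message = receiver.receive(1000)) != null) {
                // Inspect, log, or re-send the dead message here.
                System.out.println("Dead message: " + message.getJMSMessageID());
            }
        } finally {
            connection.close();
        }
    }
}

Note that consuming with AUTO_ACKNOWLEDGE permanently removes the messages from the error destination, so a real tool might prefer a transacted session or CLIENT_ACKNOWLEDGE.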

Note that we can configure the following properties, related to the dead-message functionality, in the jms-resources.xml descriptor.

a. deliveryAttemptsLimited - a Boolean property that indicates whether the number of message delivery attempts is limited. The default value is "true".
b. maxDeliveryAttempts - an Integer property that indicates the maximum number of delivery attempts before the message is considered undeliverable (dead). The default value is 5.
c. deliveryDelayInterval - the delay in milliseconds between two consecutive message delivery attempts. The default value is 2000 milliseconds.
d. errorDestination - the name of a JMS queue to which dead messages are forwarded. If you leave this property blank (""), dead messages are discarded.

These four properties are configurable per JMS destination.

Note: The default error destination has an empty string for its errorDestination property. Otherwise, a message that becomes dead in its original destination and then also becomes dead in the error destination could be transferred through a chain of error destinations, potentially ending in an endless message delivery loop.

Note: The value of the errorDestination property must be the name of an already existing Queue.