SAP Basis Procedures / Questions and Answers: January 2011

I faced Oracle Database block corruption yesterday and posting the recovery steps taken to survive the crash.

The following filesystems need to be mounted with options (rw,_netdev,noatime,datavolume,nointr,heartbeat=local)
/dev/mapper/vg_RP1_dataHDS-oracle_RP1_sapdata39 on /oracle/RP1/sapdata39 type ocfs2 (rw,_netdev,heartbeat=local)
/dev/mapper/vg_RP1_dataHDS-oracle_RP1_sapdata40 on /oracle/RP1/sapdata40 type ocfs2 (rw,_netdev,heartbeat=local)
/dev/mapper/vg_RP1_dataHDS-oracle_RP1_sapdata41 on /oracle/RP1/sapdata41 type ocfs2 (rw,_netdev,heartbeat=local)
/dev/mapper/vg_RP1_dataHDS-oracle_RP1_sapdata42 on /oracle/RP1/sapdata42 type ocfs2 (rw,_netdev,heartbeat=local)
/dev/mapper/vg_RP1_dataHDS-oracle_RP1_sapdata43 on /oracle/RP1/sapdata43 type ocfs2 (rw,_netdev,heartbeat=local)
/dev/mapper/vg_RP1_dataHDS-oracle_RP1_sapdata44 on /oracle/RP1/sapdata44 type ocfs2 (rw,_netdev,heartbeat=local)

Because of which high I/O and contention caused all five of the Oracle RAC nodes to crash. Also resulting in datafile recovery.
But, recover datafile was failing because of inconsistent redolog with datablock.
So, the following steps are taking in order to get the Production system out of danger.

General recommendation for mounting the ocfs2 filesystems in Oracle RAC environment is with the following options.
rw,_netdev,noatime,datavolume,nointr,heartbeat=local.

Remounted the problematic filesystems with the above options. And now, going ahead with the analysis and recovery steps.

SQL> startup pfile=initRP11.ora
ORA-00445: background process "PSP0" did not start after 120 seconds

Probably due to high number of db writer processes and large sga.

SQL> select distinct status from v$backup;
STATUS
------------------
NOT ACTIVE

SQL> set NUMWIDTH 13

SQL> select distinct checkpoint_change# from v$datafile;
CHECKPOINT_CHANGE#
------------------
     100242078361
     100243048128

SQL> select checkpoint_change#,count(*) from v$datafile group by checkpoint_change#;
CHECKPOINT_CHANGE#      COUNT(*)
------------------ -------------
     100242078361             1
     100243048128           629

SQL> select tablespace_name , status from dba_tablespaces ;
TABLESPACE_NAME                STATUS
------------------------------ ---------
SYSTEM                         ONLINE
PSAPUNDO_5                     ONLINE
SYSAUX                         ONLINE
PSAPTEMP                       ONLINE
PERFMAN                        ONLINE
PSAPR3E620                     ONLINE
PSAPR3EUSR                     ONLINE
PSAPR3E                        ONLINE
PSAPUNDO_1                     ONLINE
PSAPUNDO_2                     ONLINE
PSAPUNDO_3                     ONLINE
PSAPUNDO_4                     ONLINE

SQL> select status,count(*) from dba_data_files group by status;
STATUS         COUNT(*)
--------- -------------
AVAILABLE           630

SQL> select file# from v$datafile where checkpoint_change#=100242078361;
       FILE#
-------------
         628

SQL> select * from dba_data_files where file_id=628;
FILE_NAME
--------------------------------------------------------------------------------
     FILE_ID TABLESPACE_NAME                        BYTES        BLOCKS
------------- ------------------------------ ------------- -------------
STATUS     RELATIVE_FNO AUT      MAXBYTES     MAXBLOCKS INCREMENT_BY
--------- ------------- --- ------------- ------------- -------------
USER_BYTES   USER_BLOCKS ONLINE_
------------- ------------- -------
/oracle/RP1/sapdata40/r3e_554/r3e.data554
         628 PSAPR3E
AVAILABLE           628
                           RECOVER

SQL> select count(*) from dba_extents where file_id=628;
COUNT(*)
----------
       16

SQL> select name from v$datafile where status ='RECOVER';
NAME
--------------------------------------------------------------------------------
/oracle/RP1/sapdata40/r3e_554/r3e.data554

SQL> recover datafile '/oracle/RP1/sapdata40/r3e_554/r3e.data554' ;
ORA-00283: recovery session canceled due to errors
ORA-12801: error signaled in parallel query server P003, instance
DBHOST:RP11 (1)
ORA-00600: internal error code, arguments: [3020], [628], [128460],
[2634151372], [], [], [], []
ORA-10567: Redo is inconsistent with data block (file# 628, block# 128460)
ORA-10564: tablespace PSAPR3E
ORA-01110: data file 628: '/oracle/RP1/sapdata40/r3e_554/r3e.data554'
ORA-10561: block type 'TRANSACTION MANAGED INDEX BLOCK', data object# 99304

Alert_RP11.log
Fri Jan 7 17:15:19 2011
Recovery of Online Redo Log: Thread 2 Group 23 Seq 6211 Reading mem 0
Mem# 0: /oracle/RP1/origlogA/LOG_G23M1.DBF
Mem# 1: /oracle/RP1/mirrlogA/LOG_G23M2.DBF
Fri Jan 7 17:16:34 2011
Errors in file /oracle/RP1/saptrace/background/rp11_p003_6215.trc:
ORA-00600: internal error code, arguments: [3020], [628], [128460], [2634151372]                                                                                        , [], [], [], []
ORA-10567: Redo is inconsistent with data block (file# 628, block# 128460)
ORA-10564: tablespace PSAPR3E
ORA-01110: data file 628: '/oracle/RP1/sapdata40/r3e_554/r3e.data554'
ORA-10561: block type 'TRANSACTION MANAGED INDEX BLOCK', data object# 99304
Fri Jan 7 17:16:34 2011
Errors in file /oracle/RP1/saptrace/background/rp11_p007_6229.trc:
ORA-00600: internal error code, arguments: [3020], [628], [126171], [2634149083]                                                                                        , [], [], [], []
ORA-10567: Redo is inconsistent with data block (file# 628, block# 126171)
ORA-10564: tablespace PSAPR3E
ORA-01110: data file 628: '/oracle/RP1/sapdata40/r3e_554/r3e.data554'
ORA-10561: block type 'TRANSACTION MANAGED INDEX BLOCK', data object# 99304
Fri Jan 7 17:16:34 2011
Errors in file /oracle/RP1/saptrace/background/rp11_p007_6229.trc:
ORA-00600: internal error code, arguments: [3020], [628], [126171], [2634149083]                                                                                        , [], [], [], []
ORA-10567: Redo is inconsistent with data block (file# 628, block# 126171)
ORA-10564: tablespace PSAPR3E
ORA-01110: data file 628: '/oracle/RP1/sapdata40/r3e_554/r3e.data554'
ORA-10561: block type 'TRANSACTION MANAGED INDEX BLOCK', data object# 99304
Fri Jan 7 17:16:34 2011
Errors in file /oracle/RP1/saptrace/background/rp11_p003_6215.trc:
ORA-00600: internal error code, arguments: [3020], [628], [128460], [2634151372]                                                                                        , [], [], [], []
ORA-10567: Redo is inconsistent with data block (file# 628, block# 128460)
ORA-10564: tablespace PSAPR3E
ORA-01110: data file 628: '/oracle/RP1/sapdata40/r3e_554/r3e.data554'
ORA-10561: block type 'TRANSACTION MANAGED INDEX BLOCK', data object# 99304
Fri Jan 7 17:16:35 2011
Trace dumping is performing id=[cdmp_20110107171635]
Fri Jan 7 17:16:42 2011
Media Recovery failed with error 12801
ORA-283 signalled during: ALTER DATABASE RECOVER datafile '/oracle/RP1/sapdata40/r3e_554/r3e.data554'   ...

SQL> SELECT tablespace_name, segment_type, owner, segment_name FROM dba_extents WHERE file_id = 628 and 128460 between block_id AND block_id + blocks - 1;

Run DB Verify dbv file=r3e.data554 FEEDBACK=100 but no guarantee that DB Verify will be 100% dependable.

DBVERIFY - Verification complete
Total Pages Examined         : 3840000
Total Pages Processed (Data) : 48992
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 78245
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 144
Total Pages Processed (Seg) : 0
Total Pages Failing   (Seg) : 0
Total Pages Empty            : 3712619
Total Pages Marked Corrupt   : 0
Total Pages Influx           : 0
Highest block SCN            : 1458725156 (23.1458725156)

Find the object in the corrupt block
SQL> desc obj$;
Name                                      Null?    Type
----------------------------------------- -------- ----------------------------
OBJ#                                      NOT NULL NUMBER
DATAOBJ#                                           NUMBER
OWNER#                                    NOT NULL NUMBER
NAME                                      NOT NULL VARCHAR2(30)
NAMESPACE                                 NOT NULL NUMBER
SUBNAME                                            VARCHAR2(30)
TYPE#                                     NOT NULL NUMBER
CTIME                                     NOT NULL DATE
MTIME                                     NOT NULL DATE
STIME                                     NOT NULL DATE
STATUS                                    NOT NULL NUMBER
REMOTEOWNER                                        VARCHAR2(30)
LINKNAME                                           VARCHAR2(128)
FLAGS                                              NUMBER
OID$                                               RAW(16)
SPARE1                                             NUMBER
SPARE2                                             NUMBER
SPARE3                                             NUMBER
SPARE4                                             VARCHAR2(1000)
SPARE5                                             VARCHAR2(1000)
SPARE6                                             DATE

SQL> elect name from obj$ where obj#=99304 ;
NAME
------------------------------
GLPCA~0

SQL> SELECT tablespace_name, segment_type, owner, segment_name FROM dba_extents WHERE file_id = 628 and 128460 between block_id AND block_id + blocks - 1;
TABLESPACE_NAME                SEGMENT_TYPE       OWNER
------------------------------ ------------------ ------------------------------
SEGMENT_NAME
--------------------------------------------------------------------------------
PSAPR3E                        INDEX              SAPR3E
GLPCA~0

SELECT tablespace_name, segment_type, owner, segment_name FROM dba_extents WHERE file_id = 628 and 126171 between block_id AND block_id + blocks - 1;

So it's an index block. I think what needs to be done is to run the recover clause with option "test" and then check in the syslog the number of corruptions

Run a test recovery
SQL> recover datafile '/oracle/RP1/sapdata40/r3e_554/r3e.data554' test;
ORA-10589: Test recovery had to corrupt 2 data blocks in order to proceed
ORA-10573: Test recovery tested redo from change 100242078361 to 100242999965
ORA-10572: Test recovery canceled due to errors
ORA-00600: internal error code, arguments: [3019], [2], [6211], [1232097],
[23], [1458752157], [23], [1458752153]

Alert_RP11.log

Fri Jan 7 17:57:47 2011
CORRUPTING BLOCK 126171 OF FILE 628 AND CONTINUING RECOVERY
Fri Jan 7 17:57:47 2011
Errors in file /oracle/RP1/saptrace/usertrace/rp11_ora_31850.trc:
ORA-10567: Redo is inconsistent with data block (file# 628, block# 126171)
ORA-10564: tablespace PSAPR3E
ORA-01110: data file 628: '/oracle/RP1/sapdata40/r3e_554/r3e.data554'
RA-10561: block type 'TRANSACTION MANAGED INDEX BLOCK', data object# 99304
Fri Jan 7 17:57:47 2011
Errors in file /oracle/RP1/saptrace/usertrace/rp11_ora_31850.trc:
ORA-00600: internal error code, arguments: [3019], [2], [6211], [1232097], [23], [1458752157], [23], [1458752153]
Fri Jan 7 17:57:49 2011
Test recovery canceled due to errors
Test recovery tested redo from change 100242078361 to 100242999965
Test recovery found 2 problems
ORA-10589 signalled during: ALTER DATABASE RECOVER datafile '/oracle/RP1/sapdata40/r3e_554/r3e.data554' test ..
Fri Jan 7 17:57:49 2011
Trace dumping is performing id=[cdmp_20110107175749]

SQL> select segment_name, segment_type from dba_extents where file_id=628 order by segment_name;
SEGMENT_NAME
--------------------------------------------------------------------------------
SEGMENT_TYPE
------------------
BSIS~0
INDEX

CE11000
TABLE

GLPCA
TABLE

SEGMENT_NAME
--------------------------------------------------------------------------------
SEGMENT_TYPE
------------------
GLPCA~0
INDEX

GLPCA~1
INDEX

GLPCA~2
INDEX

SEGMENT_NAME
--------------------------------------------------------------------------------
SEGMENT_TYPE
------------------
IBINVALUES~SEL
INDEX

IBINVALUES~SWO
INDEX

SOFFCONT1
TABLE

SEGMENT_NAME
--------------------------------------------------------------------------------
SEGMENT_TYPE
------------------
SWWLOGHIST
TABLE

SWWLOGHIST
TABLE

SWWLOGHIST
TABLE

SEGMENT_NAME
--------------------------------------------------------------------------------
SEGMENT_TYPE
------------------
SWWLOGHIST~0
INDEX

SWWLOGHIST~0
INDEX

SWWLOGHIST~1
INDEX

SEGMENT_NAME
--------------------------------------------------------------------------------
SEGMENT_TYPE
------------------
SWWLOGHIST~Z01
INDEX

16 rows selected.

The datafile 628 contains only few extents. If we are lucky maybe these are only indexes?
We could ignore this if only indexes were affected.

SQL> SELECT tablespace_name, segment_type, owner, segment_name FROM dba_extents WHERE file_id = 628 and 126171 between block_id AND block_id + blocks - 1;
TABLESPACE_NAME                SEGMENT_TYPE       OWNER
------------------------------ ------------------ ------------------------------
SEGMENT_NAME
--------------------------------------------------------------------------------
PSAPR3E                        INDEX              SAPR3E
GLPCA~0

SQL> recover datafile '/oracle/RP1/sapdata40/r3e_554/r3e.data554' allow 1 corruption;
SQL> recover datafile '/oracle/RP1/sapdata40/r3e_554/r3e.data554' allow 1 corruption;

Rebuild the corrupted index.

SQL> alter index sapr3e."GLPCA~0" rebuild online parallel 8;

After the rebuild the datafile will continue to contain the two corrupt blocks. But they will not be addressed.
They will sooner or later be gone, when they are overwritten. After the rebuild is finished there is no further impact.
Except of the fact that a db verify will show these two blocks as corrupt. It could be that you see it *now*, i.e. after the recovery.

SAP Basis Procedures / Questions and Answers

Saturday, January 15, 2011

Sending HTML content using sendmail in Linux

Shell Script to convert a text file to html file

Saturday, January 8, 2011

Oracle Recovery – Data Corruption – Bad blocks

Sunday, January 2, 2011

Firefox Tip: Access SAP Notes Easily

Blog Archive

Followers