请注意,本文编写于 587 天前,最后修改于 583 天前,其中某些信息可能已经过时。
环境说明:
DB:Oracle 11.2.0.4.0
OS:Redhat 7.6
问题现象:
数据库告警日志每天会有大量的ORA-32701错误,内容如下:
Sat Apr 03 22:09:48 2021
Errors in file /oracle/db/diag/rdbms/sytrnt/trnt1/trace/trnt1_dia0_126699.trc (incident=264060):
ORA-32701: Possible hangs up to hang ID=34 detected
DIA0 terminating blocker (ospid: 160308 sid: 2960 ser#: 8539) of hang with ID = 34
requested by master DIA0 process on instance 1
Hang Resolution Reason: Although the number of affected sessions did not
justify automatic hang resolution initially, this previously ignored
hang was automatically resolved.
by terminating the process ospid:160308
影响范围:
AWR快照无法自动生成和手动生成,
尝试手动收集AWR快照,卡住很久,对应等待事件为"enq: WF - contention";
EXECUTE DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT();
Oracle默认每小时生成一次快照,11g中,默认保存时间为8天,而在10g中,默认保存时间为7天
select * from dba_hist_wr_control;
查看快照生成情况:
col BEGIN_INTERVAL_TIME for a50
col FLUSH_ELAPSED for a30
select SNAP_ID,BEGIN_INTERVAL_TIME,FLUSH_ELAPSED from dba_hist_snapshot order by snap_id;
问题分析:
查看trnt1_dia0_126699.trc日志:
Incident 264050 created, dump file: /oracle/db/diag/rdbms/sytrnt/trnt1/incident/incdir_264050/trnt1_dia0_126699_i264050.trc
ORA-32701: Possible hangs up to hang ID=29 detected
查看trnt1_dia0_126699_i264050.trc日志:
*** 2021-04-03 17:09:39.961
Resolvable Hangs in the System
Root Chain Total Hang
Hang Hang Inst Root #hung #hung Hang Hang Resolution
ID Type Status Num Sess Sess Sess Conf Span Action
----- ---- -------- ---- ----- ----- ----- ------ ------ -------------------
29 HANG RSLNPEND 1 2960 2 2 HIGH GLOBAL Terminate Process
Hang Resolution Reason: Although the number of affected sessions did not
justify automatic hang resolution initially, this previously ignored
hang was automatically resolved.
Previous SESSION termination was unsuccessful. PROCESS termination
will be attempted.
inst# SessId Ser# OSPID PrcNm Event
----- ------ ----- --------- ----- -----
2 3162 21973 139799 M000 enq: WF - contention
1 2960 3479 151457 M000 not in wait
查看当前SQL:
*** 2021-04-03 17:09:40.084
current sql: insert into wrh$_sql_bind_metadata (snap_id, dbid, sql_id, name, position, dup_position, datatype, datatype_string, character_sid, precision, scale, max_length) SELECT /*+ ordered use_nl(bnd) index(bnd sql_id) */ :lah_snap_id, :dbid, bnd.sql_id, name, position, dup_position, datatype, dataty
完整SQL如下:
insert into wrh$_sql_bind_metadata
(snap_id,
dbid,
sql_id,
name,
position,
dup_position,
datatype,
datatype_string,
character_sid,
precision,
scale,
max_length)
SELECT /*+ ordered use_nl(bnd) index(bnd sql_id) */
:lah_snap_id,
:dbid,
bnd.sql_id,
name,
position,
dup_position,
datatype,
datatype_string,
character_sid,
precision,
scale,
max_length
FROM x$kewrattrnew new, x$kewrsqlidtab tab, v$sql_bind_capture bnd
WHERE new.str1_kewrattr = tab.s;
查看SQL对应执行计划如下:
13Plan hash value: 4222011306
14
15----------------------------------------------------------------------------------------------
16| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
17----------------------------------------------------------------------------------------------
18| 0 | INSERT STATEMENT | | | | 2 (100)| |
19| 1 | LOAD TABLE CONVENTIONAL | | | | | |
20| 2 | NESTED LOOPS | | 100 | 11500 | 2 (100)| 00:00:01 |
21|* 3 | HASH JOIN | | 100 | 4800 | 0 (0)| |
22| 4 | FIXED TABLE FULL | X$KEWRATTRNEW | 100 | 3400 | 0 (0)| |
23| 5 | FIXED TABLE FULL | X$KEWRSQLIDTAB | 100 | 1400 | 0 (0)| |
24|* 6 | FIXED TABLE FIXED INDEX| X$KQLFBC (ind:2) | 1 | 67 | 0 (0)| |
25----------------------------------------------------------------------------------------------
26
27Predicate Information (identified by operation id):
28---------------------------------------------------
29
30 3 - access("NEW"."STR1_KEWRATTR"="TAB"."SQLID_KEWRSIE")
31 6 - filter(("INST_ID"=USERENV('INSTANCE') AND "TAB"."SQLID_KEWRSIE"="KQLFBC_SQLID"
32 AND "TAB"."CHILDADDR_KEWRSIE"="KQLFBC_CADD"))
查看X$KQLFBC数据量很大:
select count(*) from X$KQLFBC; ---长时间无返回结果
查看MOS:
Error ORA-32701 'On Current SQL: insert into wrh$_sql_bind_metadata' (文档 ID 2226216.1)
原因:
CAUSE
视图v$sqlbind_capture对应于固定表X$KQLFBC,该表主要用于存储与数据绑定相关的变量。
View v$sqlbind_capture corresponds to fixed table X$KQLFBC table which is mainly used to store variables associated with the binding of data.
在使用大量绑定变量的大型数据库中可以注意到此错误。
This error can be noticed in large databases using large amount of binding variables.
解决方案:
- Collect statistics on following fixed table:
exec dbms_stats.gather_table_stats('SYS', 'X$KEWRATTRNEW');
exec dbms_stats.gather_table_stats('SYS', 'X$KEWRSQLIDTAB');
立即生效
exec dbms_stats.gather_table_stats(ownname => 'SYS',tabname => 'X$KEWRATTRNEW',no_invalidate => FALSE);
exec dbms_stats.gather_table_stats(ownname => 'SYS',tabname => 'X$KEWRSQLIDTAB',no_invalidate => FALSE);
经测试,收集统计信息后,并没有解决问题,AWR还是无法生成快照。
Or
2.重新启动数据库将释放X$KQLFBC表数据(需要停机窗口,谨慎操作) - Restarting the database will release of X$KQLFBC table data
Or
3.定期刷新共享池(需要停机窗口,谨慎操作) - Flush shared_pool on a regular basis
alter system flush shared_pool;
Or
4.针对数据量较大的基表,也可以通过设置参数来屏蔽相关数据写入到awr中。
alter system set "_awr_disabled_flush_tables" = 'wrh$_sql_bind_metadata';
其中 wrh$_sql_bind_metadata 可以替换成其他导致 AWR 无法正常完成的收集任务的基表名称。
转载自chenoracle