微信公众号搜"智元新知"关注
微信扫一扫可直接关注哦!

sql – 在创建索引并使用dbms_stats计算之后,查询执行速度较慢

我有一个150万行的桌子.我运行一个查询,它在列中获取具有非重复值的记录.我正在观察一个行为,其中创建索引后,查询性能下降.我还使用了百分之百估计百分比的dbms_stats(计算模式)
收集统计数据,以便oracle 11g CBO对查询计划做出更明智的决定,但不会改善查询执行时间.
sql> desc tab3;
 Name                 Null?    Type
 ----------------------------------------------
 COL1                          NUMBER(38)
 COL2                          VARCHAR2(100)
 COL3                          VARCHAR2(36)
 COL4                          VARCHAR2(36)
 COL5                          VARCHAR2(4000)
 COL6                          VARCHAR2(4000)
 MEASURE_0                     VARCHAR2(4000)
 MEASURE_1                     VARCHAR2(4000)
 MEASURE_2                     VARCHAR2(4000)
 MEASURE_3                     VARCHAR2(4000)
 MEASURE_4                     VARCHAR2(4000)
 MEASURE_5                     VARCHAR2(4000)
 MEASURE_6                     VARCHAR2(4000)
 MEASURE_7                     VARCHAR2(4000)
 MEASURE_8                     VARCHAR2(4000)
 MEASURE_9                     VARCHAR2(4000)

列measure_0具有40万个唯一值.

sql> select count(*) from (select measure_0 from tab3 group by measure_0 having count(*) = 1) abc;

  COUNT(*)
----------
    403664

以下是执行计划的查询,请注意,表上没有索引.

sql> set autotrace traceonly;

sql> SELECT * FROM (
  2     SELECT
  3             (ROWNUM -1) AS COL1,4             ft.COL1         AS OLD_COL1,5             ft.COL2,6             ft.COL3,7             ft.COL4,8             ft.COL5,9             ft.COL6,10             ft.MEASURE_0,11             ft.MEASURE_1,12             ft.MEASURE_2,13             ft.MEASURE_3,14             ft.MEASURE_4,15             ft.MEASURE_5,16             ft.MEASURE_6,17             ft.MEASURE_7,18             ft.MEASURE_8,19             ft.MEASURE_9
 20     FROM tab3 ft
 21     WHERE MEASURE_0 IN
 22      (
 23             SELECT MEASURE_0
 24             FROM tab3
 25             GROUP BY MEASURE_0
 26             HAVING COUNT(*) = 1
 27      )
 28  ) ABC WHERE COL1 >= 0 AND COL1 <=449;

450 rows selected.

Elapsed: 00:00:01.90

Execution Plan
----------------------------------------------------------
Plan hash value: 3115757351

------------------------------------------------------------------------------------
| Id  | Operation               | Name     | Rows  | Bytes | Cost (%cpu)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |          |  1243 |    28M|   717K  (1)| 02:23:29 |
|*  1 |  VIEW                   |          |  1243 |    28M|   717K  (1)| 02:23:29 |
|   2 |   COUNT                 |          |       |       |            |          |
|*  3 |    HASH JOIN            |          |  1243 |    30M|   717K  (1)| 02:23:29 |
|   4 |     VIEW                | VW_NSO_1 |  1686K|  3219M|  6274   (2)| 00:01:16 |
|*  5 |      FILTER             |          |       |       |            |          |
|   6 |       HASH GROUP BY     |          |     1 |  3219M|  6274   (2)| 00:01:16 |
|   7 |        TABLE ACCESS FULL| TAB3     |  1686K|  3219M|  6196   (1)| 00:01:15 |
|   8 |     TABLE ACCESS FULL   | TAB3     |  1686K|    37G|  6211   (1)| 00:01:15 |
------------------------------------------------------------------------------------

Predicate information (identified by operation id):
---------------------------------------------------

   1 - filter("COL1">=0 AND "COL1"<=449)
   3 - access("MEASURE_0"="MEASURE_0")
   5 - filter(COUNT(*)=1)

Note
-----
   - dynamic sampling used for this statement (level=2)


Statistics
----------------------------------------------------------
        354  recursive calls
          0  db block gets
      46518  consistent gets
      45122  physical reads
          0  redo size
      43972  bytes sent via sql*Net to client
        715  bytes received via sql*Net from client
         31  sql*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        450  rows processed

查询占用1.90秒.如果我再次运行查询,则需要1.66秒.为什么在第一次运行中需要更多时间?

为了加快速度,我在查询中使用的两列中创建了索引.

sql> create index ind_tab3_orgid on tab3(COL1);

Index created.

Elapsed: 00:00:01.68
sql> create index ind_tab3_msr_0 on tab3(measure_0);

Index created.

Elapsed: 00:00:01.83

当我第一次发出这个查询之后,花了21秒才回来.而后续的运行则是2.9秒.为什么oracle在第一次运行中花费了这么多时间,是暖身还是什么东西呢?挡住了我!

这是计划,当它需要2.9秒 –

450 rows selected.

Elapsed: 00:00:02.92

Execution Plan
----------------------------------------------------------
Plan hash value: 240271480

-------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name           | Rows  | Bytes | Cost (%cpu)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                |  1243 |    28M|   711K  (1)| 02:22:15 |
|*  1 |  VIEW                          |                |  1243 |    28M|   711K  (1)| 02:22:15 |
|   2 |   COUNT                        |                |       |       |            |          |
|   3 |    nesTED LOOPS                |                |       |       |            |          |
|   4 |     nesTED LOOPS               |                |  1243 |    30M|   711K  (1)| 02:22:15 |
|   5 |      VIEW                      | VW_NSO_1       |  1686K|  3219M|  6274   (2)| 00:01:16 |
|*  6 |       FILTER                   |                |       |       |            |          |
|   7 |        HASH GROUP BY           |                |     1 |  3219M|  6274   (2)| 00:01:16 |
|   8 |         TABLE ACCESS FULL      | TAB3           |  1686K|  3219M|  6196   (1)| 00:01:15 |
|*  9 |      INDEX RANGE SCAN          | IND_TAB3_MSR_0 |  1243 |       |     2   (0)| 00:00:01 |
|  10 |     TABLE ACCESS BY INDEX ROWID| TAB3           |  1243 |    28M|    44   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Predicate information (identified by operation id):
---------------------------------------------------

   1 - filter("COL1">=0 AND "COL1"<=449)
   6 - filter(COUNT(*)=1)
   9 - access("MEASURE_0"="MEASURE_0")

Note
-----
   - dynamic sampling used for this statement (level=2)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
     660054  consistent gets
      22561  physical reads
          0  redo size
      44358  bytes sent via sql*Net to client
        715  bytes received via sql*Net from client
         31  sql*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        450  rows processed

我期待的时间要低于桌子未被索引的时间.为什么表的索引版本比非索引版本需要更多的时间来获取结果?如果我没有错,那是正在占用时间的TABLE ACCESS BY INDEX ROWID.我可以强制执行oracle来使用TABLE ACCESS FULL吗?

然后我收集了表上的统计数据,以便CBO通过计算选项来改进计划.所以现在的统计数字是准确的.

sql> EXECUTE dbms_stats.gather_table_stats (ownname=>'EQUBE67DP',tabname=>'TAB3',estimate_percent=>null,cascade=>true);

PL/sql procedure successfully completed.

Elapsed: 00:01:02.47
sql> set autotrace off;
sql> select COLUMN_NAME,NUM_disTINCT,SAMPLE_SIZE,HISTOGRAM,LAST_ANALYZED from dba_tab_cols where table_name = 'TAB3' ;

COLUMN_NAME                    NUM_disTINCT SAMPLE_SIZE HISTOGRAM       LAST_ANALYZED
------------------------------ ------------ ----------- --------------- ---------
COL1                                1502257     1502257 NONE            27-JUN-12
COL2                                      0             NONE            27-JUN-12
COL3                                      1     1502257 NONE            27-JUN-12
COL4                                      0             NONE            27-JUN-12
COL5                                1502257     1502257 NONE            27-JUN-12
COL6                                1502257     1502257 NONE            27-JUN-12
MEASURE_0                            405609     1502257 HEIGHT BALANCED 27-JUN-12
MEASURE_1                            128570     1502257 NONE            27-JUN-12
MEASURE_2                           1502257     1502257 NONE            27-JUN-12
MEASURE_3                            185657     1502257 NONE            27-JUN-12
MEASURE_4                               901     1502257 NONE            27-JUN-12
MEASURE_5                                17     1502257 NONE            27-JUN-12
MEASURE_6                              2202     1502257 NONE            27-JUN-12
MEASURE_7                              2193     1502257 NONE            27-JUN-12
MEASURE_8                                21     1502257 NONE            27-JUN-12
MEASURE_9                             27263     1502257 NONE            27-JUN-12

我再次运行查询

450 rows selected.

Elapsed: 00:00:02.95

Execution Plan
----------------------------------------------------------
Plan hash value: 240271480

-------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name           | Rows  | Bytes | Cost (%cpu)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                |    31M|   718G|  8046   (2)| 00:01:37 |
|*  1 |  VIEW                          |                |    31M|   718G|  8046   (2)| 00:01:37 |
|   2 |   COUNT                        |                |       |       |            |          |
|   3 |    nesTED LOOPS                |                |       |       |            |          |
|   4 |     nesTED LOOPS               |                |    31M|    62G|  8046   (2)| 00:01:37 |
|   5 |      VIEW                      | VW_NSO_1       |  4057 |  7931K|  6263   (2)| 00:01:16 |
|*  6 |       FILTER                   |                |       |       |            |          |
|   7 |        HASH GROUP BY           |                |     1 | 20285 |  6263   (2)| 00:01:16 |
|   8 |         TABLE ACCESS FULL      | TAB3           |  1502K|  7335K|  6193   (1)| 00:01:15 |
|*  9 |      INDEX RANGE SCAN          | IND_TAB3_MSR_0 |     4 |       |     2   (0)| 00:00:01 |
|  10 |     TABLE ACCESS BY INDEX ROWID| TAB3           |   779K|    75M|     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Predicate information (identified by operation id):
---------------------------------------------------

   1 - filter("COL1">=0 AND "COL1"<=449)
   6 - filter(COUNT(*)=1)
   9 - access("MEASURE_0"="MEASURE_0")


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
     660054  consistent gets
      22561  physical reads
          0  redo size
      44358  bytes sent via sql*Net to client
        715  bytes received via sql*Net from client
         31  sql*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        450  rows processed

这次查询在2.9秒内回来了(有时也需要3.9秒).

我的目标是尽可能减少查询执行时间.但是在添加索引或计算统计信息后,查询时间不断增加.为什么会发生这种情况,即使保持索引也可以改进?

解决方法

首先让我引用 Tom Kyte

just keep saying to yourself over and over

“full scans are not evil,indexes are not good”
“full scans are not evil,indexes are not good”

索引不会总是提高性能,它们不是魔法银弹(好像这样的事情存在:)

现在你问你的索引为什么需要更长的时间.答案很简单:

>全表扫描:46518一致得到
>与你的索引:660054一致得到

换句话说:Oracle使用索引执行更多的读取操作,而不是全表扫描.这是因为:

>全表扫描读取是批量操作(一次有多个块),因此是读取大量数据的有效方法
>有时当您从索引中读取时,您最终会多次读取完全相同的数据块.

至于优化器为什么选择使用这个非常有效的索引,这是因为即使使用esimate_percent = 100和完整的直方图(您在MEASURE_0列上收集的),一些数据分布仍然无法通过简单的分析来可靠地表达的优化器.特别地,分析器不能很好地理解交叉列和跨表依赖性.这导致错误的估计,导致计划选择不佳.

编辑:似乎CBO的工作假设对于这个自连接来说根本不工作(你最后一个查询预计有31万行,而只有450个被选中!).这是令人困惑的,因为表只有1.5 M行.你使用什么版本的Oracle?

我想您会发现您可以删除自连接,从而通过分析来提高查询性能

SELECT * FROM (
   SELECT (ROWNUM -1) AS COL1,ABC.*
     FROM (
      SELECT
              ft.COL1 AS OLD_COL1,[...],COUNT(*) OVER (PARTITION BY MEASURE_O) nb_0
      FROM tab3 ft
      ) ABC 
    WHERE nb_0 = 1
      AND ROWNUM - 1 <= 449
   ) v
 WHERE COL1 >= 0;

你也问为什么第一次查询运行需要更多的时间.这是因为在工作中有缓存.在数据库级别,存在SGA,其中所有块首先从磁盘复制,然后可以多次读取(第一次查询块始终是物理读取).那么一些系统还有一个独立的系统缓存,如果最近读过数据缓存,将会更快地返回数据.

进一步阅读:

> explication of a data discrepancy that can lead to wrong estimates和涉及sql配置文件解决方案.

原文地址:https://www.jb51.cc/mssql/75499.html

版权声明:本文内容由互联网用户自发贡献,该文观点与技术仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容, 请发送邮件至 dio@foxmail.com 举报,一经查实,本站将立刻删除。

相关推荐