微信公众号搜"智元新知"关注
微信扫一扫可直接关注哦!

Sql:优化BETWEEN子句

我写了一个声明,花了差不多一个小时才能运行所以我在寻求帮助,所以我可以更快地做到这一点.所以我们走了:

正在进行两个表的内连接:

我有许多时间间隔由间隔表示,我想从这些间隔内的度量获得测量数据.

interval:有两列,一列是开始时间,另一列是间隔的结束时间(行数= 1295)

度量:有两列,一列有度量,另一列有度量的时间(行数=一百万)

我想得到的结果是一个表,在第一列中有度量,然后是度量完成的时间,所考虑的时间间隔的开始/结束时间(对于时间在所考虑范围内的行,它将重复)

这是我的代码

select measures.measure as measure,measures.time as time,intervals.entry_time as entry_time,intervals.exit_time as exit_time
    from
    intervals
    inner join  
    measures
    on  intervals.entry_time<=measures.time  and measures.time <=intervals.exit_time  
    order by time asc

谢谢

解决方法

这是一个非常普遍的问题.

普通的B-Tree索引不适合这样的查询

SELECT  measures.measure as measure,intervals.exit_time as exit_time
FROM    intervals
JOIN    measures
ON      measures.time BETWEEN intervals.entry_time AND intervals.exit_time
ORDER BY
        time ASC

索引适用于搜索给定边界内的值,如下所示:

,但不是用于搜索包含给定值的边界,如下所示:

我的博客中的这篇文章更详细地解释了这个问题:

> Adjacency list vs. nested sets: MySQL

(嵌套集模型处理类似的谓词类型).

您可以按时间建立索引,这样区间将在连接中领先,范围时间将在嵌套循环内使用.这将需要按时分类.

您可以按间隔创建空间索引(在MysqL中使用MyISAM存储可用),其中包括一个几何列中的开始和结束.这样,测量可以导致连接,并且不需要排序.

然而,空间索引更慢,所以只有少量测量但间隔很多时,这才有效.

由于您只有很少的间隔但有很多度量,因此请确保您有一个measure.time的索引:

CREATE INDEX ix_measures_time ON measures (time)

更新:

这是一个要测试的示例脚本:

BEGIN
        DBMS_RANDOM.seed(20091223);
END;
/

CREATE TABLE intervals (
        entry_time NOT NULL,exit_time NOT NULL
)
AS
SELECT  TO_DATE('23.12.2009','dd.mm.yyyy') - level,TO_DATE('23.12.2009','dd.mm.yyyy') - level + DBMS_RANDOM.value
FROM    dual
CONNECT BY
        level <= 1500
/

CREATE UNIQUE INDEX ux_intervals_entry ON intervals (entry_time)
/

CREATE TABLE measures (
        time NOT NULL,measure NOT NULL
)
AS
SELECT  TO_DATE('23.12.2009','dd.mm.yyyy') - level / 720,CAST(DBMS_RANDOM.value * 10000 AS NUMBER(18,2))
FROM    dual
CONNECT BY
        level <= 1080000
/

ALTER TABLE measures ADD CONSTRAINT pk_measures_time PRIMARY KEY (time)
/

CREATE INDEX ix_measures_time_measure ON measures (time,measure)
/

这个查询

SELECT  SUM(measure),AVG(time - TO_DATE('23.12.2009','dd.mm.yyyy'))
FROM    (
        SELECT  *
        FROM    (
                SELECT  /*+ ORDERED USE_NL(intervals measures) */
                        *
                FROM    intervals
                JOIN    measures
                ON      measures.time BETWEEN intervals.entry_time AND intervals.exit_time
                ORDER BY
                        time
                )
        WHERE   rownum <= 500000
        )

使用nesTED LOOPS并在1.7秒内返回.

这个查询

SELECT  SUM(measure),'dd.mm.yyyy'))
FROM    (
        SELECT  *
        FROM    (
                SELECT  /*+ ORDERED USE_MERGE(intervals measures) */
                        *
                FROM    intervals
                JOIN    measures
                ON      measures.time BETWEEN intervals.entry_time AND intervals.exit_time
                ORDER BY
                        time
                )
        WHERE   rownum <= 500000
        )

使用MERGE JOIN,我不得不在5分钟后停止它.

更新2:

您很可能需要使用如下提示强制引擎在连接中使用正确的表顺序:

SELECT  /*+ LEADING (intervals) USE_NL(intervals,measures) */
        measures.measure as measure,intervals.exit_time as exit_time
FROM    intervals
JOIN    measures
ON      measures.time BETWEEN intervals.entry_time AND intervals.exit_time
ORDER BY
        time ASC

Oracle的优化器不够智能,看不到间隔不相交.这就是为什么它最有可能使用度量作为主导表(如果间隔相交,这将是一个明智的决定).

更新3:

WITH    splits AS
        (
        SELECT  /*+ MATERIALIZE */
                entry_range,exit_range,exit_range - entry_range + 1 AS range_span,entry_time,exit_time
        FROM    (
                SELECT  Trunc((entry_time - TO_DATE(1,'J')) * 2) AS entry_range,Trunc((exit_time - TO_DATE(1,'J')) * 2) AS exit_range,exit_time
                FROM    intervals
                )
        ),upper AS
        (
        SELECT  /*+ MATERIALIZE */
                MAX(range_span) AS max_range
        FROM    splits
        ),ranges AS
        (
        SELECT  /*+ MATERIALIZE */
                level AS chunk
        FROM    upper
        CONNECT BY
                level <= max_range
        ),tiles AS
        (
        SELECT  /*+ MATERIALIZE USE_MERGE (r s) */
                entry_range + chunk - 1 AS tile,exit_time
        FROM    ranges r
        JOIN    splits s
        ON      chunk <= range_span
        )
SELECT  /*+ LEADING(t) USE_HASH(m t) */
        SUM(LENGTH(stuffing))
FROM    tiles t
JOIN    measures m
ON      Trunc((m.time - TO_DATE(1,'J')) * 2) = tile
        AND m.time BETWEEN t.entry_time AND t.exit_time

查询将时间轴拆分为范围,并使用HASH JOIN连接范围值的度量和时间戳,稍后进行精细过滤.

有关其工作原理的更详细说明,请参阅我的博客中的这篇文章

> Oracle: joining timestamps and time intervals

原文地址:https://www.jb51.cc/mssql/78150.html

版权声明:本文内容由互联网用户自发贡献,该文观点与技术仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容, 请发送邮件至 dio@foxmail.com 举报,一经查实,本站将立刻删除。

相关推荐