Whether an index is used for a query apparently depends on table size

I have a table that stores time series of weather data.

                           Table « public.weather_data »
   Colonne   |           Type           | Collationnement | NULL-able | Par défaut 
-------------+--------------------------+-----------------+-----------+------------
 timestamp   | timestamp with time zone |                 | not null  | 
 location_id | integer                  |                 | not null  | 
 type_id     | integer                  |                 | not null  | 
 value       | double precision         |                 |           | 
Index :
    "weather_data_pkey" PRIMARY KEY, btree (location_id, "timestamp", type_id)
Contraintes de clés étrangères :
    "weather_data_location_id_fkey" FOREIGN KEY (location_id) REFERENCES locations(id)
    "weather_data_type_id_fkey" FOREIGN KEY (type_id) REFERENCES weather_data_types(id)
Triggers :
    ts_insert_blocker BEFORE INSERT ON weather_data FOR EACH ROW EXECUTE PROCEDURE _timescaledb_internal.insert_blocker()

I want to query the values for one location and one type over a time interval.

In my dev environment, there is very little data:

SELECT * FROM hypertable_detailed_size('weather_data');
 table_bytes | index_bytes | toast_bytes | total_bytes | node_name 
-------------+-------------+-------------+-------------+-----------
    12902400 |     9887744 |           0 |    22790144 | 

The query uses the index:

EXPLAIN SELECT weather_data.timestamp AS anon_1,weather_data.value AS weather_data_value FROM weather_data WHERE 31 = weather_data.location_id AND weather_data.timestamp >= '2000-01-01' AND weather_data.timestamp < '2020-02-02' AND weather_data.type_id = 1 ORDER BY weather_data.timestamp;
 Custom Scan (ChunkAppend) on weather_data  (cost=0.28..100.10 rows=53 width=16)
   Order: weather_data."timestamp"
   ->  Index Scan using "7790_23369_weather_data_pkey" on _hyper_49_7790_chunk  (cost=0.28..1.88 rows=1 width=16)
         Index Cond: ((31 = location_id) AND ("timestamp" >= '2000-01-01 00:00:00+01'::timestamp with time zone) AND ("timestamp" < '2020-02-02 00:00:00+01'::timestamp with time zone) AND (type_id = 1))
   ->  Index Scan using "7791_23372_weather_data_pkey" on _hyper_49_7791_chunk  (cost=0.28..1.89 rows=1 width=16)
         Index Cond: ((31 = location_id) AND ("timestamp" >= '2000-01-01 00:00:00+01'::timestamp with time zone) AND ("timestamp" < '2020-02-02 00:00:00+01'::timestamp with time zone) AND (type_id = 1))
   [...]

In my production environment, much more data is stored:

SELECT * FROM hypertable_detailed_size('weather_data');
 table_bytes | index_bytes | toast_bytes | total_bytes | node_name 
-------------+-------------+-------------+-------------+-----------
 24146182144 | 34957369344 |           0 | 59103551488 | 
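
The raw byte counts are easier to compare in human-readable units. A minimal sketch, assuming the same TimescaleDB function hypertable_detailed_size used above together with PostgreSQL's built-in pg_size_pretty; the column aliases are only illustrative:

```sql
-- Show the hypertable sizes in human-readable units instead of bytes.
SELECT pg_size_pretty(table_bytes) AS table_size,
       pg_size_pretty(index_bytes) AS index_size,
       pg_size_pretty(total_bytes) AS total_size
  FROM hypertable_detailed_size('weather_data');
```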

The index does not seem to be used.

 Sort  (cost=835599.15..836053.40 rows=181699 width=16)
   Sort Key: _hyper_1_33_chunk."timestamp"
   ->  Append  (cost=157.08..816618.66 rows=181699 width=16)
         ->  Bitmap Heap Scan on _hyper_1_33_chunk  (cost=157.08..560.33 rows=124 width=16)
               Recheck Cond: ((31 = location_id) AND ("timestamp" >= '2000-01-01 00:00:00+01'::timestamp with time zone) AND ("timestamp" < '2020-02-02 00:00:00+01'::timestamp with time zone) AND (type_id = 1))
               ->  Bitmap Index Scan on "33_98_weather_data_pkey"  (cost=0.00..157.05 rows=124 width=0)
                     Index Cond: ((31 = location_id) AND ("timestamp" >= '2000-01-01 00:00:00+01'::timestamp with time zone) AND ("timestamp" < '2020-02-02 00:00:00+01'::timestamp with time zone) AND (type_id = 1))
         ->  Bitmap Heap Scan on _hyper_1_34_chunk  (cost=263.76..878.92 rows=198 width=16)
               Recheck Cond: ((31 = location_id) AND ("timestamp" >= '2000-01-01 00:00:00+01'::timestamp with time zone) AND ("timestamp" < '2020-02-02 00:00:00+01'::timestamp with time zone) AND (type_id = 1))
         [...]

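The only difference I can think of is the volume of data.

Is there anything else?

I know the engine may decide against using an index when another strategy would make the query faster, but that does not seem to be the case here.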


I tried adding a new index that matches the query more closely:

                           Table « public.weather_data »
   Colonne   |           Type           | Collationnement | NULL-able | Par défaut 
-------------+--------------------------+-----------------+-----------+------------
 timestamp   | timestamp with time zone |                 | not null  | 
 location_id | integer                  |                 | not null  | 
 type_id     | integer                  |                 | not null  | 
 value       | double precision         |                 |           | 
Index :
    "weather_data_pkey" PRIMARY KEY, btree (location_id, "timestamp", type_id)
    "weather_data_location_id_type_id_timestamp_idx" UNIQUE, btree (location_id, type_id, "timestamp")

Unsurprisingly, the index size grew:

SELECT * FROM hypertable_detailed_size('weather_data');
 table_bytes | index_bytes | toast_bytes | total_bytes | node_name 
-------------+-------------+-------------+-------------+-----------
 24146182144 | 49657004032 |           0 | 73803186176 | 

And this index does not seem to help much.

 Sort  (cost=604117.07..604571.26 rows=181676 width=16)
   Sort Key: _hyper_1_33_chunk."timestamp"
   ->  Append  (cost=6.31..585138.76 rows=181676 width=16)
         ->  Bitmap Heap Scan on _hyper_1_33_chunk  (cost=6.31..409.57 rows=124 width=16)
               Recheck Cond: ((31 = location_id) AND (type_id = 1) AND ("timestamp" >= '2000-01-01 00:00:00+01'::timestamp with time zone) AND ("timestamp" < '2020-02-02 00:00:00+01'::timestamp with time zone))
               ->  Bitmap Index Scan on _hyper_1_33_chunk_weather_data_location_id_type_id_timestamp_id  (cost=0.00..6.28 rows=124 width=0)
                     Index Cond: ((31 = location_id) AND (type_id = 1) AND ("timestamp" >= '2000-01-01 00:00:00+01'::timestamp with time zone) AND ("timestamp" < '2020-02-02 00:00:00+01'::timestamp with time zone))
         ->  Bitmap Heap Scan on _hyper_1_34_chunk  (cost=7.44..622.60 rows=198 width=16)
               Recheck Cond: ((31 = location_id) AND (type_id = 1) AND ("timestamp" >= '2000-01-01 00:00:00+01'::timestamp with time zone) AND ("timestamp" < '2020-02-02 00:00:00+01'::timestamp with time zone))
               ->  Bitmap Index Scan on _hyper_1_34_chunk_weather_data_location_id_type_id_timestamp_id  (cost=0.00..7.39 rows=198 width=0)
                     Index Cond: ((31 = location_id) AND (type_id = 1) AND ("timestamp" >= '2000-01-01 00:00:00+01'::timestamp with time zone) AND ("timestamp" < '2020-02-02 00:00:00+01'::timestamp with time zone))

Any idea why the index does not allow fast queries in my production environment, and what I can do about it?


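Edit: added EXPLAIN (ANALYZE, BUFFERS) output. I also fixed the query so that both databases return the same data. With the query above, the prod DB returned far more data (in fact, the dev DB returned none). The query now fetches hourly data for a whole year (i.e. 8760 rows).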
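
The edited statement itself is not shown. Judging from the Index Cond lines in the plans that follow, it was presumably close to the sketch below. The literal values (location 32, type 1, the year 2002) are read off those plans, and the column aliases of the earlier query are dropped, so treat this as a reconstruction rather than the author's exact text:

```sql
-- Presumed shape of the edited query, reconstructed from the plans.
EXPLAIN (ANALYZE, BUFFERS)
SELECT "timestamp", value
  FROM weather_data
 WHERE location_id = 32
   AND type_id = 1
   AND "timestamp" >= '2002-01-01'
   AND "timestamp" <  '2003-01-01'
 ORDER BY "timestamp";
```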

Dev:

 Custom Scan (ChunkAppend) on weather_data  (cost=0.42..36800.07 rows=9371 width=16) (actual time=0.033..11.025 rows=8760 loops=1)
   Order: weather_data."timestamp"
   Buffers: shared hit=1672
   ->  Index Scan using _hyper_1_137_chunk_weather_data_location_id_type_id_timestamp_i on _hyper_1_137_chunk  (cost=0.42..225.73 rows=55 width=16) (actual time=0.031..0.078 rows=49 loops=1)
         Index Cond: ((32 = location_id) AND (type_id = 1) AND ("timestamp" >= '2002-01-01 00:00:00+01'::timestamp with time zone) AND ("timestamp" < '2003-01-01 00:00:00+01'::timestamp with time zone))
         Buffers: shared hit=12
   ->  Index Scan using _hyper_1_138_chunk_weather_data_location_id_type_id_timestamp_i on _hyper_1_138_chunk  (cost=0.42..801.22 rows=205 width=16) (actual time=0.017..0.181 rows=168 loops=1)
         Index Cond: ((32 = location_id) AND (type_id = 1) AND ("timestamp" >= '2002-01-01 00:00:00+01'::timestamp with time zone) AND ("timestamp" < '2003-01-01 00:00:00+01'::timestamp with time zone))
         Buffers: shared hit=32

Prod with bitmapscan turned off, same as dev:

 Custom Scan (ChunkAppend) on weather_data  (cost=0.42..36800.07 rows=9371 width=16) (actual time=0.033..11.086 rows=8760 loops=1)
   Order: weather_data."timestamp"
   Buffers: shared hit=1672
   ->  Index Scan using _hyper_1_137_chunk_weather_data_location_id_type_id_timestamp_i on _hyper_1_137_chunk  (cost=0.42..225.73 rows=55 width=16) (actual time=0.031..0.078 rows=49 loops=1)
         Index Cond: ((32 = location_id) AND (type_id = 1) AND ("timestamp" >= '2002-01-01 00:00:00+01'::timestamp with time zone) AND ("timestamp" < '2003-01-01 00:00:00+01'::timestamp with time zone))
         Buffers: shared hit=12
   ->  Index Scan using _hyper_1_138_chunk_weather_data_location_id_type_id_timestamp_i on _hyper_1_138_chunk  (cost=0.42..801.22 rows=205 width=16) (actual time=0.017..0.181 rows=168 loops=1)
         Index Cond: ((32 = location_id) AND (type_id = 1) AND ("timestamp" >= '2002-01-01 00:00:00+01'::timestamp with time zone) AND ("timestamp" < '2003-01-01 00:00:00+01'::timestamp with time zone))
         Buffers: shared hit=32
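
The plan above was obtained with bitmap scans disabled. One way to do that for a single test, sketched here on the assumption that the query is the one implied by the plan, is to change enable_bitmapscan with SET LOCAL inside a transaction so the setting does not leak into the rest of the session:

```sql
BEGIN;
-- Planner setting: discourage bitmap scans for this transaction only.
SET LOCAL enable_bitmapscan = off;

EXPLAIN (ANALYZE, BUFFERS)
SELECT "timestamp", value
  FROM weather_data
 WHERE location_id = 32
   AND type_id = 1
   AND "timestamp" >= '2002-01-01'
   AND "timestamp" <  '2003-01-01'
 ORDER BY "timestamp";

-- Ending the transaction restores the previous setting.
ROLLBACK;
```

A Bitmap Index Scan also reads the index; the setting only changes which index access path the planner prefers. It is meant as a diagnostic aid for comparing plans rather than as a permanent configuration.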

Prod with bitmapscan enabled:

 Sort  (cost=30667.13..30690.56 rows=9371 width=16) (actual time=14.427..15.353 rows=8760 loops=1)
   Sort Key: _hyper_1_137_chunk."timestamp"
   Sort Method: quicksort  Memory: 795kB
   Buffers: shared hit=1672
   ->  Append  (cost=5.26..30048.93 rows=9371 width=16) (actual time=0.044..11.541 rows=8760 loops=1)
         Buffers: shared hit=1672
         ->  Bitmap Heap Scan on _hyper_1_137_chunk  (cost=5.26..202.75 rows=55 width=16) (actual time=0.043..0.089 rows=49 loops=1)
               Recheck Cond: ((32 = location_id) AND (type_id = 1) AND ("timestamp" >= '2002-01-01 00:00:00+01'::timestamp with time zone) AND ("timestamp" < '2003-01-01 00:00:00+01'::timestamp with time zone))
               Heap Blocks: exact=9
               Buffers: shared hit=12
               ->  Bitmap Index Scan on _hyper_1_137_chunk_weather_data_location_id_type_id_timestamp_i  (cost=0.00..5.25 rows=55 width=0) (actual time=0.029..0.029 rows=49 loops=1)
                     Index Cond: ((32 = location_id) AND (type_id = 1) AND ("timestamp" >= '2002-01-01 00:00:00+01'::timestamp with time zone) AND ("timestamp" < '2003-01-01 00:00:00+01'::timestamp with time zone))
                     Buffers: shared hit=3
         ->  Bitmap Heap Scan on _hyper_1_138_chunk  (cost=7.55..642.37 rows=205 width=16) (actual time=0.047..0.189 rows=168 loops=1)
               Recheck Cond: ((32 = location_id) AND (type_id = 1) AND ("timestamp" >= '2002-01-01 00:00:00+01'::timestamp with time zone) AND ("timestamp" < '2003-01-01 00:00:00+01'::timestamp with time zone))
               Heap Blocks: exact=28
               Buffers: shared hit=32
               ->  Bitmap Index Scan on _hyper_1_138_chunk_weather_data_location_id_type_id_timestamp_i  (cost=0.00..7.50 rows=205 width=0) (actual time=0.037..0.037 rows=168 loops=1)
                     Index Cond: ((32 = location_id) AND (type_id = 1) AND ("timestamp" >= '2002-01-01 00:00:00+01'::timestamp with time zone) AND ("timestamp" < '2003-01-01 00:00:00+01'::timestamp with time zone))
                     Buffers: shared hit=4

Edit 2: using a covering index.

create index on weather_data (location_id, type_id, timestamp, value);
                           Table « public.weather_data »
   Colonne   |           Type           | Collationnement | NULL-able | Par défaut 
-------------+--------------------------+-----------------+-----------+------------
 timestamp   | timestamp with time zone |                 | not null  | 
 location_id | integer                  |                 | not null  | 
 type_id     | integer                  |                 | not null  | 
 value       | double precision         |                 |           | 
Index :
    "weather_data_pkey" PRIMARY KEY, btree (location_id, "timestamp", type_id)
    "weather_data_location_id_type_id_timestamp_value_idx" btree (location_id, type_id, "timestamp", value)

The indexes are now more than twice the size of the table. I suppose I could use the new index in place of the PK.

SELECT * FROM hypertable_detailed_size('weather_data');
 table_bytes | index_bytes | toast_bytes | total_bytes | node_name 
-------------+-------------+-------------+-------------+-----------
 24146182144 | 53868445696 |           0 | 78014627840 | 

The query does not seem to be any faster:

 Sort  (cost=30671.13..30694.56 rows=9371 width=16) (actual time=31.547..32.394 rows=8760 loops=1)
   Sort Key: _hyper_1_137_chunk."timestamp"
   Sort Method: quicksort  Memory: 795kB
   Buffers: shared read=1673
   ->  Append  (cost=5.26..30052.93 rows=9371 width=16) (actual time=0.125..28.312 rows=8760 loops=1)
         Buffers: shared read=1673
         ->  Bitmap Heap Scan on _hyper_1_137_chunk  (cost=5.26..202.75 rows=55 width=16) (actual time=0.123..0.265 rows=49 loops=1)
               Recheck Cond: ((32 = location_id) AND (type_id = 1) AND ("timestamp" >= '2002-01-01 00:00:00+01'::timestamp with time zone) AND ("timestamp" < '2003-01-01 00:00:00+01'::timestamp with time zone))
               Heap Blocks: exact=9
               Buffers: shared read=13
               ->  Bitmap Index Scan on _hyper_1_137_chunk_weather_data_location_id_type_id_timestamp_v  (cost=0.00..5.25 rows=55 width=0) (actual time=0.092..0.092 rows=49 loops=1)
                     Index Cond: ((32 = location_id) AND (type_id = 1) AND ("timestamp" >= '2002-01-01 00:00:00+01'::timestamp with time zone) AND ("timestamp" < '2003-01-01 00:00:00+01'::timestamp with time zone))
                     Buffers: shared read=4
         ->  Bitmap Heap Scan on _hyper_1_138_chunk  (cost=11.55..646.37 rows=205 width=16) (actual time=0.100..0.573 rows=168 loops=1)
               Recheck Cond: ((32 = location_id) AND (type_id = 1) AND ("timestamp" >= '2002-01-01 00:00:00+01'::timestamp with time zone) AND ("timestamp" < '2003-01-01 00:00:00+01'::timestamp with time zone))
               Heap Blocks: exact=28
               Buffers: shared read=32
               ->  Bitmap Index Scan on _hyper_1_138_chunk_weather_data_location_id_type_id_timestamp_v  (cost=0.00..11.50 rows=205 width=0) (actual time=0.079..0.079 rows=168 loops=1)
                     Index Cond: ((32 = location_id) AND (type_id = 1) AND ("timestamp" >= '2002-01-01 00:00:00+01'::timestamp with time zone) AND ("timestamp" < '2003-01-01 00:00:00+01'::timestamp with time zone))
                     Buffers: shared read=4

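Solution

You are on the right track with your BTREE index on (location_id, type_id, "timestamp").

If you add INCLUDE (value) to its definition, PostgreSQL (version 11 or later) can treat it as a covering index and use it to satisfy your query entirely from the index, without going back to the table's heap. That should help a lot.

Also, it does not need to be unique. Your primary key index already enforces uniqueness for you.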

CREATE INDEX weather_data_location_id_type_id_timestamp_idx 
          ON weather_data 
       USING BTREE
             (location_id, type_id, "timestamp")
     INCLUDE (value);

Better still, make value part of the index key itself.

CREATE INDEX weather_data_location_id_type_id_timestamp_idx 
          ON weather_data 
       USING BTREE
             (location_id, type_id, "timestamp", value);

Then your index will be able to satisfy aggregate queries such as

 SELECT DATE_TRUNC('month', timestamp) AS month, location_id, MAX(value) AS high, MIN(value) AS low
   FROM weather_data
  WHERE type_id = 1
    AND location_id = 31
    AND timestamp >= '2000-01-01' 
    AND timestamp < '2020-02-02'
  GROUP BY DATE_TRUNC('month', timestamp), location_id, type_id;
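
Whether such a query is really answered from the index alone can be checked in the plan. A hedged sketch: PostgreSQL can skip the heap only for pages marked all-visible in the visibility map, which VACUUM maintains, so the example vacuums first (assuming VACUUM on the hypertable reaches its chunks) and then runs the aggregate over a single year. The date range and the choice of year are illustrative:

```sql
-- Refresh the visibility map and planner statistics (can take a while on a large table).
VACUUM (ANALYZE) weather_data;

-- In the output, look for "Index Only Scan" nodes and their "Heap Fetches" counters.
EXPLAIN (ANALYZE, BUFFERS)
SELECT DATE_TRUNC('month', "timestamp") AS month, location_id,
       MAX(value) AS high, MIN(value) AS low
  FROM weather_data
 WHERE type_id = 1
   AND location_id = 31
   AND "timestamp" >= '2002-01-01'
   AND "timestamp" <  '2003-01-01'
 GROUP BY DATE_TRUNC('month', "timestamp"), location_id;
```

If the plan still consists of Bitmap Heap Scan nodes, the extra column brings little benefit, because that node reads the table rows it returns.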

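Of course, there is no magic that makes processing twenty years of weather data especially fast. PostgreSQL's query planner may decide that it is more efficient to satisfy such a broad query by scanning the whole table. Try narrowing the date range.

On a toy-sized table, it may be more efficient simply to pull the whole table into RAM and scan it to satisfy the query. Personally, I have never had any luck trying to make sense of execution plans on toy-sized tables.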
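
The planner's choice depends on its cost settings and on per-table statistics as well as on table size, so comparing those between the two environments may show why the plans differ. A minimal sketch using the standard pg_settings and pg_stats views; the chunk name is taken from the plans in the question, and the list of settings is only a suggested starting point:

```sql
-- Planner-related settings; run on both servers and compare the results.
SELECT name, setting, source
  FROM pg_settings
 WHERE name IN ('random_page_cost', 'seq_page_cost', 'effective_cache_size',
                'work_mem', 'enable_bitmapscan');

-- Per-column statistics for one chunk (TimescaleDB keeps chunks in _timescaledb_internal).
SELECT attname, n_distinct, correlation
  FROM pg_stats
 WHERE schemaname = '_timescaledb_internal'
   AND tablename  = '_hyper_1_137_chunk';
```

A correlation value far from 1 or -1 means the rows are not stored in index order, which tends to make a plain index scan look more expensive to the planner than a bitmap scan.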
