我正在使用postgres 9.1,我有一个大约3.5M行eventtype(varchar)和eventtime(timestamp)的表 – 以及其他一些字段.只有大约20种不同的事件类型,事件时间跨度约为4年.
select eventtype,max(eventtime) from allevents group by eventtype
大约需要20秒.选择不同的eventtype同样慢.查询计划显示表的完整顺序扫描 – 毫不奇怪它很慢.
解释分析上面的查询给出:
HashAggregate (cost=84591.47..84591.68 rows=21 width=21) (actual time=20918.131..20918.141 rows=21 loops=1) -> Seq Scan on allevents (cost=0.00..66117.98 rows=3694698 width=21) (actual time=0.021..4831.793 rows=3694392 loops=1) Total runtime: 20918.204 ms
如果我添加一个where子句来选择一个特定的事件类型,它需要40ms到150ms,这至少是不错的.
选择特定事件类型时的查询计划:
GroupAggregate (cost=343.87..24942.71 rows=1 width=21) (actual time=98.397..98.397 rows=1 loops=1) -> Bitmap Heap Scan on allevents (cost=343.87..24871.07 rows=14325 width=21) (actual time=6.820..89.610 rows=19736 loops=1) Recheck Cond: ((eventtype)::text = 'TEST_EVENT'::text) -> Bitmap Index Scan on allevents_idx2 (cost=0.00..340.28 rows=14325 width=0) (actual time=6.121..6.121 rows=19736 loops=1) Index Cond: ((eventtype)::text = 'TEST_EVENT'::text) Total runtime: 98.482 ms
主键是(eventtype,eventtime).我还有以下索引:
allevents_idx (event time desc,eventtype) allevents_idx2 (eventtype).
如何加快查询速度?
下面的@denis建议的相关子查询的查询播放结果有14个手动输入值,给出:
Function Scan on unnest val (cost=0.00..185.40 rows=100 width=32) (actual time=0.121..8983.134 rows=14 loops=1) SubPlan 2 -> Result (cost=1.83..1.84 rows=1 width=0) (actual time=641.644..641.645 rows=1 loops=14) InitPlan 1 (returns $1) -> Limit (cost=0.00..1.83 rows=1 width=8) (actual time=641.640..641.641 rows=1 loops=14) -> Index Scan using allevents_idx on allevents (cost=0.00..322672.36 rows=175938 width=8) (actual time=641.638..641.638 rows=1 loops=14) Index Cond: ((eventtime IS NOT NULL) AND ((eventtype)::text = val.val)) Total runtime: 8983.203 ms
使用@jjanes建议的递归查询,查询运行4到5秒,具有以下计划:
CTE Scan on t (cost=260.32..448.63 rows=101 width=32) (actual time=0.146..4325.598 rows=22 loops=1) CTE t -> Recursive Union (cost=2.52..260.32 rows=101 width=32) (actual time=0.075..1.449 rows=22 loops=1) -> Result (cost=2.52..2.53 rows=1 width=0) (actual time=0.074..0.074 rows=1 loops=1) InitPlan 1 (returns $1) -> Limit (cost=0.00..2.52 rows=1 width=13) (actual time=0.070..0.071 rows=1 loops=1) -> Index Scan using allevents_idx2 on allevents (cost=0.00..9315751.37 rows=3696851 width=13) (actual time=0.070..0.070 rows=1 loops=1) Index Cond: ((eventtype)::text IS NOT NULL) -> WorkTable Scan on t (cost=0.00..25.58 rows=10 width=32) (actual time=0.059..0.060 rows=1 loops=22) Filter: (eventtype IS NOT NULL) SubPlan 3 -> Result (cost=2.53..2.54 rows=1 width=0) (actual time=0.059..0.059 rows=1 loops=21) InitPlan 2 (returns $3) -> Limit (cost=0.00..2.53 rows=1 width=13) (actual time=0.057..0.057 rows=1 loops=21) -> Index Scan using allevents_idx2 on allevents (cost=0.00..3114852.66 rows=1232284 width=13) (actual time=0.055..0.055 rows=1 loops=21) Index Cond: (((eventtype)::text IS NOT NULL) AND ((eventtype)::text > t.eventtype)) SubPlan 6 -> Result (cost=1.83..1.84 rows=1 width=0) (actual time=196.549..196.549 rows=1 loops=22) InitPlan 5 (returns $6) -> Limit (cost=0.00..1.83 rows=1 width=8) (actual time=196.546..196.546 rows=1 loops=22) -> Index Scan using allevents_idx on allevents (cost=0.00..322946.21 rows=176041 width=8) (actual time=196.544..196.544 rows=1 loops=22) Index Cond: ((eventtime IS NOT NULL) AND ((eventtype)::text = t.eventtype)) Total runtime: 4325.694 ms
解决方法
你需要的是“跳过扫描”或“
loose index scan”. Postgresql的规划器还没有自动实现它们,但你可以通过使用递归查询来欺骗它.
WITH RECURSIVE t AS ( SELECT min(eventtype) AS eventtype FROM allevents UNION ALL SELECT (SELECT min(eventtype) as eventtype FROM allevents WHERE eventtype > t.eventtype) FROM t where t.eventtype is not null ) select eventtype,(select max(eventtime) from allevents where eventtype=t.eventtype) from t;
可能有一种方法可以将max(eventtime)折叠到递归查询中,而不是在查询之外进行,但如果是这样的话,我还没有碰到它.
这需要一个索引(eventtype,eventtime)才能有效.您可以在事件发生时将其设为DESC,但这不是必需的.仅当eventtype只有几个不同的值(在您的情况下为21个)时,这才有效.
版权声明:本文内容由互联网用户自发贡献,该文观点与技术仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容, 请发送邮件至 dio@foxmail.com 举报,一经查实,本站将立刻删除。