The query is:
SELECT f.departure, f.arrival,
       p.callsign, p.flightkey, p.time, p.lat, p.lon, p.altitude_ft, p.speed
FROM position_2012_09_12 AS p
JOIN flight_2012_09_12 AS f ON p.flightkey = f.flightkey
WHERE p.lon < 0
  AND p.time BETWEEN '2012-9-12 0:0:0' AND '2012-9-12 23:0:0'
The EXPLAIN ANALYZE output is:
Hash Join  (cost=239891.03..470396.82 rows=4790498 width=51) (actual time=29203.830..45777.193 rows=4403717 loops=1)
  Hash Cond: (f.flightkey = p.flightkey)
  ->  Seq Scan on flight_2012_09_12 f  (cost=0.00..1934.31 rows=70631 width=12) (actual time=0.014..220.494 rows=70631 loops=1)
  ->  Hash  (cost=158415.97..158415.97 rows=3916885 width=43) (actual time=29201.012..29201.012 rows=3950815 loops=1)
        Buckets: 2048  Batches: 512 (originally 256)  Memory Usage: 1025kB
        ->  Seq Scan on position_2012_09_12 p  (cost=0.00..158415.97 rows=3916885 width=43) (actual time=0.006..14630.058 rows=3950815 loops=1)
              Filter: ((lon < 0::double precision) AND ("time" >= '2012-09-12 00:00:00'::timestamp without time zone) AND ("time" <= '2012-09-12 23:00:00'::timestamp without time zone))
Total runtime: 58522.767 ms
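I think the problem is the sequential scan on the position table, but I can't work out why the planner chooses it. The table structures, with their indexes, are as follows: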
        Table "public.flight_2012_09_12"
       Column       |            Type             | Modifiers
--------------------+-----------------------------+-----------
 callsign           | character varying(8)        |
 flightkey          | integer                     |
 source             | character varying(16)       |
 departure          | character varying(4)        |
 arrival            | character varying(4)        |
 original_etd       | timestamp without time zone |
 original_eta       | timestamp without time zone |
 enroute            | boolean                     |
 etd                | timestamp without time zone |
 eta                | timestamp without time zone |
 equipment          | character varying(6)        |
 diverted           | timestamp without time zone |
 time               | timestamp without time zone |
 lat                | double precision            |
 lon                | double precision            |
 altitude           | character varying(7)        |
 altitude_ft        | integer                     |
 speed              | character varying(4)        |
 asdi_acid          | character varying(4)        |
 enroute_eta        | timestamp without time zone |
 enroute_eta_source | character varying(1)        |
Indexes:
    "flight_2012_09_12_flightkey_idx" btree (flightkey)
    "idx_2012_09_12_altitude_ft" btree (altitude_ft)
    "idx_2012_09_12_arrival" btree (arrival)
    "idx_2012_09_12_callsign" btree (callsign)
    "idx_2012_09_12_departure" btree (departure)
    "idx_2012_09_12_diverted" btree (diverted)
    "idx_2012_09_12_enroute_eta" btree (enroute_eta)
    "idx_2012_09_12_equipment" btree (equipment)
    "idx_2012_09_12_etd" btree (etd)
    "idx_2012_09_12_lat" btree (lat)
    "idx_2012_09_12_lon" btree (lon)
    "idx_2012_09_12_original_eta" btree (original_eta)
    "idx_2012_09_12_original_etd" btree (original_etd)
    "idx_2012_09_12_speed" btree (speed)
    "idx_2012_09_12_time" btree ("time")

     Table "public.position_2012_09_12"
   Column    |            Type             | Modifiers
-------------+-----------------------------+-----------
 callsign    | character varying(8)        |
 flightkey   | integer                     |
 time        | timestamp without time zone |
 lat         | double precision            |
 lon         | double precision            |
 altitude    | character varying(7)        |
 altitude_ft | integer                     |
 course      | integer                     |
 speed       | character varying(4)        |
 trackerkey  | integer                     |
 the_geom    | geometry                    |
Indexes:
    "index_2012_09_12_altitude_ft" btree (altitude_ft)
    "index_2012_09_12_callsign" btree (callsign)
    "index_2012_09_12_course" btree (course)
    "index_2012_09_12_flightkey" btree (flightkey)
    "index_2012_09_12_speed" btree (speed)
    "index_2012_09_12_time" btree ("time")
    "position_2012_09_12_flightkey_idx" btree (flightkey)
    "test_index" btree (lon)
    "test_index_lat" btree (lat)
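I can't think of any other way to rewrite the query, so I'm stumped. If the current setup is as good as it gets, so be it, but it seems to me that it should be much faster than it is now. Any help would be much appreciated.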
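Your position table uses 71 bytes per row, plus whatever the geom type requires (I'll assume 16 bytes for the sake of illustration), which gives 87 bytes. A Postgres page is 8192 bytes, so you have about 90 rows per page.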
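Your query matches 3950815 of the 5563070 rows, or about 70% of the total. Assuming the data is randomly distributed with respect to your WHERE filter, the chance of finding a data page that contains no matching row is roughly 30%^90, which is essentially zero. So no matter how good your indexes are, you still have to read every data page. And if you have to read every page anyway, a table scan is usually a good approach.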
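The arithmetic behind that estimate can be written out as follows. This is only a sketch of the reasoning above: it keeps the 16-byte geometry size assumed for illustration, and it reads the rounding from about 94 down to 90 rows per page as an allowance for page and row overhead, which is an assumption rather than something the figures state.

```latex
\begin{align*}
\text{rows per page} &\approx \frac{8192}{71 + 16} = \frac{8192}{87} \approx 94
  \quad (\text{taken as about } 90) \\
\text{fraction of rows matched} &= \frac{3950815}{5563070} \approx 0.71 \approx 70\% \\
P(\text{a page holds no matching row}) &\approx (1 - 0.7)^{90} = 0.3^{90} \approx 10^{-47}
\end{align*}
```

With only a few tens of thousands of pages in the table, the expected number of pages that could be skipped is effectively zero.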
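The one way out is that all of this applies to non-covering indexes. If you are prepared to create indexes that can answer the query by themselves, you can avoid visiting the data pages at all, and then you are back in the game. I'd suggest the following are worth looking at: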
flight_2012_09_12 (flightkey, departure, arrival)
position_2012_09_12 (flightkey, time, lon, ...)
position_2012_09_12 (lon, flightkey, ...)
position_2012_09_12 (time, lon, ...)
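The dots stand for the rest of the columns you are selecting. You only need one of the indexes on position, but it is hard to say which will prove best. The first of them, leading with flightkey, may permit a merge join on presorted data, at the cost of reading the whole of that index to do the filtering. The second and third allow the data to be prefiltered, but require a hash join. Given how much of the cost appears to be in the hash join, the merge join might be a good option.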
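Since your query needs 52 of the 87 bytes in each row, and indexes carry their own overhead, you may not end up with an index that takes much less space than the table itself, if any.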
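As a minimal sketch of the merge-join variant suggested above, the two covering indexes could be created like this. The index names are hypothetical, and the trailing columns are simply the remaining ones the query selects from position. It also assumes a PostgreSQL version that supports index-only scans (9.2 or later), which the plan output does not confirm; on older versions the data pages are still visited and these indexes would not help in the way described.

```sql
-- Hypothetical index names; the column lists come from the query itself.

-- Flight side: join key first, then the two selected columns.
CREATE INDEX flight_2012_09_12_cover_idx
    ON flight_2012_09_12 (flightkey, departure, arrival);

-- Position side (merge-join variant): join key first, then the two
-- filter columns, then the remaining selected columns.
CREATE INDEX position_2012_09_12_cover_idx
    ON position_2012_09_12
       (flightkey, "time", lon, callsign, lat, altitude_ft, speed);

-- Index-only scans rely on an up-to-date visibility map, and the planner
-- needs fresh statistics to cost the new indexes.
VACUUM ANALYZE flight_2012_09_12;
VACUUM ANALYZE position_2012_09_12;

-- Re-run the original query under EXPLAIN ANALYZE to see whether the
-- planner actually picks the new indexes.
EXPLAIN ANALYZE
SELECT f.departure, f.arrival,
       p.callsign, p.flightkey, p.time, p.lat, p.lon, p.altitude_ft, p.speed
FROM position_2012_09_12 AS p
JOIN flight_2012_09_12 AS f ON p.flightkey = f.flightkey
WHERE p.lon < 0
  AND p.time BETWEEN '2012-9-12 0:0:0' AND '2012-9-12 23:0:0';
```

Whether the planner prefers this over the sequential scan depends on the resulting index size and the cost settings, so comparing the two plans' actual times is the only reliable check.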