Optimizing a query with conditions on multiple tables (tables A and B)

I have two Postgres tables:

Table A

id owner_id
1 100
2 101

Table B

id a_id user_id
1 1 200
2 1 201
3 2 202
4 2 201
id is the integer primary key (PK) on both tables.

I have B-Tree indexes on a.owner_id, b.a_id and b.user_id.

First query

Consider the following query:

SELECT b.id
FROM b JOIN a ON b.a_id = a.id

WHERE b.user_id = 201
   OR a.owner_id = 100
LIMIT 50;

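My condition is WHERE b.user_id = 201 OR a.owner_id = 100. The plan does not use the index on a.owner_id, and in fact it does not use the one on b.user_id either: the two index scans it performs (a_pkey and b_a_id) only feed the merge join, and the OR is applied afterwards as a join filter. Here is the plan: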

QUERY PLAN
Limit  (cost=19.54..4445.84 rows=50 width=4) (actual time=0.125..5.031 rows=50 loops=1)
  Buffers: shared hit=1054
  ->  Merge Join  (cost=19.54..9815083.22 rows=110872 width=4) (actual time=0.123..5.018 rows=50 loops=1)
        Merge Cond: (a.id = b.a_id)
        Join Filter: ((b.user_id = 201) OR (a.owner_id = 100))
        Rows Removed by Join Filter: 5547
        Buffers: shared hit=1054
        ->  Index Scan using a_pkey on a  (cost=0.42..103568.63 rows=100009 width=20) (actual time=0.011..0.037 rows=50 loops=1)
              Buffers: shared hit=10
        ->  Index Scan using b_a_id on b  (cost=0.43..9515274.99 rows=11200116 width=24) (actual time=0.009..3.136 rows=5597 loops=1)
              Buffers: shared hit=1044
Planning Time: 0.626 ms
Execution Time: 5.082 ms

The query is a bit slow. How can I make it faster?

Second query

There is another, even slower query:

SELECT b.id
FROM b JOIN a ON b.a_id = a.id

WHERE (b.user_id = 201 AND a.owner_id = 100)
   OR (b.user_id = 100 AND a.owner_id = 201)
LIMIT 50;
QUERY PLAN
Limit  (cost=1000.43..19742.38 rows=50 width=4) (actual time=0.705..63.142 rows=50 loops=1)
  Buffers: shared hit=1419 read=3994
  ->  Gather  (cost=1000.43..75593.36 rows=199 width=4) (actual time=0.704..63.124 rows=50 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        Buffers: shared hit=1419 read=3994
        ->  Nested Loop  (cost=0.43..74573.46 rows=83 width=4) (actual time=0.752..13.122 rows=17 loops=3)
              Buffers: shared hit=1419 read=3994
              ->  Parallel Seq Scan on a  (cost=0.00..25628.06 rows=83 width=20) (actual time=0.669..11.868 rows=17 loops=3)
                    Filter: ((owner_id = 100) OR (owner_id = 201))
                    Rows Removed by Filter: 16985
                    Buffers: shared hit=258 read=3994
              ->  Index Scan using b_a_id on b  (cost=0.43..589.69 rows=1 width=24) (actual time=0.023..0.070 rows=1 loops=52)
                    Index Cond: (a_id = a.id)
                    Filter: (((user_id = 201) OR (user_id = 100)) AND (((user_id = 201) AND (a.owner_id = 100)) OR ((a.owner_id = 201) AND (user_id = 100))))
                    Rows Removed by Filter: 105
                    Buffers: shared hit=1161
Planning Time: 0.638 ms
Execution Time: 63.202 ms

Solution

First, create some test data:

CREATE UNLOGGED TABLE a AS SELECT a_id,(random()*100000)::INTEGER owner_id
FROM generate_series(1,1000000) a_id;
CREATE UNLOGGED TABLE b AS SELECT b_id,(random()*100000)::INTEGER a_id,(random()*100000)::INTEGER user_id
FROM generate_series(1,10000000) b_id;
CREATE INDEX a_o ON a(owner_id);
CREATE INDEX b_a ON b(a_id);
CREATE INDEX b_u ON b(user_id);
ALTER TABLE a ADD PRIMARY KEY(a_id);
ALTER TABLE b ADD PRIMARY KEY(b_id);
VACUUM ANALYZE a,b;

The problem with the first query is that Postgres doesn't know how to optimize a star join like this, so we have to give it a little help:

WITH ids AS (
  SELECT a_id FROM b WHERE user_id=201
  UNION SELECT a_id FROM a WHERE owner_id=100
)
SELECT * FROM ids JOIN b USING (a_id) LIMIT 50;

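This gives a plan that uses both indexes; it may or may not be faster in your case. Note, however, that this rewrite is not equivalent to the original query: it returns every b row whose a_id appears in ids, so a row with a different user_id is included whenever another row with the same a_id has user_id = 201 (the plan below returns all 50 rows from a single a_id). To keep the original semantics, UNION the matching b rows of each condition instead of the a_id values, or re-apply the WHERE clause after the join, as the forced-index query further down does.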
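The CTE rewrite above unions a_id values rather than b rows, which can change the result. A minimal sketch in Python, using the standard-library sqlite3 module as a stand-in for Postgres (an assumption: only result sets are compared here, not plans or speed), loads the question's four sample rows into the answer's test schema, a(a_id, owner_id) and b(b_id, a_id, user_id), and runs the original OR query, the CTE rewrite, and a UNION of the two single-condition joins, all without LIMIT.

```python
import sqlite3

# SQLite stands in for Postgres: only result sets are compared, not plans.
# Schema follows the answer's test data; rows are the question's sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (a_id INTEGER PRIMARY KEY, owner_id INTEGER);
CREATE TABLE b (b_id INTEGER PRIMARY KEY, a_id INTEGER, user_id INTEGER);
INSERT INTO a VALUES (1, 100), (2, 101);
INSERT INTO b VALUES (1, 1, 200), (2, 1, 201), (3, 2, 202), (4, 2, 201);
""")


def b_ids(sql):
    """Run a query whose first column is b_id and return the ids sorted."""
    return sorted(row[0] for row in conn.execute(sql))


# The original condition, applied after the join.
original = b_ids("""
SELECT b.b_id FROM b JOIN a USING (a_id)
WHERE b.user_id = 201 OR a.owner_id = 100
""")

# The CTE rewrite: unions a_id values, then joins b back on a_id only.
cte_rewrite = b_ids("""
WITH ids AS (
  SELECT a_id FROM b WHERE user_id = 201
  UNION SELECT a_id FROM a WHERE owner_id = 100
)
SELECT b.b_id FROM ids JOIN b USING (a_id)
""")

# A row-preserving alternative: UNION the b rows matched by each condition.
union_rewrite = b_ids("""
SELECT b.b_id FROM b JOIN a USING (a_id) WHERE b.user_id = 201
UNION
SELECT b.b_id FROM b JOIN a USING (a_id) WHERE a.owner_id = 100
""")

print(original)       # [1, 2, 4]
print(cte_rewrite)    # [1, 2, 3, 4]  (b_id 3 has user_id 202 and owner 101)
print(union_rewrite)  # [1, 2, 4]
```

In Postgres each UNION branch can be driven by its own index (b.user_id for the first, a.owner_id and then b.a_id for the second); whether that actually beats the merge join depends on your data.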

 Limit  (cost=455.41..634.97 rows=50 width=12) (actual time=0.494..0.642 rows=50 loops=1)
   ->  Nested Loop  (cost=455.41..41596.19 rows=11456 width=12) (actual time=0.492..0.629 rows=50 loops=1)
         ->  HashAggregate  (cost=450.19..451.32 rows=113 width=4) (actual time=0.425..0.427 rows=1 loops=1)
               Group Key: b_1.a_id
               Batches: 1  Memory Usage: 24kB
               ->  Append  (cost=5.23..449.91 rows=113 width=4) (actual time=0.076..0.358 rows=98 loops=1)
                     ->  Bitmap Heap Scan on b b_1  (cost=5.23..401.21 rows=102 width=4) (actual time=0.075..0.299 rows=92 loops=1)
                           Recheck Cond: (user_id = 201)
                           Heap Blocks: exact=92
                           ->  Bitmap Index Scan on b_u  (cost=0.00..5.20 rows=102 width=0) (actual time=0.035..0.035 rows=92 loops=1)
                                 Index Cond: (user_id = 201)
                     ->  Bitmap Heap Scan on a  (cost=4.51..47.00 rows=11 width=4) (actual time=0.019..0.033 rows=6 loops=1)
                           Recheck Cond: (owner_id = 100)
                           Heap Blocks: exact=6
                           ->  Bitmap Index Scan on a_o  (cost=0.00..4.51 rows=11 width=0) (actual time=0.014..0.014 rows=6 loops=1)
                                 Index Cond: (owner_id = 100)
         ->  Bitmap Heap Scan on b  (cost=5.22..363.09 rows=101 width=12) (actual time=0.059..0.174 rows=50 loops=1)
               Recheck Cond: (a_id = b_1.a_id)
               Heap Blocks: exact=50
               ->  Bitmap Index Scan on b_a  (cost=0.00..5.19 rows=101 width=0) (actual time=0.023..0.023 rows=104 loops=1)
                     Index Cond: (a_id = b_1.a_id)
 Planning Time: 0.448 ms
 Execution Time: 0.747 ms

For the other query, I first had to run this:

select owner_id,user_id,count(*) from a join b using (a_id) group by owner_id,user_id order by count(*) desc limit 100;

to get some user_id/owner_id pairs that actually return results from my test data. Then:

EXPLAIN ANALYZE
SELECT b.*
FROM b JOIN a USING (a_id)
WHERE (b.user_id = 99238 AND a.owner_id = 58599)
   OR (b.user_id = 36859 AND a.owner_id = 99027)
LIMIT 50;

Limit  (cost=24.97..532.32 rows=1 width=12) (actual time=0.274..0.982 rows=6 loops=1)
   ->  Nested Loop  (cost=24.97..532.32 rows=1 width=12) (actual time=0.271..0.976 rows=6 loops=1)
         ->  Bitmap Heap Scan on a  (cost=9.03..92.70 rows=22 width=8) (actual time=0.108..0.216 rows=12 loops=1)
               Recheck Cond: ((owner_id = 58599) OR (owner_id = 99027))
               Heap Blocks: exact=12
               ->  BitmapOr  (cost=9.03..9.03 rows=22 width=0) (actual time=0.086..0.088 rows=0 loops=1)
                     ->  Bitmap Index Scan on a_o  (cost=0.00..4.51 rows=11 width=0) (actual time=0.064..0.065 rows=3 loops=1)
                           Index Cond: (owner_id = 58599)
                     ->  Bitmap Index Scan on a_o  (cost=0.00..4.51 rows=11 width=0) (actual time=0.020..0.020 rows=9 loops=1)
                           Index Cond: (owner_id = 99027)
         ->  Bitmap Heap Scan on b  (cost=15.95..19.97 rows=1 width=12) (actual time=0.058..0.060 rows=0 loops=12)
               Recheck Cond: ((a_id = a.a_id) AND ((user_id = 99238) OR (user_id = 36859)))
               Filter: (((user_id = 99238) AND (a.owner_id = 58599)) OR ((user_id = 36859) AND (a.owner_id = 99027)))
               Heap Blocks: exact=6
               ->  BitmapAnd  (cost=15.95..15.95 rows=1 width=0) (actual time=0.053..0.053 rows=0 loops=12)
                     ->  Bitmap Index Scan on b_a  (cost=0.00..5.19 rows=101 width=0) (actual time=0.015..0.015 rows=50 loops=12)
                           Index Cond: (a_id = a.a_id)
                     ->  BitmapOr  (cost=10.50..10.50 rows=205 width=0) (actual time=0.046..0.046 rows=0 loops=6)
                           ->  Bitmap Index Scan on b_u  (cost=0.00..5.20 rows=102 width=0) (actual time=0.021..0.021 rows=121 loops=6)
                                 Index Cond: (user_id = 99238)
                           ->  Bitmap Index Scan on b_u  (cost=0.00..5.20 rows=102 width=0) (actual time=0.024..0.024 rows=105 loops=6)
                                 Index Cond: (user_id = 36859)
 Planning Time: 0.703 ms
 Execution Time: 1.063 ms

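It doesn't do a seq scan like yours does, so maybe your older version can't optimize it properly? It's odd that it picks a seq scan on table a when the row estimate is so accurate. You should investigate that; perhaps try: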

SELECT * FROM a WHERE a.owner_id = 58599 OR a.owner_id = 99027
LIMIT 50;

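This should give an index scan or a bitmap index scan. If it does a seq scan instead, you have a small test case for finding out why. In any case, you can still force the indexes to be used with: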

EXPLAIN ANALYZE
WITH ids AS (
  SELECT a_id FROM b WHERE user_id IN (99238,36859)
  UNION SELECT a_id FROM a WHERE owner_id IN (58599,99027)
)
SELECT * FROM ids JOIN b USING (a_id) JOIN a USING (a_id)
    WHERE (b.user_id = 99238 AND a.owner_id = 58599)
       OR (b.user_id = 36859 AND a.owner_id = 99027);

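...but it's very ugly. Alternatively, you can run each branch of your OR separately, implementing each AND with INTERSECT and combining the branches with UNION, which is also ugly. For the first branch: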

EXPLAIN ANALYZE
SELECT a_id FROM b WHERE b.user_id = 99238 
INTERSECT
SELECT a_id FROM a WHERE a.owner_id = 58599
LIMIT 50;
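The forced-index CTE above stays correct because it re-applies the full WHERE clause after joining both tables, and the INTERSECT form is completed by taking one INTERSECT per OR branch and combining the branches with UNION. A minimal Python/sqlite3 sketch (same stand-in assumption: results only, not plans) checks both against the original OR query on a few made-up rows, with 201/100 and 100/201 as the user_id/owner_id pairs as in the question. SQLite gives all compound operators equal precedence and does not accept parenthesized SELECTs in a compound, so each INTERSECT is wrapped in a subquery; in Postgres INTERSECT already binds more tightly than UNION. Note that the INTERSECT form yields a_id values; to get b rows you would still join back to b and keep the branch's user_id.

```python
import sqlite3

# Made-up rows in the answer's schema; SQLite stands in for Postgres.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (a_id INTEGER PRIMARY KEY, owner_id INTEGER);
CREATE TABLE b (b_id INTEGER PRIMARY KEY, a_id INTEGER, user_id INTEGER);
INSERT INTO a VALUES (1, 100), (2, 101), (3, 201);
INSERT INTO b VALUES (1, 1, 200), (2, 1, 201), (3, 2, 202),
                     (4, 2, 201), (5, 3, 100), (6, 3, 201);
""")

JOINED = "FROM b JOIN a ON a.a_id = b.a_id"
WHERE_CLAUSE = """
WHERE (b.user_id = 201 AND a.owner_id = 100)
   OR (b.user_id = 100 AND a.owner_id = 201)
"""


def column(sql):
    """Return the first column of every result row, sorted."""
    return sorted(row[0] for row in conn.execute(sql))


original = column(f"SELECT b.b_id {JOINED} {WHERE_CLAUSE}")

# Forced-index CTE: candidate a_ids from both indexes, then the full filter again.
cte_refiltered = column(f"""
WITH ids AS (
  SELECT a_id FROM b WHERE user_id IN (201, 100)
  UNION SELECT a_id FROM a WHERE owner_id IN (100, 201)
)
SELECT b.b_id FROM ids
JOIN b ON b.a_id = ids.a_id
JOIN a ON a.a_id = b.a_id
{WHERE_CLAUSE}""")

# One INTERSECT per OR branch, UNION across branches: matching a_id values.
branch_a_ids = column("""
SELECT a_id FROM (SELECT a_id FROM b WHERE user_id = 201
                  INTERSECT SELECT a_id FROM a WHERE owner_id = 100) AS br1
UNION
SELECT a_id FROM (SELECT a_id FROM b WHERE user_id = 100
                  INTERSECT SELECT a_id FROM a WHERE owner_id = 201) AS br2
""")
original_a_ids = column(f"SELECT DISTINCT b.a_id {JOINED} {WHERE_CLAUSE}")

print(original, cte_refiltered)      # [2, 5] [2, 5]
print(branch_a_ids, original_a_ids)  # [1, 3] [1, 3]
```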

How to optimize a large OFFSET

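You don't. In fact, a large offset usually hints that you are doing something wrong: running the same query repeatedly, for example for pagination, and displaying the results one chunk at a time. There are two solutions. If fetching the results is fast enough that the transaction can stay open while you do it, open a cursor for the query without LIMIT or OFFSET and use FETCH to retrieve the results in chunks. Otherwise, run the query once without LIMIT, store the results in a cache, and paginate over the cache without re-running the query.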
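Both options can be sketched with Python's DB-API, again with sqlite3 as a stand-in (an assumption; with Postgres and psycopg2, passing a name to connection.cursor() creates a server-side cursor that uses DECLARE/FETCH). The query runs once without LIMIT or OFFSET: option 1 pulls pages from the open cursor with fetchmany(), option 2 caches the full result and slices pages from memory. The table contents and the user_id = 3 filter are made up for illustration.

```python
import sqlite3

# Made-up data; SQLite stands in for Postgres.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE b (b_id INTEGER PRIMARY KEY, user_id INTEGER)")
conn.executemany("INSERT INTO b VALUES (?, ?)",
                 [(i, i % 7) for i in range(1, 1001)])

QUERY = "SELECT b_id FROM b WHERE user_id = 3 ORDER BY b_id"  # no LIMIT/OFFSET


def pages_from_cursor(conn, page_size):
    """Option 1: keep one cursor open and fetch the next page on demand."""
    cur = conn.execute(QUERY)
    while True:
        rows = cur.fetchmany(page_size)
        if not rows:
            return
        yield [row[0] for row in rows]


def pages_from_cache(conn, page_size):
    """Option 2: run the query once, cache all ids, then slice pages from memory."""
    cache = [row[0] for row in conn.execute(QUERY)]
    return [cache[i:i + page_size] for i in range(0, len(cache), page_size)]


pages = list(pages_from_cursor(conn, 50))
print([len(p) for p in pages])              # [50, 50, 43]
print(pages == pages_from_cache(conn, 50))  # True
```

Option 1 only works while the same connection and transaction stay open between page requests, which is the constraint described above; option 2 trades memory for not re-running the query.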


Use UNION instead of OR:

SELECT * FROM (
  (SELECT b.id FROM b JOIN a ON b.a_id = a.id WHERE b.user_id = 201 LIMIT 50)
  UNION
  (SELECT b.id FROM b JOIN a ON b.a_id = a.id WHERE a.owner_id = 100 LIMIT 50)
) AS q
LIMIT 50;

Indexes on a(owner_id), a(id) and b(user_id) will make it faster.
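The UNION rewrite caps each branch at 50 rows before deduplicating, which is still enough to fill the outer LIMIT 50 whenever 50 or more rows match. A minimal Python/sqlite3 sketch (results only; the rows are made up and the question's schema a(id, owner_id), b(id, a_id, user_id) is assumed) compares it with the OR query. SQLite does not accept parenthesized SELECTs inside a compound and would otherwise apply a trailing LIMIT to the whole compound, so each limited branch is wrapped in a subquery here.

```python
import sqlite3

# Made-up rows in the question's schema; SQLite stands in for Postgres.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (id INTEGER PRIMARY KEY, owner_id INTEGER)")
conn.execute("CREATE TABLE b (id INTEGER PRIMARY KEY, a_id INTEGER, user_id INTEGER)")
conn.executemany("INSERT INTO a VALUES (?, ?)",
                 [(i, 100 if i <= 2 else 1000 + i) for i in range(1, 21)])
conn.executemany("INSERT INTO b VALUES (?, ?, ?)",
                 [(i, i % 20 + 1, 201 if i % 3 == 0 else 500 + i)
                  for i in range(1, 301)])

BASE = "SELECT b.id AS id FROM b JOIN a ON b.a_id = a.id "


def ids(sql):
    """Return the first column of every result row, in result order."""
    return [row[0] for row in conn.execute(sql)]


def branch(condition, limit=None):
    """One single-condition branch, wrapped so it can carry its own LIMIT."""
    cap = f" LIMIT {limit}" if limit is not None else ""
    return f"SELECT id FROM ({BASE}WHERE {condition}{cap})"


def union_of_branches(limit=None):
    """UNION the two branches, then apply the same limit on the outside."""
    outer = f" LIMIT {limit}" if limit is not None else ""
    return (f"SELECT id FROM ({branch('b.user_id = 201', limit)} UNION "
            f"{branch('a.owner_id = 100', limit)}) AS q{outer}")


original = sorted(ids(BASE + "WHERE b.user_id = 201 OR a.owner_id = 100"))
unlimited = sorted(ids(union_of_branches()))
limited = ids(union_of_branches(50))

print(unlimited == original)                        # True
print(len(limited), set(limited) <= set(original))  # 50 True
```

Without an ORDER BY, which 50 rows come back is arbitrary in both the original query and the rewrite.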
