
Optimizing a regex join query in PostgreSQL

How can I optimize the query below? It took 8 hours to run:

create table rtime.rtime_calc1_jun13tojun19 as (
    -- EXPLAIN cannot be nested inside CREATE TABLE AS; the plan below is for this SELECT
    select pa.api as pa_api,
           pa.action_type as pa_action_type,
           max(rt.request_time),
           avg(rt.request_time),
           percentile_cont(0.95) within group (order by rt.request_time asc) as percentile_95
    from public.public_api pa,
         (select reqtime.*
          from public.public_api puba
          right join rtime.rtime_data1_jun13tojun19 reqtime
                 on puba.api = reqtime.proxy
          where puba.api is null) as rt   -- keep only rows with no exact static match, so that only regex patterns take part in the regex join
    where rt.proxy ~* pa.api_regex
    and   rt.method = pa.action_type
    group by pa.api, pa.action_type
);

Here is the EXPLAIN plan:

GroupAggregate  (cost=1131.43..263846.61 rows=1 width=70)
  Group Key: pa.api, pa.action_type
  ->  Nested Loop  (cost=1131.43..263846.59 rows=1 width=54)
        Join Filter: (((reqtime.proxy)::text ~* (pa.api_regex)::text) AND ((reqtime.method)::text = (pa.action_type)::text))
        ->  Index Scan using primary_key_pa on public_api pa  (cost=0.28..565.81 rows=2007 width=90)
        ->  Materialize  (cost=1131.16..263245.66 rows=1 width=49)
              ->  Gather  (cost=1131.16..263245.65 rows=1 width=49)
                    Workers Planned: 2
                    ->  Hash Anti Join  (cost=131.16..262245.55 rows=1 width=49)
                          Hash Cond: ((reqtime.proxy)::text = (puba.api)::text)
                          ->  Parallel Seq Scan on rtime_data1_jun13tojun19 reqtime  (cost=0.00..218885.08 rows=5763908 width=49)
                          ->  Hash  (cost=106.07..106.07 rows=2007 width=42)
                                ->  Seq Scan on public_api puba  (cost=0.00..106.07 rows=2007 width=42)

The public.public_api table has 2007 rows.
The rtime.rtime_data1_jun13tojun19 table has 13837305 rows.
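
With these row counts, the nested loop in the plan may evaluate the regex filter up to 2007 × 13,837,305 ≈ 27.8 billion times, since a regex condition cannot be used as a hash or merge join key. One possible way to cut that down, sketched here under the assumption that many request rows share the same (proxy, method) pair, is to run the regex match once per distinct pair and then join back with plain equality. The temp table names distinct_unmatched and proxy_to_api and the output aliases max_request_time and avg_request_time are made up for illustration; how much this helps depends on how many distinct proxy values the data really contains.

```sql
-- Sketch only: assumes many rows share the same (proxy, method) pair.
-- Step 1: collapse the requests without an exact static match
-- to distinct (proxy, method) pairs.
CREATE TEMP TABLE distinct_unmatched AS
SELECT DISTINCT rt.proxy, rt.method
FROM rtime.rtime_data1_jun13tojun19 rt
WHERE NOT EXISTS (
    SELECT 1
    FROM public.public_api puba
    WHERE puba.api = rt.proxy
);

-- Step 2: run the regex join once per distinct pair
-- instead of once per request row.
CREATE TEMP TABLE proxy_to_api AS
SELECT d.proxy, d.method, pa.api, pa.action_type
FROM distinct_unmatched d
JOIN public.public_api pa
  ON d.method = pa.action_type
 AND d.proxy ~* pa.api_regex;

-- Give the planner row estimates for the new table.
ANALYZE proxy_to_api;

-- Step 3: join back on plain equality, which allows a hash join,
-- and aggregate as in the original query.
SELECT m.api AS pa_api,
       m.action_type AS pa_action_type,
       max(rt.request_time) AS max_request_time,
       avg(rt.request_time) AS avg_request_time,
       percentile_cont(0.95) WITHIN GROUP (ORDER BY rt.request_time) AS percentile_95
FROM rtime.rtime_data1_jun13tojun19 rt
JOIN proxy_to_api m
  ON rt.proxy = m.proxy
 AND rt.method = m.method
GROUP BY m.api, m.action_type;
```

Because public_api is keyed on (api, action_type) and the pairs in step 1 are distinct, each request row should meet the same set of matching API rows as in the original query, so the aggregates are expected to be unchanged. Running EXPLAIN on step 3 would show whether the planner actually picks a hash join.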

Here is the DDL for the public_api table:

CREATE TABLE public.public_api (
    api varchar NOT NULL,
    "type" varchar NULL,
    api_bin varchar NULL,
    api_bin_avg_resp_time varchar NULL,
    api_bin_perc95_resp_time varchar NULL,
    max_response_time float8 NULL,
    avg_response_time float8 NULL,
    percentile_95_response_time float8 NULL,
    max_tps int4 NULL,
    min_tps int4 NULL,
    avg_tps float8 NULL,
    percentile_90_tps float8 NULL,
    percentile_99_tps float8 NULL,
    percentile_95_tps float8 NULL,
    product varchar NULL,
    action_type varchar NOT NULL,
    proxy varchar NULL,
    CONSTRAINT primary_key_pa PRIMARY KEY (api, action_type)
);

Here is the DDL for rtime.rtime_data1_jun13tojun19:

CREATE TABLE rtime.rtime_data1_jun13tojun19 (
    env varchar NULL,
    "method" varchar NULL,
    request_time float8 NULL
);
