ClickHouse LEFT JOIN with partial matches or subselects

How to do a ClickHouse LEFT JOIN with partial matches or subselects

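ClickHouse does not support LEFT JOIN on a partial match (string LIKE and similar), so I tried to build the query with a SELECT subquery in the expression list, but it does not work. There may be a completely different way to do this that is new to me; I am just looking for a hint on how to approach it.

The error is "Missing columns: 'DomainName' while processing query".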

select NumberInTypes,DomainName,Url,(select aa.group_name
        from (select t1.id,t1.url_part,ugu.name as group_name
              from Url t1
                       any
                       left join (select id,urlgroup_id,url_id,ug.name
                                  from UrlGroupUrl t2
                                           any
                                           left join (select id,name
                                                      from UrlGroup t3
                                      ) ug on t2.urlgroup_id = ug.id
                  ) ugu on t1.id = ugu.url_id) aa where t1.Url like '%' || aa.url_part || '%'
        ) as UrlGroup,KeywordId,ResultId,HashedContent,SearchEngine,client_name,project_name,group_name,DateParsed
from PositionNew t1
         any
         left join (
    select id as KeywordId,trimBoth(keyword) as keyword,groupid,client_name
    from Keyword
             any
             left join (
        select keywordgroup_id as groupid,keyword_id as KeywordId,client_name
        from KeywordGroupKeyword
                 any
                 left join (
            select id as groupid,name as group_name,project_id,client_name
            from KeywordGroup
                     any
                     left join (
                select id as project_id,name as project_name,client_id,client_name
                from Project
                         any
                         left join (
                    select id as client_id,name as client_name from Client
                    ) client using client_id
                ) project using project_id
            ) kgroup using groupid
        ) keywordgroup using KeywordId
    ) keyword using KeywordId
where DateParsed between '2020-07-13' and '2020-08-02'
  and PositionType in (1,3)
  and client_name like '%ClientName%'
ORDER BY ResultId,NumberInType
LIMIT
    1 BY ResultId,DomainName;

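Update: apparently ClickHouse does not let a correlated subquery use columns from the outer query. That leaves me with no options, and I am starting to think this is impossible.

A simplified example that reproduces the problem:

One table contains Urls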

+------------------------------------+
| Url                                |
+------------------------------------+
| https://example.com/cat/page1.html |
+------------------------------------+
| https://example.com/cat/page2.html |
+------------------------------------+
| https://example2.com/page.html     |
+------------------------------------+

A second table contains UrlGroups

+-----------------+-----------+
| UrlPart         | GroupName |
+-----------------+-----------+
| example.com/cat | DomainCat |
+-----------------+-----------+
| example2.com    | Domain2   |
+-----------------+-----------+

What I want to achieve is:

+------------------------------------+-----------+
| Url                                | GroupName |
+------------------------------------+-----------+
| https://example.com/cat/page1.html | DomainCat |
+------------------------------------+-----------+
| https://example.com/cat/page2.html | DomainCat |
+------------------------------------+-----------+
| https://example2.com/page.html     | Domain2   |
+------------------------------------+-----------+

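LEFT JOIN (any variant): does not work, because it requires an exact match.
Subquery: cannot be used, because its results cannot be filtered by columns from the outer query.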

Solution

Let's rely on array operations:

WITH 
    (
        SELECT (groupArray(UrlPart),groupArray(GroupName))
        FROM 
        (
            /* Emulate 'UrlGroups' table. */
            SELECT 
                data.1 AS UrlPart,data.2 AS GroupName
            FROM 
            (
                SELECT arrayJoin([
                  ('example.com/cat','DomainCat'),('example2.com','Domain2')]) AS data
            )
        )
    ) AS urls_groups
SELECT 
    Url,arrayElement(
      urls_groups.2,multiSearchFirstIndexCaseInsensitiveUTF8(Url,urls_groups.1)) AS GroupName
FROM 
(
    /* Emulate 'Urls' table. */
    SELECT data AS Url
    FROM 
    (        
        SELECT arrayJoin([
          'https://example.com/cat/page1.html','https://example.com/cat/page2.html','https://example2.com/page.html','https://example_unknown.com/page.html']) AS data          
    )
)

/*
┌─Url───────────────────────────────────┬─GroupName─┐
│ https://example.com/cat/page1.html    │ DomainCat │
│ https://example.com/cat/page2.html    │ DomainCat │
│ https://example2.com/page.html        │ Domain2   │
│ https://example_unknown.com/page.html │           │
└───────────────────────────────────────┴───────────┘
*/

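You should decide which function to use: multiSearchFirstIndexCaseInsensitiveUTF8 or multiSearchFirstIndexCaseInsensitive. The first treats the strings as UTF-8 text; the second works on raw bytes.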
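
Applied to real tables instead of the emulated ones, the same approach might look like the sketch below. The table names Urls and UrlGroups and the columns Url, UrlPart and GroupName are assumptions taken from the simplified example in the question, not the asker's actual schema.

```sql
WITH
    (
        /* Collapse the lookup table into one tuple of two parallel arrays.
           'UrlGroups' is an assumed table name from the simplified example. */
        SELECT (groupArray(UrlPart), groupArray(GroupName))
        FROM UrlGroups
    ) AS urls_groups
SELECT
    Url,
    /* Index of the first part found in Url (0 if none),
       used to pick the group name at the same position. */
    arrayElement(
      urls_groups.2,
      multiSearchFirstIndexCaseInsensitiveUTF8(Url, urls_groups.1)) AS GroupName
FROM Urls  /* assumed table name from the simplified example */
```

As in the output above, a URL that contains none of the parts gets an empty GroupName. Because the scalar subquery loads every UrlPart into a single array, this is likely best suited to a small lookup table. If several parts can match the same URL, the result depends on the order of the array, so adding an ORDER BY inside the subquery may be worth considering.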
