How to resolve a ClickHouse LEFT JOIN with partial matches or subselects
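ClickHouse (CH) doesn't support a LEFT JOIN on partial matches (string matching such as LIKE). So I tried to build the query with a SELECT subquery in the expression list, but it doesn't work. Maybe there is a completely different way to do this that I'm not aware of; I'm just looking for a clue on how to approach it.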
```sql
select
    NumberInTypes,
    DomainName,
    Url,
    (
        select aa.group_name
        from
        (
            select t1.id, t1.url_part, ugu.name as group_name
            from Url t1
            any left join
            (
                select id, urlgroup_id, url_id, ug.name
                from UrlGroupUrl t2
                any left join
                (
                    select id, name
                    from UrlGroup t3
                ) ug on t2.urlgroup_id = ug.id
            ) ugu on t1.id = ugu.url_id
        ) aa
        where t1.Url like '%' || aa.url_part || '%'
    ) as UrlGroup,
    KeywordId,
    ResultId,
    HashedContent,
    SearchEngine,
    client_name,
    project_name,
    group_name,
    DateParsed
from PositionNew t1
any left join
(
    select id as KeywordId, trimBoth(keyword) as keyword, groupid, client_name
    from Keyword
    any left join
    (
        select keywordgroup_id as groupid, keyword_id as KeywordId, client_name
        from KeywordGroupKeyword
        any left join
        (
            select id as groupid, name as group_name, project_id, client_name
            from KeywordGroup
            any left join
            (
                select id as project_id, name as project_name, client_id, client_name
                from Project
                any left join
                (
                    select id as client_id, name as client_name from Client
                ) client using client_id
            ) project using project_id
        ) kgroup using groupid
    ) keywordgroup using KeywordId
) keyword using KeywordId
where DateParsed between '2020-07-13' and '2020-08-02'
  and PositionType in (1, 3)
  and client_name like '%ClientName%'
ORDER BY ResultId, NumberInType
LIMIT 1 BY ResultId, DomainName;
```
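UPDATE: Apparently, in ClickHouse you can't use columns from the outer query inside a correlated subquery. So I'm completely out of options and starting to think this is impossible.
A simplified example that reproduces the problem:
The first table contains the URLs (Urls):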
```
+------------------------------------+
| Url                                |
+------------------------------------+
| https://example.com/cat/page1.html |
+------------------------------------+
| https://example.com/cat/page2.html |
+------------------------------------+
| https://example2.com/page.html     |
+------------------------------------+
```
The second table contains the URL groups (UrlGroups):
```
+-----------------+-----------+
| UrlPart         | GroupName |
+-----------------+-----------+
| example.com/cat | DomainCat |
+-----------------+-----------+
| example2.com    | Domain2   |
+-----------------+-----------+
```
What I want to get:
```
+------------------------------------+-----------+
| Url                                | GroupName |
+------------------------------------+-----------+
| https://example.com/cat/page1.html | DomainCat |
+------------------------------------+-----------+
| https://example.com/cat/page2.html | DomainCat |
+------------------------------------+-----------+
| https://example2.com/page.html     | Domain2   |
+------------------------------------+-----------+
```
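Every kind of LEFT JOIN fails here, because a join requires an exact match. A subquery fails too, because you can't filter its results by columns from the outer query.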
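To make the goal concrete: the desired result is a LEFT JOIN whose condition is a substring test rather than an equality. The Python sketch below only illustrates those semantics; it is not ClickHouse code. `like_left_join` is a hypothetical helper, and the sample rows are the ones from the tables above.

```python
def like_left_join(urls, url_groups):
    """Nested-loop LEFT JOIN whose join condition is a substring test.

    Emits every (url, group_name) pair where url_part occurs in url;
    a url with no matching url_part is emitted once with None.
    """
    rows = []
    for url in urls:
        matched = False
        for url_part, group_name in url_groups:
            # Plain substring test: what LIKE '%' || url_part || '%' checks
            # as long as url_part contains no % or _ wildcards.
            if url_part in url:
                rows.append((url, group_name))
                matched = True
        if not matched:
            rows.append((url, None))
    return rows


urls = [
    "https://example.com/cat/page1.html",
    "https://example.com/cat/page2.html",
    "https://example2.com/page.html",
]
url_groups = [("example.com/cat", "DomainCat"), ("example2.com", "Domain2")]

for url, group_name in like_left_join(urls, url_groups):
    print(url, group_name)
```

Suppose a broader entry such as `("example.com", ...)` were also present in `url_groups`. The `page1` URL would then match twice, so a real query needs a rule that picks exactly one group per URL.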
Solution
Let's rely on array operations instead:
```sql
WITH
    (
        /* Load all (UrlPart, GroupName) pairs once, as a tuple of two parallel arrays. */
        SELECT (groupArray(UrlPart), groupArray(GroupName))
        FROM
        (
            /* Emulate 'UrlGroups' table. */
            SELECT
                data.1 AS UrlPart,
                data.2 AS GroupName
            FROM
            (
                SELECT arrayJoin([
                    ('example.com/cat', 'DomainCat'),
                    ('example2.com', 'Domain2')]) AS data
            )
        )
    ) AS urls_groups
SELECT
    Url,
    /* 1-based index of the UrlPart found leftmost in Url (0 if none);
       arrayElement returns '' for index 0. */
    arrayElement(
        urls_groups.2,
        multiSearchFirstIndexCaseInsensitiveUTF8(Url, urls_groups.1)) AS GroupName
FROM
(
    /* Emulate 'Urls' table. */
    SELECT data AS Url
    FROM
    (
        SELECT arrayJoin([
            'https://example.com/cat/page1.html',
            'https://example.com/cat/page2.html',
            'https://example2.com/page.html',
            'https://example_unknown.com/page.html']) AS data
    )
)

/*
┌─Url───────────────────────────────────┬─GroupName─┐
│ https://example.com/cat/page1.html    │ DomainCat │
│ https://example.com/cat/page2.html    │ DomainCat │
│ https://example2.com/page.html        │ Domain2   │
│ https://example_unknown.com/page.html │           │
└───────────────────────────────────────┴───────────┘
*/
```
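Choose the function that fits your data. multiSearchFirstIndexCaseInsensitiveUTF8 assumes UTF-8-encoded strings, while multiSearchFirstIndexCaseInsensitive assumes single-byte-encoded text. Both return the 1-based index of the needle found leftmost in the haystack, or 0 if no needle matches. arrayElement then returns an empty string for index 0, which is why the unknown URL gets an empty GroupName.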
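To see how the lookup resolves each URL, here is a Python sketch that emulates (rather than calls) the two ClickHouse functions under their documented contracts. The helper names are hypothetical. Python's `str.lower()` is Unicode-aware, so this sketch is closer to the UTF8 variant. I'm not sure how ClickHouse breaks ties between needles that start at the same position, so the sketch simply keeps the lower index.

```python
def multi_search_first_index_ci(haystack, needles):
    """Emulates the documented contract of multiSearchFirstIndexCaseInsensitive:
    the 1-based index of the needle found leftmost in haystack, or 0 if none is found.
    """
    hay = haystack.lower()
    best_pos, best_index = None, 0
    for index, needle in enumerate(needles, start=1):
        pos = hay.find(needle.lower())
        # Assumption: when two needles start at the same position, keep the lower index.
        if pos != -1 and (best_pos is None or pos < best_pos):
            best_pos, best_index = pos, index
    return best_index


def array_element(arr, index):
    """1-based lookup; an out-of-range index (including 0) yields '' (String default).

    Only handles the 0..len(arr) range that multiSearchFirstIndex can return;
    ClickHouse's negative indexes are not emulated.
    """
    return arr[index - 1] if 1 <= index <= len(arr) else ""


# The two parallel arrays that the WITH clause builds with groupArray.
url_parts = ["example.com/cat", "example2.com"]
group_names = ["DomainCat", "Domain2"]

for url in [
    "https://example.com/cat/page1.html",
    "https://example2.com/page.html",
    "https://example_unknown.com/page.html",
]:
    index = multi_search_first_index_ci(url, url_parts)
    print(url, repr(array_element(group_names, index)))
```

Because the winner is the needle found leftmost in the URL, the order of rows in UrlGroups matters only for ties.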