How to analyze the individual tags of each post in Elasticsearch DSL
I have an ES instance running with data from travel.stackexchange.
# Example Data
first = [
    # content
    "This was one of our definition questions, but also one that interests me personally: "
    "How can I find a guide that will take me safely through the Amazon jungle? I'd love "
    "to explore the Amazon but would not attempt it without a guide, at least not the first "
    "time. I'd prefer a guide that wasn't going to ambush me or anything. I don't want to go "
    "anywhere touristy. Start and end points are open, but the trip should take me places "
    "where I am not likely to see other travelers/tourists and where I will definitely "
    "require a good guide in order to be safe.",
    '2011-06-21T20:22:33.760',  # date of creation
    '39',  # votes
    '2799',  # views
    '8',  # answers
    '4',  # comments
    'How can I find a guide that will take me safely through the Amazon jungle?',  # title
    '"guides","extreme-tourism","amazon-river","amazon-jungle"',  # tags
]
I connect to it using
connections.create_connection(alias='es', hosts=['localhost'], timeout=60)
As you can see, the post has multiple tags ("guides", "amazon-river", ...). When I feed the data into ES, I format the tags as a single string.
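Storing all tags as one string means a keyword field sees the whole string as a single value. Elasticsearch lets any field hold an array of values, and a terms aggregation on a keyword field buckets each array element separately, so one option is to split the string into a list before indexing. A minimal sketch, assuming the tag string always has the quoted, comma-separated form shown above; `parse_tags` is a hypothetical helper, not part of elasticsearch-dsl:

```python
import csv


def parse_tags(raw):
    """Split a string like '"guides","amazon-river"' into ['guides', 'amazon-river']."""
    if not raw:
        return []
    # csv.reader understands double-quoted, comma-separated values
    row = next(csv.reader([raw]))
    return [tag.strip() for tag in row if tag.strip()]


tags = parse_tags('"guides","extreme-tourism","amazon-river","amazon-jungle"')
print(tags)
```

A placeholder such as 'no tags' passes through as a one-element list, so posts without tags would still end up in their own bucket.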
Now, when I query the index (with a larger dataset, of course)
s = Search(using="es", index=current_index)
s.aggs.bucket("per_tag", "terms", field="tags", size=5)
r = s.execute()
However, when I look at the results, they look like
r.aggregations.per_tag.buckets
>>> [{'key': 'no tags', 'doc_count': 70672},
>>>  {'key': '"visas","uk"', 'doc_count': 330},
>>>  {'key': '"visas","schengen"', 'doc_count': 264},
>>>  {'key': '"visas"', 'doc_count': 253},
>>>  {'key': '"air-travel"', 'doc_count': 182}]
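That is fine, but not what I want. As you can see, the tag "visas" appears in three separate buckets instead of just one. What I want is a result that looks like this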
>>> [{'key': 'no tags', 'doc_count': 70672},
>>>  {'key': 'visas', 'doc_count': XXX},
>>>  {'key': 'uk', 'doc_count': YYY},
>>>  {'key': 'schengen', 'doc_count': ZZZ},
>>>  {'key': 'air-travel', 'doc_count': AAA}]
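So far I have tried changing the way I input the tags: once with the "" quotes, once without them (keeping only the , separators), and once separated only by spaces. However, I feel that I need to define the aggregation more precisely rather than change the input. Any help would be greatly appreciated.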
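For reference, the desired buckets correspond to counting each tag once per document, which is what a terms aggregation produces when the field is mapped as keyword and each document stores its tags as a list. The sketch below is an assumption about how that could look, not a tested fix for this index: `index_body` is an illustrative mapping, the documents are made-up examples, and `simulate_terms_agg` is a hypothetical local stand-in that mimics the counting without contacting Elasticsearch.

```python
from collections import Counter

# Assumed mapping: "tags" as a keyword field, so values are matched exactly.
# Elasticsearch has no dedicated array type; a keyword field can hold a list.
index_body = {"mappings": {"properties": {"tags": {"type": "keyword"}}}}

# Made-up documents with tags stored as lists instead of one string.
docs = [
    {"title": "post 1", "tags": ["visas", "uk"]},
    {"title": "post 2", "tags": ["visas", "schengen"]},
    {"title": "post 3", "tags": ["visas"]},
    {"title": "post 4", "tags": ["air-travel"]},
    {"title": "post 5", "tags": ["no tags"]},
]


def simulate_terms_agg(documents, field, size):
    """Local stand-in for a terms aggregation: count each distinct value once per document."""
    counts = Counter()
    for doc in documents:
        counts.update(set(doc.get(field, [])))
    return [{"key": key, "doc_count": n} for key, n in counts.most_common(size)]


buckets = simulate_terms_agg(docs, "tags", size=5)
print(buckets[0])
```

With the index mapped and populated this way, the query itself would not need to change: the same terms aggregation on field="tags" would then return one bucket per individual tag.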