Analyzing the individual tags of each post in Elasticsearch DSL

I have an ES instance running with data from travel.stackexchange.

# Example data
first = ["This was one of our definition questions, but also one that interests me personally: "
         "How can I find a guide that will take me safely through the Amazon jungle? I'd love "
         "to explore the Amazon but would not attempt it without a guide, at least not the first "
         "time. I'd prefer a guide that wasn't going to ambush me or anything. I don't want to go "
         "anywhere touristy. Start and end points are open, but the trip should take me places "
         "where I am not likely to see other travelers/tourists and where I will definitely "
         "require a good guide in order to be safe.",  # content
         '2011-06-21T20:22:33.760',  # date of creation
         '39',  # votes
         '2799',  # views
         '8',  # answers
         '4',  # comments
         'How can I find a guide that will take me safely through the Amazon jungle?',  # title
         '"guides","extreme-tourism","amazon-river","amazon-jungle"']  # tags

I connect to it with:

from elasticsearch_dsl import Search, connections

connections.create_connection(alias='es', hosts=['localhost'], timeout=60)

As you can see, the post has multiple tags ("guides", "amazon-river", ...). When I fed the data into ES, I formatted the tags as a single string.

Now, when I query the index (with a much larger dataset, of course)

s = Search(using="es", index=current_index)

and aggregate how many times each tag is mentioned

s.aggs.bucket("per_tag", "terms", field="tags", size=5)
r = s.execute()

However, when I look at the results, they look like this:

r.aggregations.per_tag.buckets
>>> [{'key': 'no tags', 'doc_count': 70672},
>>>  {'key': '"visas","uk"', 'doc_count': 330},
>>>  {'key': '"visas","schengen"', 'doc_count': 264},
>>>  {'key': '"visas"', 'doc_count': 253},
>>>  {'key': '"air-travel"', 'doc_count': 182}]

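That is fine, but not what I want. As you can see, the tag "visas" shows up in three separate buckets instead of being counted once. What I want is a result that looks like this:

>>> [{'key': 'no tags', 'doc_count': 70672},
>>>  {'key': 'visas', 'doc_count': XXX},
>>>  {'key': 'uk', 'doc_count': YYY},
>>>  {'key': 'schengen', 'doc_count': ZZZ},
>>>  {'key': 'air-travel', 'doc_count': AAA}]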

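So far I have tried feeding in the tags in different ways: once with the "" quotes and once without, once leaving out the commas, and once separating the tags with spaces only. However, I suspect that I need to define the aggregation more precisely rather than change the input. Any help would be greatly appreciated.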
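
One direction that may be worth checking, sketched under assumptions: Elasticsearch treats a list of values in a field as separate terms, so if `tags` were indexed as a list of strings in a keyword-type field, a `terms` aggregation should bucket each tag on its own. The snippet below only illustrates the splitting step and simulates the per-tag counts locally with `collections.Counter`; it does not talk to Elasticsearch. `parse_tags` is a hypothetical helper (not part of elasticsearch_dsl), and the input format is assumed to match the tag strings shown in the question.

```python
import csv
from collections import Counter


def parse_tags(raw):
    """Split a string like '"visas","uk"' into ['visas', 'uk'].

    Hypothetical helper: assumes tags are comma-separated and optionally
    wrapped in double quotes, as in the example data.
    """
    # csv.reader understands quoted, comma-separated values
    row = next(csv.reader([raw], skipinitialspace=True), [])
    return [tag.strip() for tag in row if tag.strip()]


# Tag strings in the same shape as the ones shown in the question
raw_tags = [
    '"guides","extreme-tourism","amazon-river","amazon-jungle"',
    '"visas","uk"',
    '"visas"',
    '"air-travel"',
    'no tags',
]

# Documents as they might be indexed: "tags" is a list, not one string
docs = [{"tags": parse_tags(raw)} for raw in raw_tags]

# Local stand-in for the terms aggregation: one count per individual tag
per_tag = Counter(tag for doc in docs for tag in doc["tags"])
print(per_tag.most_common(5))
```

With the data re-indexed this way, the aggregation call from the question could probably stay as it is; whether that holds depends on the mapping of the `tags` field, which is not shown here.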
