How do I build a personalized arXiv article feed with Python?

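arXiv has grown into a very useful repository. However, reading through the new articles takes a considerable amount of time, and the volume is expected to keep growing. I would therefore like to build a personalized arXiv feed with Python, using the API described in the arXiv API User's Manual.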

The API has two main search parameters, search_query and id_list, and Section 5 of the manual lists additional field prefixes. A request looks like this:

http://export.arxiv.org/api/query?search_query=all:electron&start=0&max_results=10

Question 1:

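However, the difference between query and search_query is really confusing. In 5.1. Details of Query Construction, id_list is mixed in with the field prefixes, and the manual contains no example of how to use search_query and id_list together. For instance, consider these two attempts: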

url="http://export.arxiv.org/api/query?search_query=all:electron+And+id_list=cs/9901002v1"
url="http://export.arxiv.org/api/query?search_query=all:electron+And+query?id_list=cs/9901002v1"

Neither returns an empty result; both still return the electron articles.
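One way to see what is going on with the first attempt is to parse the URL locally with the standard library. This is only a sketch of how a generic URL parser reads the query string; how the arXiv server then treats the extra text is an assumption on my part. Separate parameters are delimited by "&", so an id_list written after "+And+" never becomes a parameter of its own:

```python
from urllib.parse import urlsplit, parse_qs

url = "http://export.arxiv.org/api/query?search_query=all:electron+And+id_list=cs/9901002v1"

# parse_qs splits the query string on "&" and decodes "+" as a space.
# There is no "&" here, so everything ends up inside search_query.
params = parse_qs(urlsplit(url).query)
print(params)
# {'search_query': ['all:electron And id_list=cs/9901002v1']}
```

The parsed result has no id_list key at all, which is consistent with the server only seeing a search for "electron" plus some extra words.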

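Why is that? How can id_list and search_query be used together? For example, how do I restrict the results to articles in the subject classification "Computer Science" that contain the search term "electron"? And what is the difference between query? and search_query?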
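As a minimal sketch of how the two parameters might be combined: search_query and id_list are separate URL parameters joined with "&", while field prefixes such as cat: and Boolean operators such as AND go inside the search_query value (per Section 5.1 of the manual). The helper name build_query_url and the category cs.CL are my own illustrative choices, not something from the manual.

```python
from urllib.parse import urlencode

BASE_URL = "http://export.arxiv.org/api/query"


def build_query_url(search_query=None, id_list=None, start=0, max_results=10):
    """Build an arXiv API URL (hypothetical helper, for illustration only)."""
    params = {}
    if search_query:
        params["search_query"] = search_query
    if id_list:
        # id_list is a comma-separated list of arXiv identifiers.
        params["id_list"] = ",".join(id_list)
    params["start"] = start
    params["max_results"] = max_results
    # Keep ":", "/" and "," unescaped so the URL stays readable;
    # spaces are encoded as "+".
    return BASE_URL + "?" + urlencode(params, safe=":/,")


# Keyword restricted to one subject category (cs.CL is an assumed example).
print(build_query_url("all:electron AND cat:cs.CL"))

# Keyword filter applied to an explicit list of article ids.
print(build_query_url("all:electron", id_list=["cs/9901002v1"]))
```

According to the manual, when both parameters are present the API returns the articles in id_list that also match search_query, so the second URL acts as a filter over the listed ids.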

Question 2:

With Python 3 syntax, the result also comes back looking garbled:

from urllib.request import urlopen

url = "http://export.arxiv.org/api/query?search_query=all:electron"
data = urlopen(url).read()  # read() returns the raw response body as bytes
print(data)

This returns (lines wrapped for readability):

b'<?xml version="1.0" encoding="UTF-8"?>\n<feed xmlns="http://www.w3.org/2005/Atom">\n  <link 
href="http://arxiv.org/api/query?search_query%3Dall%3Aelectron%26id_list
%3D%26start%3D0%26max_results%3D10" rel="self" type="application/atom+xml"/>\n  <title type="html">ArXiv Query: search_query=all:electron&amp;id_list=&amp;start=0&amp;max_results=10</title>\n  
<id>http://arxiv.org/api/WyBPOs+pRgzCTXTMWhtnbcOmk6g</id>\n  
<updated>2021-02-27T00:00:00-05:00</updated>\n  <opensearch:totalResults xmlns:opensearch="http://a9.com
/-/spec/opensearch/1.1/">168093</opensearch:totalResults>\n  <opensearch:startIndex 
xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">0</opensearch:startIndex>\n  
<opensearch:itemsPerPage xmlns:opensearch="http://a9.com/-/spec/opensearch
/1.1/">10</opensearch:itemsPerPage>\n  <entry>\n    <id>http://arxiv.org/abs/cond-mat/0102536v1</id>\n  
  <updated>2001-02-28T20:12:09Z</updated>\n    <published>2001-02-28T20:12:09Z</published>\n    
<title>Impact of Electron-Electron Cusp on Configuration Interaction Energies</title>\n    <summary> 
 The effect of the electron-electron cusp on the convergence of configuration\ninteraction (CI) wave 
functions is examined. By analogy with the\npseudopotential approach for electron-ion interactions,an 
effective\nelectron-electron interaction is developed which closely reproduces the\nscattering of the Coulomb interaction but is smooth and finite at zero\nelectron-electron separation. The exact many-electron wave function for this\nsmooth effective interaction has no cusp at zero electron-electron separation.\nWe perform CI and quantum Monte Carlo calculations for He and Be atoms,both\nwith the Coulomb electron-electron interaction and with the smooth effective\nelectron-electron interaction. We find that convergence of the CI expansion of\nthe wave function for the smooth electron-electron interaction is not\nsignificantly improved compared with that for the divergent Coulomb interaction\nfor energy differences on the order of 1 mHartree. This shows that,contrary to\npopular belief,description of the electron-electron cusp is not a limiting\nfactor,to within chemical accuracy,for CI 
calculations.\n</summary>\n 

...

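First, why is the returned data bytes rather than a string? Are bytes better suited to searching? What does the data actually contain, and how do I use it? For example, what does b'... <feed xmlns="http://www.w3.org/2005/Atom">\n' have to do with the arXiv API? What does the "b'" stand for?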
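For context on the bytes question, here is a small sketch using a shortened stand-in for the response (the raw value below is made up for illustration, not real API output). The b'...' prefix is just how Python prints a bytes object; the response declares UTF-8 in its XML header, so decoding with that encoding gives an ordinary str:

```python
# A shortened stand-in for what urlopen(url).read() returns.
raw = b'<?xml version="1.0" encoding="UTF-8"?>\n<feed xmlns="http://www.w3.org/2005/Atom">\n</feed>'

print(type(raw))   # <class 'bytes'>

# Decode the bytes into text using the encoding named in the XML header.
text = raw.decode("utf-8")

print(type(text))  # <class 'str'>
print(text)        # the "\n" escapes now print as real line breaks
```

The content itself is an Atom feed, an XML format, which is why it is usually easier to hand it to an XML parser than to search the text by hand.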

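Question: how do I build a personalized arXiv article feed with Python? That is, a script that searches a given subject classification for articles by certain authors or containing certain keywords, and correctly prints the "title" and "summary" from the bytes data?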
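A rough sketch of the parsing step, assuming the response is an Atom feed like the one shown earlier. It uses only the standard library; the function names fetch_feed and parse_entries and the SAMPLE data are hypothetical, and the sample is a trimmed-down feed so the example runs without network access.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen

# Atom elements live in this XML namespace, so lookups must name it.
ATOM = {"atom": "http://www.w3.org/2005/Atom"}


def fetch_feed(url):
    """Download the raw feed as bytes (not called in this sketch)."""
    with urlopen(url) as response:
        return response.read()


def parse_entries(feed_bytes):
    """Return a list of {'title', 'summary'} dicts, one per <entry>."""
    root = ET.fromstring(feed_bytes)  # accepts bytes directly
    entries = []
    for entry in root.findall("atom:entry", ATOM):
        title = entry.findtext("atom:title", default="", namespaces=ATOM)
        summary = entry.findtext("atom:summary", default="", namespaces=ATOM)
        # Collapse the hard line breaks inside titles and abstracts.
        entries.append({
            "title": " ".join(title.split()),
            "summary": " ".join(summary.split()),
        })
    return entries


# Trimmed-down sample feed, for illustration only.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>Impact of Electron-Electron Cusp on Configuration Interaction Energies</title>
    <summary>  The effect of the electron-electron cusp on the convergence of configuration
interaction (CI) wave functions is examined.
</summary>
  </entry>
</feed>"""

for item in parse_entries(SAMPLE):
    print(item["title"])
    print(item["summary"])
```

With a live query, parse_entries(fetch_feed(url)) would be the equivalent call; a dedicated feed-parsing library could replace the manual namespace handling.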

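I have very little prior experience with HTML and am new to using Python to access websites through an API. Could you help me fill in the basic concepts for each step so that I can do more googling on my own?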
