A Hands-On Guide to Reading the World's Largest Encyclopedia with Python: Wikipedia

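Wikipedia is, without much argument, one of the most impressive human creations of the modern era.

Only a few years ago, who would have guessed that the unpaid work of anonymous contributors could build an online knowledge base of unprecedented size? Wikipedia is not only a great place to find information when you are writing a college paper; it is also an extremely rich source of data.

From natural language processing to supervised machine learning, Wikipedia has powered countless data science projects.

Wikipedia's sheer size makes it the largest encyclopedia in the world, and that same size can give data engineers a headache. With the right tools, though, the volume of data is much less of a problem.

This article shows how to programmatically download and parse the English-language Wikipedia.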

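Along the way, we will touch on several topics that matter in data science:

1. Finding and programmatically downloading data from the web

2. Parsing web data (HTML, XML, MediaWiki markup) with Python libraries

3. Running operations in parallel with multiprocessing and multithreading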

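This project started as an attempt to collect information on every book on Wikipedia, but I soon realized that the solutions involved had much wider applications. The techniques covered here, and shown in the accompanying Jupyter Notebook, let you work efficiently with any article on Wikipedia and can be extended to other web data sources.

The notebook containing the Python code for this article is on GitHub. The project was inspired by Douwe Osinga's excellent Deep Learning Cookbook; the Jupyter Notebooks for that book are also freely available.

GitHub link: https://github.com/WillKoehrsen/wikipedia-data-science/blob/master/notebooks/Downloading%20and%20Parsing%20Wikipedia%20Articles.ipynb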

Freely available Deep Learning Cookbook notebooks:

https://github.com/DOsinga/deep_learning_cookbook

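Finding and downloading data programmatically

The first step in any data science project is getting the data. We could request individual Wikipedia pages one at a time and scrape the results, but we would quickly run into rate limits and put unnecessary load on Wikipedia's servers. Instead, we can get a complete, periodic snapshot of all of Wikipedia, known as a dump, from dumps.wikimedia.org.

The following code lists the available versions of the database: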

import requests
# Library for parsing HTML
from bs4 import BeautifulSoup
base_url = 'https://dumps.wikimedia.org/enwiki/'
index = requests.get(base_url).text
soup_index = BeautifulSoup(index,'html.parser')
# Find the links on the page
dumps = [a['href'] for a in soup_index.find_all('a') if 
 a.has_attr('href')]
dumps
['../','20180620/','20180701/','20180720/','20180801/','20180820/','20180901/','20180920/','latest/']

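This code uses the BeautifulSoup library to parse HTML. Because HTML is the standard markup language for web pages, BeautifulSoup is invaluable for working with web data.

This project uses the dump from September 1, 2018 (some dumps are incomplete, so make sure you pick one that has the data you need). To find all the files available in that dump, we use the following code.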

dump_url = base_url + '20180901/'
# Retrieve the html
dump_html = requests.get(dump_url).text
# Convert to a soup
soup_dump = BeautifulSoup(dump_html,'html.parser')
# Find list elements with the class file
soup_dump.find_all('li',{'class': 'file'})[:3]
[<li class="file">... 15.2 GB</li>,
 <li class="file">... 195.6 MB</li>,
 <li class="file"><a href="...meta-history1.xml-p10p2101.7z">enwiki-20180901-pages-meta-history1.xml-p10p2101.7z</a> 320.6 MB</li>]

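Again we parse the web page with BeautifulSoup to find the files. We could go to https://dumps.wikimedia.org/enwiki/20180901/ and pick out the files to download by hand, but that would be inefficient. Given how much data lives on the web, knowing how to parse HTML and interact with websites from a program is an extremely useful skill: with a little web scraping, vast new data sources become accessible.

Deciding what to download

The code above finds every file in the dump, which leaves you with a choice of what to download: the current versions of the articles only, the articles along with the current discussion pages, or the articles with their complete edit history and discussion. If you pick the last option, you are looking at several terabytes of data! This project sticks to the most recent version of each article.

The current versions of all articles are available as a single file, but if we download and parse that file we have to work through the articles sequentially, one at a time, which is very inefficient. A better option is to download the partitioned files, each of which contains a subset of the articles. We can then parse several files at once through parallelization, which speeds up the process considerably.

"When I'm dealing with files, I would rather have many small files than one large file, because then I can parallelize operations on the files."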
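
To see why many small files are convenient, here is a minimal sketch of applying one function to several partition files at once. It is an illustration only: count_lines and process_all are hypothetical helpers, not code from the original notebook, and the sketch uses a thread pool from the standard library's concurrent.futures simply because it is easy to demonstrate.

```python
import bz2
from concurrent.futures import ThreadPoolExecutor


def count_lines(path):
    """Hypothetical per-file task: count the lines in one bz2 partition."""
    with bz2.open(path, "rt", encoding="utf-8") as f:
        return sum(1 for _ in f)


def process_all(paths, task=count_lines, max_workers=4):
    """Apply `task` to every path concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(task, paths))
```

Swapping ThreadPoolExecutor for ProcessPoolExecutor gives the multiprocessing version of the same pattern; which one is faster depends on whether the task is limited by I/O or by the CPU.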

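The partitioned files are bz2-compressed XML (eXtensible Markup Language). Each partition is roughly 300 to 400 MB, and the compressed files total 15.4 GB. We won't need to decompress them, but if you do, the full size is about 58 GB. That doesn't seem like much for all of human knowledge.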
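
Picking the partitioned article files out of the dump listing could look roughly like the sketch below. To stay runnable offline and free of third-party dependencies, it uses the standard library's html.parser rather than BeautifulSoup and parses a small made-up HTML fragment: the file names and sizes in sample_html are illustrative, not copied from the real listing. It also assumes that partition names contain both "pages-articles" and ".xml-p", so check the actual dump page before relying on that pattern.

```python
from html.parser import HTMLParser

# Made-up listing fragment for illustration; names and sizes are hypothetical.
sample_html = """
<ul>
<li class="file"><a href="/enwiki/20180901/enwiki-20180901-pages-articles.xml.bz2">enwiki-20180901-pages-articles.xml.bz2</a> 15.0 GB</li>
<li class="file"><a href="/enwiki/20180901/enwiki-20180901-pages-articles1.xml-p10p30302.bz2">enwiki-20180901-pages-articles1.xml-p10p30302.bz2</a> 160.0 MB</li>
<li class="file"><a href="/enwiki/20180901/enwiki-20180901-pages-meta-history1.xml-p10p2101.7z">enwiki-20180901-pages-meta-history1.xml-p10p2101.7z</a> 320.6 MB</li>
</ul>
"""


class FileListParser(HTMLParser):
    """Collect the text of every <li class="file"> element."""

    def __init__(self):
        super().__init__()
        self._in_file_item = False
        self._chunks = []
        self.entries = []

    def handle_starttag(self, tag, attrs):
        if tag == "li" and ("class", "file") in attrs:
            self._in_file_item = True
            self._chunks = []

    def handle_data(self, data):
        if self._in_file_item:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "li" and self._in_file_item:
            self.entries.append("".join(self._chunks).strip())
            self._in_file_item = False


def partition_files(html):
    """Return (name, size) pairs for partitioned article files."""
    parser = FileListParser()
    parser.feed(html)
    pairs = []
    for entry in parser.entries:
        # Each entry reads "<file name> <size>"; split at the first space.
        name, _, size = entry.partition(" ")
        if "pages-articles" in name and ".xml-p" in name:
            pairs.append((name, size.strip()))
    return pairs


files_to_download = partition_files(sample_html)
```

Only the partitioned file survives the filter here; the single-file dump and the history archive are skipped.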

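Figure: sizes of the compressed Wikipedia files

Downloading the files

The get_file utility in Keras is very handy for actually downloading the files. The following code downloads a file from a URL and saves it to disk: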

from keras.utils import get_file
saved_file_path = get_file(file,url)

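Files are saved in ~/.keras/datasets/, the default Keras location. Downloading all the files one after another takes a little over two hours. (You can try downloading in parallel, but I was rate-limited when I attempted several simultaneous downloads.)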
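
If you would rather not depend on Keras just for downloading, a loop built around something like the following does a similar job with the standard library. This is only a sketch: download_if_missing is a hypothetical helper, the target directory is whatever you pass in, and unlike get_file it does nothing beyond checking whether a file of the same name already exists.

```python
import os
import shutil
import urllib.request


def download_if_missing(url, directory):
    """Download `url` into `directory` unless a file of that name exists.

    Returns the local path. The file name is the last segment of the URL.
    """
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, url.rstrip("/").split("/")[-1])
    if not os.path.exists(path):
        tmp_path = path + ".part"
        # Stream to disk in chunks so a large partition never sits in memory.
        with urllib.request.urlopen(url) as response, open(tmp_path, "wb") as out:
            shutil.copyfileobj(response, out)
        # Rename only after the download finished, so partial files are
        # never mistaken for complete ones.
        os.replace(tmp_path, path)
    return path
```

Calling it once per URL, one after another, keeps the number of simultaneous requests at one, which matters given the rate limiting mentioned above.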

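Parsing the data

It might seem that the first thing to do is decompress the files. It turns out we never need to do that to get at all the article data. Instead, we can work through a file iteratively, decompressing and processing one line at a time. Iterating through a file is often the only option when the data is too large to fit in memory. To iterate through a bz2-compressed file we could use the bz2 library.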
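
With the standard-library bz2 module, line-by-line iteration might look like the sketch below. iter_lines is a hypothetical helper name, and the file it reads in the usage example is a tiny one written on the spot rather than a real dump partition.

```python
import bz2
import os
import tempfile


def iter_lines(path):
    """Yield decoded lines from a bz2 file, decompressing as it goes."""
    with bz2.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            yield line


# Usage with a small throwaway file standing in for a dump partition.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.xml.bz2")
with bz2.open(sample_path, "wt", encoding="utf-8") as f:
    f.write("<page>\n<title>Example</title>\n</page>\n")

lines = [line.rstrip("\n") for line in iter_lines(sample_path)]
```

Because iter_lines is a generator, only a small decompressed buffer is held in memory at any time, however large the compressed file is.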

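In testing, however, I found a faster option (by about a factor of two): calling the system utility bzcat through Python's subprocess module. This illustrates an important point: there are often several solutions to a problem, and the only way to find the most efficient one is to benchmark the options. That can be as simple as timing each approach with the %%timeit Jupyter cell magic.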
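
Outside a notebook, the same kind of comparison can be made with the timeit module. The sketch below times the bz2 module against bzcat on whatever file you give it. The function names are hypothetical, bzcat is only timed if it is found on the PATH, and no particular speedup is implied for your machine.

```python
import bz2
import shutil
import subprocess
import timeit


def count_with_bz2(path):
    """Count lines using the bz2 module."""
    with bz2.open(path, "rb") as f:
        return sum(1 for _ in f)


def count_with_bzcat(path):
    """Count lines by piping the file through the bzcat utility."""
    with open(path, "rb") as src:
        proc = subprocess.Popen(["bzcat"], stdin=src, stdout=subprocess.PIPE)
        with proc.stdout:
            n = sum(1 for _ in proc.stdout)
        proc.wait()
    return n


def benchmark(path, repeat=3):
    """Return the best wall-clock time in seconds for each available method."""
    methods = {"bz2": count_with_bz2}
    if shutil.which("bzcat"):
        methods["bzcat"] = count_with_bzcat
    return {
        name: min(timeit.repeat(lambda f=func: f(path), number=1, repeat=repeat))
        for name, func in methods.items()
    }
```

Taking the minimum over a few repeats is a common way to reduce noise from other processes; run it on a real partition to get numbers that mean something.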

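The basic pattern for iterating through a compressed file is:

import os
import subprocess
# Expand ~ so that open() can find the file
data_path = os.path.expanduser('~/.keras/datasets/enwiki-20180901-pages-articles15.xml-p7744803p9244803.bz2')
# Iterate through compressed file one line at a time
for line in subprocess.Popen(['bzcat'], stdin=open(data_path), stdout=subprocess.PIPE).stdout:
    # process line
    pass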

If we simply read in the XML data and append it to a list, we get something that looks like this:

Figure: the source XML of a Wikipedia article

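What is shown above is the XML of a single Wikipedia article. Each file we downloaded contains thousands of articles, so the files hold millions of lines like these. If we really wanted to make things hard, we could work through the file with regular expressions and string matching to find each article. Given how inefficient that would be, we'll take a better approach and use tools built specifically for parsing XML and Wikipedia-style articles.

Parsing approach

We need to parse the files on two levels:

1. Extract the article titles and text from the XML

2. Extract the relevant information from the article text

Fortunately, Python has good options for both.

Parsing XML

To solve the first problem, locating the articles, we use the SAX parser (SAX stands for Simple API for XML). BeautifulSoup can also parse XML, but it requires loading the entire file into memory and building a Document Object Model (DOM). SAX, by contrast, processes the XML one piece at a time (here, one line at a time), which fits our approach perfectly.

The basic idea is to search through the XML and extract the information between specific tags. For example, given the XML below: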

<title>Carroll F. Knicely</title>
<text>...nor of Kentucky|Kentucky Governors]] as commissioner and later Commerce Secretary.</text>
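
Extracting the content between specific tags, as in the XML above, takes only a short handler class. The version below is a minimal sketch built on the standard library's xml.sax and is not necessarily identical to the handler in the original notebook. The class name WikiXmlHandler and the _buffer, _values and _pages attributes follow the names used in this article, while sample_xml is a made-up two-page document used only to show the mechanics.

```python
import xml.sax


class WikiXmlHandler(xml.sax.handler.ContentHandler):
    """Collect (title, text) pairs for every <page> element."""

    def __init__(self):
        super().__init__()
        self._buffer = []
        self._values = {}
        self._current_tag = None
        self._pages = []

    def startElement(self, name, attrs):
        # Start buffering when one of the tags we care about opens.
        if name in ("title", "text"):
            self._current_tag = name
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self._current_tag:
            self._buffer.append(content)

    def endElement(self, name):
        if name == self._current_tag:
            # Joined without a separator so chunk boundaries add no spaces.
            self._values[name] = "".join(self._buffer)
            self._current_tag = None
        if name == "page":
            self._pages.append((self._values["title"], self._values["text"]))


# Made-up input for illustration only.
sample_xml = """<mediawiki>
<page><title>First</title><revision><text>Body of [[first]] page.</text></revision></page>
<page><title>Second</title><revision><text>Body of second page.</text></revision></page>
</mediawiki>"""

handler = WikiXmlHandler()
parser = xml.sax.make_parser()
parser.setContentHandler(handler)
# Feed the document one line at a time, as we would with a dump file.
for line in sample_xml.splitlines(keepends=True):
    parser.feed(line)
parser.close()
```

After the loop, handler._pages holds one (title, text) tuple per page, and nothing larger than the article currently being read was ever buffered.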

We want to pick out the content between the <title> and <text> tags (the title is the title of the Wikipedia article and the text is the article's content). SAX lets us do exactly this with a parser and a ContentHandler, which together control how the information passed to the parser is handled. We feed the XML to the parser one line at a time, and the content handler extracts the relevant information for us.

This is a little hard to follow without trying it yourself, but the idea is that the content handler looks for certain start tags and, when it finds one, adds characters to a buffer until it meets the matching end tag. It then saves the buffer contents to a dictionary with the tag as the key. The result is a dictionary whose keys are the tags and whose values are the content between them. In the next step, we will pass this dictionary to another function that parses its values.

The only part of SAX we need to write ourselves is the content handler, here a class named WikiXmlHandler.

In this handler, we look for the title and text tags. Every time the parser encounters one of them, it saves characters to the buffer until it reaches the matching end tag (written </tag>). It then stores the buffer contents in a dictionary, self._values. Articles are delimited by <page> tags, so when the content handler reaches a closing </page> tag, it adds self._values to the list of articles, self._pages. If this is confusing, seeing it in action may help.

The code below shows how we use the handler to search through the XML file for articles. For now we just save them to handler._pages; later we will send the articles to another function for parsing.

# Object for handling xml
handler = WikiXmlHandler()
# Parsing object
parser = xml.sax.make_parser()
parser.setContentHandler(handler)
# Iteratively process file
for line in subprocess.Popen(['bzcat'], stdin=open(data_path), stdout=subprocess.PIPE).stdout:
    parser.feed(line)
    # Stop when 3 articles have been found
    if len(handler._pages) > 2:
        break

If we inspect handler._pages, we see a list in which every element is a tuple holding the title and text of one article:

handler._pages[0]
[('Carroll Knicely', "'''Carroll F. Knicely''' (born c. 1929 in [[Staunton,Kentucky]]) was [[Editing|editor]] and [[Publishing|publisher]] ...")]

At this point we have written code that successfully identifies articles within the XML. That is half of the job of parsing the files. The next step is to process the articles themselves to find specific pages and information. Once again, we turn to a tool built for the purpose.

Parsing Wikipedia articles

Wikipedia runs on MediaWiki, a software package for building wikis. As a result, articles follow a standard format that makes it straightforward to access their information programmatically. The text of an article may look like a plain string, but because of the formatting it actually encodes much more information. To get at that information efficiently, we bring in the powerful mwparserfromhell, a library built for working with MediaWiki content.

If we pass the text of a Wikipedia article to mwparserfromhell, we get back a Wikicode object with many methods for sorting through the data. For example, the following code creates a wikicode object from an article and retrieves its wikilinks(). These are links that point to other Wikipedia articles:

import mwparserfromhell
# Create the wiki article
wiki = mwparserfromhell.parse(handler._pages[6][1])
# Find the wikilinks
wikilinks = [x.title for x in wiki.filter_wikilinks()]
wikilinks[:5]
['Provo, Utah', 'Wasatch Front', 'Megahertz', 'Contemporary hit radio', 'watt']

Many other useful methods can be applied to the wikicode, such as finding comments or searching for a specific keyword. If you want a clean version of the article text with the markup stripped, call:

wiki.strip_code().strip()
'KENZ (94.9 FM, " Power 94.9 " ) is a top 40/CHR radio station broadcasting to Salt Lake City, Utah '

Since my ultimate goal was to find all the articles about books, the question is whether the parser can be used to identify articles in a certain category. Fortunately, the answer is yes, using MediaWiki templates.

Article templates

Templates are a standard way of recording information. There are countless templates on Wikipedia, but the most relevant for our purposes are Infoboxes. These templates encode summary information for an article. For example, the Infobox for War and Peace is:

Figure: the Infobox for War and Peace

Each category of article on Wikipedia, such as films, books, or radio stations, has its own type of Infobox. For books, the Infobox template is helpfully named Infobox book. Just as helpfully, the wiki object has a method called filter_templates() that lets us extract a specific template from an article. So if we want to know whether an article is about a book, we can filter it for the book Infobox, as shown below:

# Filter article for book template
wiki.filter_templates(matches='Infobox book')

If there is a match, we have found a book! To find the Infobox template for the category of articles you are interested in, refer to the list of Infoboxes.

How do we combine mwparserfromhell for parsing articles with the SAX parser we wrote? We modify the endElement method in the content handler so that it sends the dictionary of values, containing the title and text of an article, to a function that searches the article text for a specified template. If the function finds an article we want, it extracts information from the article and returns it to the handler. First, here is the updated endElement:

def endElement(self, name):
    """Closing tag of element"""
    if name == self._current_tag:
        self._values[name] = ' '.join(self._buffer)
    if name == 'page':
        self._article_count += 1
        # Send the page to the process article function
        book = process_article(**self._values, template='Infobox book')
        # If article is a book append to the list of books
        if book:
            self._books.append(book)

Once the parser has reached the end of an article, we pass the article to the function process_article, shown below:

def process_article(title, text, timestamp, template='Infobox book'):
    """Process a wikipedia article looking for template"""
    # Create a parsing object
    wikicode = mwparserfromhell.parse(text)
    # Search through templates for the template
    matches = wikicode.filter_templates(matches=template)
    if len(matches) >= 1:
        # Extract information from infobox
        properties = {param.name.strip_code().strip(): param.value.strip_code().strip()
                      for param in matches[0].params
                      if param.value.strip_code().strip()}
        # Extract internal wikilinks
        # (the listing is cut off here in the source; the lines below are an
        # assumption reconstructed from the result shown further down)
        wikilinks = [x.title.strip_code().strip() for x in wikicode.filter_wikilinks()]
        # Extract external links
        exlinks = [x.url.strip_code().strip() for x in wikicode.filter_external_links()]
        return (title, properties, wikilinks, exlinks, timestamp)

Although I was looking for articles about books, this function can be used to search for any category of article on Wikipedia. Just replace the template with the template for the category (for example, Infobox language finds languages), and the function returns information only from the articles that match.

We can test this function and the new content handler on one file.

Searched through 427481 articles.
Found 1426 books in 1055 seconds.

Let's take a look at the result for one book:

books[10]
['War and Peace',
 {'name': 'War and Peace', 'author': 'Leo Tolstoy', 'language': 'Russian, with some French', 'country': 'Russia', 'genre': 'Novel (Historical novel)', 'publisher': 'The Russian Messenger (serial)', 'title_orig': 'Война и миръ', 'orig_lang_code': 'ru', 'translator': 'The first translation of War and Peace into English was by American Nathan Haskell Dole, in 1899', 'image': 'Tolstoy - War and Peace - first edition, 1869.jpg', 'caption': 'Front page of War and Peace, first edition, 1869 (Russian)', 'release_date': 'Serialised 1865–1867; book 1869', 'media_type': 'Print', 'pages': '1,225 (first published edition)'},
 ['Leo Tolstoy', 'Novel', 'Historical novel', 'The Russian Messenger', 'Serial (publishing)', 'Category:1869 Russian novels', 'Category:Epic novels', 'Category:Novels set in 19th-century Russia', 'Category:Russian novels adapted into films', 'Category:Russian philosophical novels'],
 ['https://books.google.com/?id=c4HEAN-ti1MC', 'https://www.britannica.com/art/English-literature', 'https://books.google.com/books?id=xf7umXHGDPcC', 'https://books.google.com/?id=E5fotqsglPEC', 'https://books.google.com/?id=9sHebfZIXFAC'],
 '2018-08-29T02:37:35Z']

For every book on Wikipedia, we have the information from the Infobox as a dictionary, the internal wikilinks, the external links, and the timestamp of the most recent edit. (I concentrated on these pieces of information in order to build a book recommendation system for my next project.) You can modify the process_article function and the WikiXmlHandler class to find whatever information and articles you need!

If you take the time needed to process just one file, 1055 seconds, and multiply it by 55, you will find that processing all the files takes over 15 hours! We could simply let that run overnight, but I would rather not waste the extra time if I don't have to. This brings us to the final technique covered in this project: parallelization using multiprocessing and multithreading.

Running operations in parallel

Rather than parsing the files one at a time, we want to process several of them at once (which is why we downloaded the partitions). We can do that through parallelization, using either multithreading or multiprocessing.

Multithreading and multiprocessing

Multithreading and multiprocessing are ways of carrying out many tasks simultaneously on one computer or on several. We have many files on disk, and each of them needs to be parsed in the same way. A naive approach would parse one file at a time, but that does not take full advantage of our resources. Instead, we can use multithreading or multiprocessing to parse several files at the same time, which greatly speeds up the whole process.

In general, multithreading works better (is faster) for input/output-bound tasks such as reading files or making requests. Multiprocessing works better (is faster) for CPU-bound tasks. For the process of parsing articles, I wasn't sure which approach would be best, so I again benchmarked both of them with different parameters.

Learning how to set up tests and look for different ways to solve a problem will take you a long way in a data science career, or any technical career.
title="swift">swift</a><a href="https://www.jb51.cc/tag/typescript/" title="typescript">typescript</a><a href="https://www.jb51.cc/tag/luyouqi/" title="路由器">路由器</a><a href="https://www.jb51.cc/tag/JSON/" title="JSON">JSON</a><a href="https://www.jb51.cc/tag/luyouqishezhi/" title="路由器设置">路由器设置</a><a href="https://www.jb51.cc/tag/wuxianluyouqi/" title="无线路由器">无线路由器</a><a href="https://www.jb51.cc/tag/h3c/" title="h3c">h3c</a><a href="https://www.jb51.cc/tag/huasan/" title="华三">华三</a><a href="https://www.jb51.cc/tag/huasanluyouqishezhi/" title="华三路由器设置">华三路由器设置</a><a href="https://www.jb51.cc/tag/huasanluyouqi/" title="华三路由器">华三路由器</a><a href="https://www.jb51.cc/tag/diannaoruanjianjiaocheng/" title="电脑软件教程">电脑软件教程</a><a href="https://www.jb51.cc/tag/arrays/" title="arrays">arrays</a><a href="https://www.jb51.cc/tag/docker/" title="docker">docker</a><a href="https://www.jb51.cc/tag/ruanjiantuwenjiaocheng/" title="软件图文教程">软件图文教程</a><a href="https://www.jb51.cc/tag/C/" title="C">C</a><a href="https://www.jb51.cc/tag/vuejs/" title="vue.js">vue.js</a><a href="https://www.jb51.cc/tag/laravel/" title="laravel">laravel</a><a href="https://www.jb51.cc/tag/springboot/" title="spring-boot">spring-boot</a></div> </div> </div> </div> <div class="row row-sm rbox"> <div class="col-sm-12 col-md-12 col-lg-12"> <div class="card">  <ins class="adsbygoogle" style="display:inline-block;width:300px;height:600px" data-ad-client="ca-pub-4605373693034661" data-ad-slot="7541177540"></ins> <script> (adsbygoogle = window.adsbygoogle || []).push({}); </script> </div> </div> </div> </div> </div> </div> <footer id="footer"> <div class="container" style="width:1440px;"> <div class="row hidden-xs"> <div class="col-sm-12 col-md-9 col-lg-9 site-link"> <ul class="list-inline"> <li>友情链接:</li><li><a href="https://www.f2er.com/" title="前端之家(F2ER.COM)提供了前端开发编程的基础技术教程, 介绍了html、css、javascript、jquery、vue、bootstrap、node、angular等各种前端开发编程语言的基础知识。" target="_blank" rel="nofollow">前端之家</a></li><li><a 
href="https://ai.jb51.cc/" title="ai导航是编程之家旗下ai方向的ai资讯、ai工具类集合导航站。" target="_blank" rel="nofollow">ai导航</a></li></ul> <ul class="list-inline"> <li><a href="https://www.jb51.cc" title="编程之家">编程之家</a></li>-<li><a href="https://t5m44pq3f7.jiandaoyun.com/f/638ca61b7b079a000a5d2dd6" rel="nofollow" title="我要投稿" target="_blank">我要投稿</a></li>-<li><a target="_blank" rel="nofollow" href="https://t5m44pq3f7.jiandaoyun.com/f/638ca8c69ad234000a79561f" title="广告合作">广告合作</a></li>-<li><a target="_blank" href="http://wpa.qq.com/msgrd?v=3&uin=76874919&site=qq&menu=yes">联系我们</a></li>-<li><a href="https://www.jb51.cc/disclaimers.html" title="免责声明">免责声明</a></li>-<li><a href="https://www.jb51.cc/sitemap/all/index.xml" title="网站地图" target="_blank">网站地图</a></li> </ul> <div>版权所有 © 2018编程之家<a href="https://beian.miit.gov.cn/" target="_blank" rel="nofollow">闽ICP备13020303号-8</a> </div> </div> <div class="col-sm-12 col-md-3 col-lg-3"><img src="https://www.jb51.cc/qrcode.jpg" width="90" alt="微信公众号搜索 “ 程序精选 ” ，选择关注！"> <div class="pull-right">微信公众号搜<span class="text-danger">"智元新知"</span>关注<br />微信扫一扫可直接关注哦！</div> </div> </div> </div> </footer> <script> (function () { var bp = document.createElement('script'); var curProtocol = window.location.protocol.split(':')[0]; if (curProtocol === 'https') { bp.src = 'https://zz.bdstatic.com/linksubmit/push.js'; } else { bp.src = 'http://push.zhanzhang.baidu.com/push.js'; } var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(bp, s); })(); </script> <script src="https://www.jb51.cc/js/count.js"></script> </body> </html>