How to solve an infinite loop with Scrapy's CrawlerProcess
I am currently running Scrapy v2.5 and I want to run an infinite loop. My code:
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

class main():
    def bucle(self, array_spider, process):
        mongo = mongodb(setting)  # mongodb() is the project's own helper
        for spider_name in array_spider:
            process.crawl(spider_name, params={"mongo": mongo, "spider_name": spider_name})
        process.start()
        process.stop()
        mongo.close_mongo()

if __name__ == "__main__":
    setting = get_project_settings()
    while True:
        process = CrawlerProcess(setting)
        array_spider = process.spider_loader.list()
        class_main = main()
        class_main.bucle(array_spider, process)
But this produces the following error:
Traceback (most recent call last):
  File "run_scrapy.py", line 92, in <module>
    process.start()
  File "/usr/local/lib/python3.8/dist-packages/scrapy/crawler.py", line 327, in start
    reactor.run(installSignalHandlers=False)  # blocking call
  File "/usr/local/lib/python3.8/dist-packages/twisted/internet/base.py", line 1422, in run
    self.startRunning(installSignalHandlers=installSignalHandlers)
  File "/usr/local/lib/python3.8/dist-packages/twisted/internet/base.py", line 1404, in startRunning
    ReactorBase.startRunning(cast(ReactorBase, self))
  File "/usr/local/lib/python3.8/dist-packages/twisted/internet/base.py", line 843, in startRunning
    raise error.ReactorNotRestartable()
twisted.internet.error.ReactorNotRestartable
Can anyone help me?
Solution
AFAIK there is no easy way to restart a spider, but there is an alternative: never let the spider close in the first place. For that you can use the spider_idle signal.
According to the documentation:
Sent when a spider has gone idle, which means the spider has no further:
* requests waiting to be downloaded
* requests scheduled
* items being processed in the item pipeline
You can also find an example of working with Signals in the official documentation.
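If you specifically want to keep the while True loop from the question, another widely used workaround (an addition on my part, not from the answer above) is to run each crawl in a fresh child process, so every iteration gets a brand-new Twisted reactor and ReactorNotRestartable never fires. A stdlib-only sketch; the Scrapy calls are shown as comments and the queue message is a stand-in so the sketch runs without Scrapy:

```python
import multiprocessing

def run_all_spiders(queue):
    # In the real project this body would be roughly:
    #   from scrapy.crawler import CrawlerProcess
    #   from scrapy.utils.project import get_project_settings
    #   process = CrawlerProcess(get_project_settings())
    #   for name in process.spider_loader.list():
    #       process.crawl(name)
    #   process.start()  # fresh reactor: the first start() in this process
    queue.put("done")  # stand-in result so the sketch runs without Scrapy

if __name__ == "__main__":
    for _ in range(3):  # or `while True:` for an endless loop
        queue = multiprocessing.Queue()
        p = multiprocessing.Process(target=run_all_spiders, args=(queue,))
        p.start()
        p.join()  # each child process owns its own reactor
        assert queue.get() == "done"
```

The trade-off is process startup overhead on each iteration, but it keeps the original loop structure intact.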