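How to solve a data scraping problem with scrapy-splash
I am trying to use scrapy-splash to scrape some data from the website dermstore.com.
First, I am trying to get the link URLs of all the different brands on dermstore.com.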
I am trying to scrape the href URL of each brand and write the results to test.json, but the output is an empty JSON file.
My spider code:
import scrapy
from scrapy_splash import SplashRequest
from scrapy.utils.response import open_in_browser
from scrapy.http.response.html import HtmlResponse


class SpiderdermSpider(scrapy.Spider):
    name = 'spiderDerm'

    # Lua script for Splash's 'execute' endpoint: load the page, click the
    # "next" link, wait, then return the cookies, HTML and final URL.
    # Note: start_requests() below uses 'render.html', so this script is never run.
    script = """
    function main(splash)
        splash:init_cookies(splash.args.cookies)
        assert(splash:go(splash.args.url))
        splash:wait(0.5)
        local element = splash:select('li.next a')
        local bounds = element:bounds()
        element:mouse_click{x=bounds.width/2, y=bounds.height/2}
        assert(splash:wait(5.0))
        return {
            cookies = splash:get_cookies(),
            html = splash:html(),
            url = splash:url()
        }
    end
    """

    url = 'https://www.dermstore.com/all_Brands_100.htm'

    def start_requests(self):
        # Ask Splash to render the page and wait 0.5 s before returning the HTML.
        yield SplashRequest(self.url, callback=self.parse,
                            endpoint='render.html', args={"wait": 0.5})

    def parse(self, response):
        # Debugging aid: open the rendered HTML in a browser.
        # ht = HtmlResponse(url=response.url, body=response.body, encoding="utf-8", request=response.request)
        # open_in_browser(ht)
        # return None
        # .getall() returns the href strings; without it the loop yields Selector objects.
        for brand_url in response.css('a.col-xs-6::attr(href)').getall():
            yield {
                'url': brand_url
            }
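An empty test.json means parse() yielded nothing, so one thing worth checking is whether the HTML that Splash returns contains any `a` tags with the class `col-xs-6` at all. The sketch below is a standalone check that uses Python's standard `html.parser` instead of Scrapy selectors, so it can be run on a saved copy of the page. The `BrandLinkCollector` class, the `collect_brand_links` helper, and the sample markup are all hypothetical and are not taken from dermstore.com.

```python
from html.parser import HTMLParser


class BrandLinkCollector(HTMLParser):
    """Collect href values of <a> tags whose class list contains css_class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        # A valueless attribute is reported as None, so fall back to "".
        classes = (attr_map.get("class") or "").split()
        href = attr_map.get("href")
        if self.css_class in classes and href:
            self.hrefs.append(href)


def collect_brand_links(html, css_class="col-xs-6"):
    """Return the hrefs of all matching anchors found in the HTML string."""
    collector = BrandLinkCollector(css_class)
    collector.feed(html)
    collector.close()
    return collector.hrefs


# Hypothetical sample markup, used only to demonstrate the check.
SAMPLE_HTML = """
<ul>
  <li><a class="col-xs-6 brand" href="/brand-a.htm">Brand A</a></li>
  <li><a class="col-xs-6" href="/brand-b.htm">Brand B</a></li>
  <li><a class="other" href="/about.htm">About</a></li>
</ul>
"""

if __name__ == "__main__":
    print(collect_brand_links(SAMPLE_HTML))
```

If the same function returns an empty list for the HTML received in parse() (for example by passing it `response.text`), the selector does not match what Splash rendered, and the problem is in the page markup or the rendering rather than in the JSON export.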
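The post does not show the project's settings.py. SplashRequest only goes through Splash when the scrapy-splash middlewares are enabled, so the configuration is worth checking as well. The fragment below follows the setup described in the scrapy-splash README. The Splash address `http://localhost:8050` is an assumption and should be changed to wherever the Splash instance is actually running.

```python
# settings.py (sketch). Assumption: Splash is reachable at localhost:8050.
SPLASH_URL = "http://localhost:8050"

# Downloader middlewares that route SplashRequest objects through Splash.
DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}

# Spider middleware that deduplicates repeated Splash arguments.
SPIDER_MIDDLEWARES = {
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}

# Splash-aware duplicate filter and HTTP cache storage.
DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"
HTTPCACHE_STORAGE = "scrapy_splash.SplashAwareFSCacheStorage"
```

If these settings are already in place, the crawl log should show requests being sent to the Splash endpoint rather than directly to the target site.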