When I try to scrape content with Splash, I get an empty list. Why?

This is the site I am trying to scrape: https://people.sap.com/tim.sheppard

Specifically, I am trying to scrape the first post, which appears at this location in the inspector:

<div class="dm-content-item__text">We have migrated an application from PB 7 to PB 12.5.2 build 5006. After the migration we're having problems with some computed fields in datawindows. The app has many datawindows that have computed fields which include a date() function using Syntax...</div>

My spider looks like this:

import scrapy
from scrapy_splash import SplashRequest

class RedditSpider(scrapy.Spider):
    name = 'quotes'
    allowed_domains = ['people.sap.com']
    start_urls = ['http://people.sap.com/tim.sheppard']

    def start_requests(self):
        for url in self.start_urls:
            yield SplashRequest(url=url,callback=self.parse,endpoint='render.html')

    def parse(self,response):
        quote = response.xpath('//*[@class="dm-content-item__text"]/text()').extract()
        yield {"quote": quote}

This is the output I get:

2021-06-10 17:33:22 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://people.sap.com/tim.sheppard via http://localhost:8050/render.html> (referer: None)
2021-06-10 17:33:22 [scrapy.core.scraper] DEBUG: Scraped from <200 http://people.sap.com/tim.sheppard>
{'quote': []}
2021-06-10 17:33:22 [scrapy.core.engine] INFO: Closing spider (finished)
2021-06-10 17:33:22 [scrapy.statscollectors] INFO: Dumping Scrapy stats:

I don't understand what I am doing wrong...
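One way to narrow this down is to run the same selector against two hand-written snippets: one that contains the div from the inspector and one that does not. The sketch below does that with the standard library only. The names WITH_POST, WITHOUT_POST and extract_quotes are made up for illustration, both snippets are stand-ins rather than the real page, and ElementTree is an XML parser with a limited XPath subset, so this only approximates what Scrapy's selector does.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for what response.text might contain.
# Neither string is the real page; the first reuses the opening
# sentence of the post quoted in the question.
WITH_POST = (
    '<html><body>'
    '<div class="dm-content-item__text">We have migrated an application '
    'from PB 7 to PB 12.5.2 build 5006.</div>'
    '</body></html>'
)
WITHOUT_POST = '<html><body><div id="app"></div></body></html>'


def extract_quotes(markup):
    """Return the text of every element whose class is dm-content-item__text."""
    root = ET.fromstring(markup)
    # Approximates the spider's //*[@class="dm-content-item__text"]/text()
    # using the XPath subset that ElementTree supports.
    matches = root.findall(".//*[@class='dm-content-item__text']")
    return [el.text for el in matches]


print(extract_quotes(WITH_POST))     # one-item list with the post text
print(extract_quotes(WITHOUT_POST))  # empty list, like the spider's output
```

If the selector matches whenever the div is present, as it does here, then an empty list would suggest that the HTML Splash returned did not contain that div, rather than a faulty XPath. A comparable check inside parse would be to test whether the string dm-content-item__text occurs in response.text. If it does not, one thing that may be worth trying (an assumption, not a confirmed fix for this site) is giving the page time to run its JavaScript by passing args={'wait': 2} to SplashRequest.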
