
Extracting JSON from an HTTP request response - Scrapy


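I am building a web scraper to extract product information from a product link.

The URL is: https://scrapingclub.com/exercise/detail_header/

Using Chrome DevTools, I found the HTTP request that returns the product details.

Here is my code: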

import scrapy


class quoteSpider(scrapy.Spider):
    name = 'Practice'
    start_urls = ['https://scrapingclub.com/exercise/detail_header/']

    def parse(self, response):
        # Headers copied from the request shown in Chrome DevTools
        headers = {
            'Accept': '*/*',
            'Accept-Encoding': 'gzip,deflate,br',
            'Accept-Language': 'es-ES,es;q=0.9,pt;q=0.8',
            'Connection': 'keep-alive',
            'Cookie': '__cfduid=da54d7e9c59cf35860825eabc96d7f1c41612805624; _ga=GA1.2.1229230175.1612805628; _gid=GA1.2.205529574.1613135874',
            'Host': 'scrapingclub.com',
            'Referer': 'https://scrapingclub.com/exercise/detail_header/',
            'sec-ch-ua': '"Chromium";v="88","Google Chrome";v="88",";Not A Brand";v="99"',
            'sec-ch-ua-mobile': '?0',
            'Sec-Fetch-Dest': 'empty',
            'sec-fetch-mode': 'cors',
            'sec-fetch-site': 'same-origin',
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36',
            'X-Requested-With': 'XMLHttpRequest',
        }
        yield scrapy.Request(
            'https://scrapingclub.com/exercise/ajaxdetail_header/',
            callback=self.parse_detail,
            headers=headers,
        )

    def parse_detail(self, response):
        # ProductClass is defined elsewhere in the project (not shown)
        product = ProductClass()

        data = response

        # I'm still debugging, so I'm not putting it into an item yet

        # data = json.loads(response.text)
        # product['product_name'] = data['title']
        # product['detail'] = data['description']
        # product['price'] = data['price']

        yield {
            'value': data
        }

When I run

scrapy crawl Practice -O test.json

this is my output file:

[
{"value": "<TextResponse 200 https://scrapingclub.com/exercise/ajaxdetail_header/>"}
]

Why doesn't it return the JSON content?

Solution

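Change the request headers and parse the response body to get the expected output. In the original code, data = response yields the response object itself, so only a short description of it (<TextResponse 200 ...>) is written to the file. Calling json.loads(response.text) returns the actual JSON content.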
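Before the corrected spider, it may help to see why the first attempt wrote only a placeholder string. The sketch below is a simplified illustration using the standard library only, assuming the exporter falls back to a short description of the response object, as the output file in the question suggests. FakeResponse and PlaceholderEncoder are hypothetical stand-ins, not Scrapy classes, and the sample body is invented; the real endpoint's payload may differ.

```python
import json


class FakeResponse:
    """Hypothetical stand-in for a Scrapy response; not the real class."""

    def __init__(self, status, url, text):
        self.status = status
        self.url = url
        self.text = text


class PlaceholderEncoder(json.JSONEncoder):
    """Encoder that describes a response object instead of serializing its body."""

    def default(self, o):
        if isinstance(o, FakeResponse):
            return f"<{type(o).__name__} {o.status} {o.url}>"
        return super().default(o)


# Invented sample body (assumption); only the key names come from the question.
response = FakeResponse(
    status=200,
    url="https://scrapingclub.com/exercise/ajaxdetail_header/",
    text='{"title": "Sample product", "price": "$1.00", "description": "Sample text"}',
)

# Yielding the object itself: only a description of it is written out.
as_object = json.dumps({"value": response}, cls=PlaceholderEncoder)

# Parsing the body first: the actual fields become available.
as_data = json.loads(response.text)

print(as_object)
print(as_data["title"])
```

The first print shows a one-line description of the object, while the second shows a field taken from the parsed body, which is the difference between the question's spider and the corrected one.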

import json

import scrapy


class quoteSpider(scrapy.Spider):
    name = 'Practice'

    start_urls = ['https://scrapingclub.com/exercise/detail_header/']

    def parse(self, response):
        # Headers for the AJAX request, including X-Requested-With
        headers = {
            'authority': 'scrapingclub.com',
            'accept': '*/*',
            'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36',
            'x-requested-with': 'XMLHttpRequest',
            'sec-fetch-site': 'same-origin',
            'sec-fetch-mode': 'cors',
            'sec-fetch-dest': 'empty',
            'referer': 'https://scrapingclub.com/exercise/detail_header/',
            'accept-language': 'en-US,en;q=0.9',
            'cookie': '__cfduid=d69d9664405f96c6477078a5c1fa78bb41613195439; _ga=GA1.2.523835360.1613195440; _gid=GA1.2.1763722170.1613195440',
        }

        yield scrapy.Request(
            'https://scrapingclub.com/exercise/ajaxdetail_header/',
            callback=self.parse_detail,
            headers=headers,
        )

    def parse_detail(self, response):
        # Parse the JSON body rather than yielding the response object
        data = json.loads(response.text)

        product = {}
        product['product_name'] = data['title']
        product['detail'] = data['description']
        product['price'] = data['price']
        yield product
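
The field mapping in parse_detail can also be pulled into a small function that is easy to check without running a crawl. This is a minimal sketch, not part of the original answer: extract_product is a hypothetical helper name, the sample body is invented, and using dict.get (so a missing key yields None instead of raising KeyError) is a design choice of this sketch.

```python
import json


def extract_product(body: str) -> dict:
    """Parse a JSON response body and map it onto the product fields."""
    data = json.loads(body)
    return {
        "product_name": data.get("title"),
        "detail": data.get("description"),
        "price": data.get("price"),
    }


# Invented sample body (assumption); only the key names come from the answer.
sample_body = '{"title": "Sample product", "description": "Sample text", "price": "$1.00"}'
product = extract_product(sample_body)
print(product)
```

Inside the spider, parse_detail could then reduce to yield extract_product(response.text). In Scrapy 2.2 and later, response.json() should also be available as a shortcut for json.loads(response.text).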
