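How to get the XHR document that is loaded when visiting a page
I'm trying to get the elements that appear below the photo on the following site, or on other equivalent pages: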
- https://www.nosetime.com/xiangshui/947895-oulong-xuecheng-atelier-cologne-orange.html
- https://www.nosetime.com/xiangshui/705357-pomelo-paradis.html
- https://www.nosetime.com/xiangshui/592260-cl-mentine-california.html
- https://www.nosetime.com/xiangshui/612353-oulong-atelier-cologne-trefle.html
- https://www.nosetime.com/xiangshui/911317-oulong-nimingmeigui-atelier-cologne.html
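But I can't get it from the page source. It is presumably loaded dynamically by a JavaScript script. In fact, it seems to be in an XHR document.
So how can I get the XHR document that is downloaded when the page is visited?
I tried: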
url = "https://www.nosetime.com/xiangshui/350870-oulong-atelier-cologne-oolang-infini.html"
r = requests.post(url,headers=headers)
data = r.json()
print(data)
which gives me:
---------------------------------------------------------------------------
JSONDecodeError                           Traceback (most recent call last)
<ipython-input-8-e72156ddb336> in <module>()
      2
      3 r = requests.post(url,headers=headers)
----> 4 data = r.json()
      5
      6 print(data)

3 frames
/usr/lib/python3.6/json/decoder.py in raw_decode(self, s, idx)
    355             obj, end = self.scan_once(s, idx)
    356         except StopIteration as err:
--> 357             raise JSONDecodeError("Expecting value", s, err.value) from None
    358         return obj, end

JSONDecodeError: Expecting value: line 1 column 1 (char 0)
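Solution
Just add the right headers and you should be good. Note that the request below is a GET to the /app/item.php endpoint that serves the XHR data, not a POST to the HTML page.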
import requests

headers = {
    "referer": "https://www.nosetime.com/xiangshui/350870-oulong-atelier-cologne-oolang-infini.html",
    "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.111 Safari/537.36",
}
# Request the XHR endpoint directly and decode its JSON body.
response = requests.get("https://www.nosetime.com/app/item.php?id=350870", headers=headers).json()
print(response["id"], response["isscore"], response["brandid"])
For some reason I can't paste the entire JSON output, because SO thinks it's spam. Anyway, this should get you the JSON response.
This prints:
350870 8.6 10091761
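As for the error in the question: "Expecting value: line 1 column 1 (char 0)" is what the JSON decoder reports when the body does not start with a JSON value, which is what you would expect if the server answered with HTML or an empty body. The sketch below is only an illustration of how to tell these cases apart before decoding. The helper name describe_body and the sample bodies are hypothetical and are not real responses from the site.

```python
import json


def describe_body(body: str) -> str:
    """Classify a response body as 'json', 'empty', or 'not json'."""
    if not body.strip():
        return "empty"
    try:
        json.loads(body)
    except json.JSONDecodeError:
        return "not json"
    return "json"


# Hypothetical bodies standing in for real responses (assumptions).
html_body = "<!DOCTYPE html><html><body>...</body></html>"
json_body = '{"id": "350870", "isscore": "8.6"}'

print(describe_body(html_body))
print(describe_body(json_body))
print(describe_body(""))
```

With requests, the same check could be applied to r.text; looking at r.status_code and r.headers.get("content-type") before calling r.json() gives a similar hint.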
Edit:
If you have more products, just loop over the product URLs and extract what you need from the JSON. For example:
import requests

product_urls = [
    "https://www.nosetime.com/xiangshui/947895-oulong-xuecheng-atelier-cologne-orange.html",
    "https://www.nosetime.com/xiangshui/705357-pomelo-paradis.html",
    "https://www.nosetime.com/xiangshui/592260-cl-mentine-california.html",
    "https://www.nosetime.com/xiangshui/612353-oulong-atelier-cologne-trefle.html",
    "https://www.nosetime.com/xiangshui/911317-oulong-nimingmeigui-atelier-cologne.html",
]

for product_url in product_urls:
    headers = {"referer": product_url}
    # The product ID is the leading number in the last path segment.
    product_id = product_url.split("/")[-1].split("-")[0]
    response = requests.get(
        f"https://www.nosetime.com/app/item.php?id={product_id}",
        headers=headers,
    ).json()
    print(f"Product name: {response['enname']} | Rating: {response['isscore']}")
Output:
Product name: Atelier Cologne Orange Sanguine,2010 | Rating: 8.9
Product name: Atelier Cologne Pomelo Paradis,2015 | Rating: 8.8
Product name: Atelier Cologne Clémentine California,2016 | Rating: 8.6
Product name: Atelier Cologne Trefle Pur,2010 | Rating: 8.6
Product name: Atelier Cologne Rose Anonyme,2012 | Rating: 7.7
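The ID extraction above relies on every product URL ending in a segment that starts with the numeric ID. A slightly more defensive variant, offered only as a sketch, could parse the URL and fail loudly when no ID is found. The helper names extract_product_id and build_request and the constant API_TEMPLATE are hypothetical; the endpoint itself is the one used in the answer, and assuming it stays stable is exactly that, an assumption.

```python
import re
from urllib.parse import urlparse

# Endpoint taken from the answer; treating it as stable is an assumption.
API_TEMPLATE = "https://www.nosetime.com/app/item.php?id={product_id}"


def extract_product_id(product_url):
    """Return the leading digits of the last path segment of a product URL."""
    last_segment = urlparse(product_url).path.rsplit("/", 1)[-1]
    match = re.match(r"\d+", last_segment)
    if match is None:
        raise ValueError(f"no numeric product ID in {product_url!r}")
    return match.group(0)


def build_request(product_url):
    """Return (api_url, headers) for a product page URL."""
    product_id = extract_product_id(product_url)
    api_url = API_TEMPLATE.format(product_id=product_id)
    # The product page itself is sent as the referer, as in the answer.
    return api_url, {"referer": product_url}


url = "https://www.nosetime.com/xiangshui/705357-pomelo-paradis.html"
api_url, headers = build_request(url)
print(api_url)
print(headers)
```

The resulting pair can then be passed to requests.get(api_url, headers=headers) exactly as in the loop above.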