Scrapy CSS selector returns an empty list for the price

How to fix a Scrapy CSS selector that returns an empty list for the price

I am scraping an interactive website with Scrapy, but I can't seem to get the CSS selector for the price right. Here is a screenshot of the HTML:



[Screenshot of the page HTML; image not available]

A few selectors I have already tried:

price = response.css(".bui-price-display__value[aria-hidden='true']").css("::text").extract()

price = response.css(".prco-inline-block-maker-helper .bui-price-display__value").css("::text").extract()

price = response.css(".bui-price-display__value.prco-inline-block-maker-helper").css("::text").extract()

Any ideas on what would work?

Link to the site: https://www.booking.com/searchresults.html?label=gen173nr-1DCAEoggI46AdIM1gEaJgCiAEBmAExuAEHyAEM2AED6AEB-AECiAIBqAIDuALD6_n6BcACAdICJGU2YTFmOTExLTJmZmMtNDZjOS1iYjk1LWY4OTM5OTFiZDA5ZdgCBOACAQ&sid=c3b17be33020b4a83d961a9fc14cf31d&sb=1&sb_lp=1&src=index&src_elem=sb&error_url=https%3A%2F%2Fwww.booking.com%2Findex.html%3Flabel%3Dgen173nr-1DCAEoggI46AdIM1gEaJgCiAEBmAExuAEHyAEM2AED6AEB-AECiAIBqAIDuALD6_n6BcACAdICJGU2YTFmOTExLTJmZmMtNDZjOS1iYjk1LWY4OTM5OTFiZDA5ZdgCBOACAQ%3Bsid%3Dc3b17be33020b4a83d961a9fc14cf31d%3Bsb_price_type%3Dtotal%26%3B&ss=Maribor&is_ski_area=0&ssne=Maribor&ssne_untouched=Maribor&dest_id=-88556&dest_type=city&checkin_year=2020&checkin_month=10&checkin_monthday=30&checkout_year=2020&checkout_month=11&checkout_monthday=4&group_adults=2&group_children=0&no_rooms=1&b_h4u_keep_filters=&from_sf=1

Solution

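Of course it returns an empty list: the request has to be sent with the same session cookie and User-Agent that your browser uses. In the Scrapy shell, supply the cookie and set the User-Agent like this: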

$ scrapy shell
>> from scrapy import Request
>> req = Request('https://www.booking.com/searchresults.html?label=gen173nr-1FCAEoggI46AdIM1gEaJgCiAEBmAExuAEHyAEM2AEB6AEB-AECiAIBqAIDuALD6_n6BcACAdICJGU2YTFmOTExLTJmZmMtNDZjOS1iYjk1LWY4OTM5OTFiZDA5ZdgCBeACAQ&sid=88ee1f1b53ea99d93e04dd0a9bd2e49f&tmpl=searchresults&checkin_month=10&checkin_monthday=30&checkin_year=2020&checkout_month=11&checkout_monthday=4&checkout_year=2020&class_interval=1&dest_id=-88556&dest_type=city&dtdisc=0&from_sf=1&group_adults=2&group_children=0&inac=0&index_postcard=0&label_click=undef&no_rooms=1&offset=0&postcard=0&raw_dest_type=city&room1=A%2CA&sb_price_type=total&shw_aparth=1&slp_r_match=0&src=index&src_elem=sb&srpvid=e78f974223200104&ss=Maribor&ss_all=0&ssb=empty&sshis=0&ssne=Maribor&ssne_untouched=Maribor&top_ufis=1&selected_currency=USD&changed_currency=1&top_currency=1&nflt=',headers={'upgrade-insecure-requests': 1,'cookie': '_pxhd=07c11db292e542c424e639bc65a4c6405c9dff060cd2bc061e31fc54e4f0c3df%3A61810c81-f608-11ea-b101-d53d6a14d275','User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML,like Gecko) Chrome/69.0.3497.92 Safari/537.36'})
>> fetch(req)
>> response.xpath('//*[@id="hotellist_inner"]/div[1]/div[2]/div[3]/div/div/div/div/div[2]/div[1]/div[2]/div/div[3]').get()

This returns:

'<div class="bui-price-display__value prco-inline-block-maker-helper" aria-hidden="true" data-et-mouseenter="\ncustomGoal:AdeKbCcBUfQUaSHbZFVXOJUNQKFcFXZYCaJFSSZRe:2\n">\nUS$1,225\n</div>'
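The text inside that element is padded with newlines, so it usually needs cleaning before use. The following is a minimal sketch of one way to do that, not part of the original answer. The helper name `parse_price` is hypothetical, and it assumes the currency marker precedes the number and that commas are thousands separators, as in the output above.

```python
import re
from decimal import Decimal
from typing import Tuple


def parse_price(raw: str) -> Tuple[str, Decimal]:
    """Split text such as '\\nUS$1,225\\n' into (currency prefix, amount).

    Assumption: the currency marker comes first and ',' separates thousands.
    """
    text = raw.strip()  # drop the surrounding newlines and spaces
    match = re.fullmatch(r"(\D*?)\s*([\d,]+(?:\.\d+)?)", text)
    if match is None:
        raise ValueError(f"unrecognised price text: {text!r}")
    currency, amount = match.groups()
    return currency, Decimal(amount.replace(",", ""))


print(parse_price("\nUS$1,225\n"))  # ('US$', Decimal('1225'))
```

In a spider, the input would be the string returned by a `::text` selector on the price element; other locales or currencies may need a different pattern.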

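Use the same cookie that you copied from your browser's Network tab, set the User-Agent in your code, and you will be able to scrape the page. Good luck.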
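As a rough sketch of how the copied values could be prepared for use in code, the snippet below turns a raw `Cookie` header string into a dict and bundles it with a User-Agent. The helper `cookie_header_to_dict`, the names `RAW_COOKIE`, `USER_AGENT` and `request_kwargs`, and all values shown are placeholders chosen for illustration; substitute the ones from your own browser session.

```python
from typing import Dict


def cookie_header_to_dict(header: str) -> Dict[str, str]:
    """Split a raw 'Cookie' header value into a name -> value mapping."""
    cookies: Dict[str, str] = {}
    for pair in header.split(";"):
        # Split on the first '=' only, because cookie values may contain '='.
        name, sep, value = pair.strip().partition("=")
        if sep and name:
            cookies[name] = value
    return cookies


# Placeholder values (assumptions): copy the real ones from the Network tab.
RAW_COOKIE = "_pxhd=PLACEHOLDER_VALUE; other_cookie=abc=123"
USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"

request_kwargs = {
    "headers": {"User-Agent": USER_AGENT},
    "cookies": cookie_header_to_dict(RAW_COOKIE),
}
print(request_kwargs["cookies"])
```

Scrapy's `Request` accepts `headers=` and `cookies=` keyword arguments, so inside a spider's `start_requests` these could be passed as `scrapy.Request(url, callback=self.parse, **request_kwargs)`. Session cookies expire, so the values will need refreshing from time to time.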
