Extracting specific elements from a web page in Python using requests-html

Say I'm looking at this web page:

https://openpaymentsdata.cms.gov/search/physicians/by-name-and-location?firstname=robert&lastname=b&city=Palo_Alto

I want to extract the link to the physician's profile, but when I try to scrape the page I can't find that element, even with a CSS selector.

from requests_html import HTMLSession

firstname = 'robert'
lastname = 'b'
city = 'Palo_Alto'

url = 'https://openpaymentsdata.cms.gov/search/physicians/by-name-and-location?firstname='\
        + firstname + '&lastname=' + lastname + '&city=' + city

session = HTMLSession()

r = session.get(url)

sel = 'body > div.siteOuterWrapper > div.siteInnerWrapper > div.siteContentWrapper'
print(r.html.find(sel,first=True).text)

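All of this works until I reach the content wrapper, where I can no longer see any elements. Why is that? Is there a reason I can't see this element? At first I thought JavaScript was the cause, but this library claims full JavaScript support: https://requests-html.kennethreitz.org/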

Solution

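The website you mention fetches its data from an API after the page has loaded, so the search results are not in the initial HTML. requests-html only executes JavaScript when you call r.html.render(); a plain session.get() returns the page as served, with an empty content wrapper.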

You can send a GET request directly to that API with requests and get your data.

You can find the API endpoint using Chrome DevTools.
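Once the endpoint is known, the query can be assembled from the search terms instead of being copied as one long string. The following is a minimal sketch, not the site's documented API: it assumes the endpoint and column names seen in the request the search page itself sends, and it assumes the endpoint follows Socrata-style SoQL conventions (string literals in single quotes, with embedded quotes escaped by doubling). The helper names `soql_literal` and `build_search_url` are made up for illustration, and the `$select` list is left out on the assumption that all columns are then returned.

```python
from urllib.parse import urlencode

# Assumption: endpoint copied from the request observed in the browser.
ENDPOINT = "https://openpaymentsdata.cms.gov/resource/khdp-6xuy.json"


def soql_literal(value: str) -> str:
    """Upper-case a value and quote it as a SoQL string literal.

    Assumption: embedded single quotes are escaped by doubling, as in SQL.
    """
    return "'" + value.upper().replace("'", "''") + "'"


def build_search_url(firstname: str, lastname: str, city: str, limit: int = 300) -> str:
    """Build a prefix-search URL over first name, last name, and city."""
    filters = [
        ("physician_profile_first_name", firstname),
        ("physician_profile_last_name", lastname),
        ("physician_profile_city", city),
    ]
    where = " AND ".join(
        f"STARTS_WITH(UPPER({column}), {soql_literal(value)})"
        for column, value in filters
    )
    params = {
        "$where": where,
        "$order": "physician_profile_last_name ASC,physician_profile_first_name ASC",
        "$limit": limit,
    }
    # urlencode percent-encodes "$", quotes, commas, and spaces.
    return ENDPOINT + "?" + urlencode(params)


url = build_search_url("robert", "b", "Palo_Alto")
print(url)
```

Passing the printed URL to `requests.get(url).json()` should then behave much like a hard-coded request, provided the assumptions above hold.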

The following HTTP request should return the data you are looking for. (To find it in the browser, press F12 and open Network > XHR.)

HTTP GET https://openpaymentsdata.cms.gov/resource/khdp-6xuy.json?%24select=%3Aid%2Cphysician_profile_id%2Cphysician_profile_last_name%2Cphysician_profile_middle_name%2Cphysician_profile_first_name%2Cphysician_profile_suffix%2Cphysician_profile_primary_specialty%2Cphysician_profile_address_line_1%2Cphysician_profile_address_line_2%2Cphysician_profile_city%2Cphysician_profile_state%2Cphysician_profile_province_name%2Cphysician_profile_country_name%2Cphysician_profile_zipcode%2Cphysician_profile_alternate_first_name1%2Cphysician_profile_alternate_last_name1%2Cphysician_profile_alternate_first_name2%2Cphysician_profile_alternate_last_name2%2Cphysician_profile_alternate_first_name3%2Cphysician_profile_alternate_last_name3%2Cphysician_profile_alternate_first_name4%2Cphysician_profile_alternate_last_name4%2Cphysician_profile_alternate_first_name5%2Cphysician_profile_alternate_last_name5%2Clocation&%24where=STARTS_WITH(UPPER(physician_profile_first_name)%2C%20%27ROBERT%27)%20AND%20STARTS_WITH(UPPER(physician_profile_last_name)%2C%20%27B%27)%20AND%20STARTS_WITH(UPPER(physician_profile_city)%2C%20%27PALO_ALTO%27)&%24order=physician_profile_last_name%20ASC%2Cphysician_profile_first_name%20ASC&%24limit=300
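The percent-encoding makes that URL hard to read. As a small aid, the sketch below decodes the tail of its query string with the standard library so the filter, sort order, and row limit become visible; the long `$select` column list is left out here only for brevity.

```python
from urllib.parse import parse_qs

# Tail of the query string from the request above ($select list omitted).
query = (
    "%24where=STARTS_WITH(UPPER(physician_profile_first_name)%2C%20%27ROBERT%27)"
    "%20AND%20STARTS_WITH(UPPER(physician_profile_last_name)%2C%20%27B%27)"
    "%20AND%20STARTS_WITH(UPPER(physician_profile_city)%2C%20%27PALO_ALTO%27)"
    "&%24order=physician_profile_last_name%20ASC%2Cphysician_profile_first_name%20ASC"
    "&%24limit=300"
)

# parse_qs splits on "&" and undoes the percent-encoding.
decoded = parse_qs(query)

for name, values in decoded.items():
    print(f"{name} = {values[0]}")
```

The decoded `$where` clause is a prefix match on first name, last name, and city, which mirrors the three search fields in the original page URL.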

Using requests:

import requests

print(requests.get('https://openpaymentsdata.cms.gov/resource/khdp-6xuy.json?%24select=%3Aid%2Cphysician_profile_id%2Cphysician_profile_last_name%2Cphysician_profile_middle_name%2Cphysician_profile_first_name%2Cphysician_profile_suffix%2Cphysician_profile_primary_specialty%2Cphysician_profile_address_line_1%2Cphysician_profile_address_line_2%2Cphysician_profile_city%2Cphysician_profile_state%2Cphysician_profile_province_name%2Cphysician_profile_country_name%2Cphysician_profile_zipcode%2Cphysician_profile_alternate_first_name1%2Cphysician_profile_alternate_last_name1%2Cphysician_profile_alternate_first_name2%2Cphysician_profile_alternate_last_name2%2Cphysician_profile_alternate_first_name3%2Cphysician_profile_alternate_last_name3%2Cphysician_profile_alternate_first_name4%2Cphysician_profile_alternate_last_name4%2Cphysician_profile_alternate_first_name5%2Cphysician_profile_alternate_last_name5%2Clocation&%24where=STARTS_WITH(UPPER(physician_profile_first_name)%2C%20%27ROBERT%27)%20AND%20STARTS_WITH(UPPER(physician_profile_last_name)%2C%20%27B%27)%20AND%20STARTS_WITH(UPPER(physician_profile_city)%2C%20%27PALO_ALTO%27)&%24order=physician_profile_last_name%20ASC%2Cphysician_profile_first_name%20ASC&%24limit=300').json())

Output

[{':id': 'row-9mfk-w6hd-ejup','physician_profile_id': '966387','physician_profile_last_name': 'BOCIAN','physician_profile_middle_name': 'C','physician_profile_first_name': 'ROBERT','physician_profile_primary_specialty': 'Allopathic & Osteopathic Physicians|Allergy & Immunology|Allergy','physician_profile_address_line_1': '795 EL CAMINO REAL','physician_profile_city': 'PALO ALTO','physician_profile_state': 'CA','physician_profile_country_name': 'UNITED STATES','physician_profile_zipcode': '94301-2302','physician_profile_alternate_first_name1': 'ROBERT','physician_profile_alternate_last_name1': 'BOCIAN'}]
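Since the original goal was a link to each physician's profile, the `physician_profile_id` field in this output is probably the piece to build on. The sketch below turns rows shaped like the one above into name and link pairs. Note that the URL pattern in `PROFILE_URL_TEMPLATE` is a guess and does not come from the answer; check it against a real profile link in the browser before relying on it. The function name `profile_links` is likewise just illustrative.

```python
# Sample row trimmed from the output above.
rows = [
    {
        "physician_profile_id": "966387",
        "physician_profile_first_name": "ROBERT",
        "physician_profile_middle_name": "C",
        "physician_profile_last_name": "BOCIAN",
    }
]

# ASSUMPTION: hypothetical URL pattern, not confirmed by the source.
PROFILE_URL_TEMPLATE = "https://openpaymentsdata.cms.gov/physician/{profile_id}"


def profile_links(rows):
    """Return (full name, profile URL) pairs for each row that has a profile ID."""
    links = []
    for row in rows:
        profile_id = row.get("physician_profile_id")
        if not profile_id:
            continue  # skip rows without an ID
        name_parts = [
            row.get("physician_profile_first_name"),
            row.get("physician_profile_middle_name"),
            row.get("physician_profile_last_name"),
        ]
        # Some name fields may be absent from the JSON, so drop empty parts.
        name = " ".join(part for part in name_parts if part)
        links.append((name, PROFILE_URL_TEMPLATE.format(profile_id=profile_id)))
    return links


links = profile_links(rows)
for name, link in links:
    print(name, link)
```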
