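BeautifulSoup: not all href links seem to be extracted
I am trying to extract all the href links inside divs with the class 'address'. Every time I run the code I get only the first 5, even though I know there should be 9.
I have read the threads below and changed my code countless times, including switching between all the parsers (html.parser, html5lib, lxml, xml, lxml-xml), but nothing seems to work. What causes it to stop after the 5th iteration? I am still very new to Python, so I apologize if this is a rookie mistake I overlooked. Any help would be appreciated, even sarcastic answers :)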
- Beautiful Soup 4 find_all don't find links that Beautiful Soup 3 finds
- Python 64 bit not storing as long of string as 32 bit python
I used very similar code on the following pages and had no problems scraping the hrefs: https://www.walgreens.com/storelistings/storesbystate.jsp?requestType=locator https://www.walgreens.com/storelistings/storesbycity.jsp?requestType=locator&state=AK
My code is below:
import requests
from bs4 import BeautifulSoup

local_rg = requests.get('https://www.walgreens.com/storelocator/find.jsp?requestType=locator&state=AK&city=ANCHORAGE&from=localSearch')
local_rg_content = local_rg.content
local_rg_content_src = BeautifulSoup(local_rg_content,'lxml')

for link in local_rg_content_src.find_all('div'):
    local_class = str(link.get('class'))
    if str("['address']") in str(local_class):
        local_a = link.find_all('a')
        for a_link in local_a:
            local_href = str(a_link.get('href'))
            print(local_href)
My results (the first 5):
- /locator/walgreens-1470+w+northern+lights+blvd-anchorage-ak-99503/id=15092
- /locator/walgreens-725+e+northern+lights+blvd-anchorage-ak-99503/id=13656
- /locator/walgreens-4353+lake+otis+parkway-anchorage-ak-99508/id=15653
- /locator/walgreens-7600+debarr+rd-anchorage-ak-99504/id=12679
- /locator/walgreens-2197+w+dimond+blvd-anchorage-ak-99515/id=12680
But there should be 9:
- /locator/walgreens-1470+w+northern+lights+blvd-anchorage-ak-99503/id=15092
- /locator/walgreens-725+e+northern+lights+blvd-anchorage-ak-99503/id=13656
- /locator/walgreens-4353+lake+otis+parkway-anchorage-ak-99508/id=15653
- /locator/walgreens-7600+debarr+rd-anchorage-ak-99504/id=12679
- /locator/walgreens-2197+w+dimond+blvd-anchorage-ak-99515/id=12680
- /locator/walgreens-2550+e+88th+ave-anchorage-ak-99507/id=15654
- /locator/walgreens-12405+brandon+st-anchorage-ak-99515/id=13449
- /locator/walgreens-12051+old+glenn+hwy-eagle+river-ak-99577/id=15362
- /locator/walgreens-1721+e+parks+hwy-wasilla-ak-99654/id=12681
Solution
Try using selenium instead of requests to get the page source. Here is how you do it:
from selenium import webdriver
driver = webdriver.Chrome()
driver.get('https://www.walgreens.com/storelocator/find.jsp?requestType=locator&state=AK&city=ANCHORAGE&from=localSearch')
local_rg_content = driver.page_source
driver.close()
local_rg_content_src = BeautifulSoup(local_rg_content,'lxml')
The rest of the code is the same. Here is the complete code:
from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome()
driver.get('https://www.walgreens.com/storelocator/find.jsp?requestType=locator&state=AK&city=ANCHORAGE&from=localSearch')
local_rg_content = driver.page_source
driver.close()
local_rg_content_src = BeautifulSoup(local_rg_content,'lxml')

for link in local_rg_content_src.find_all('div'):
    local_class = str(link.get('class'))
    if str("['address']") in str(local_class):
        local_a = link.find_all('a')
        for a_link in local_a:
            local_href = str(a_link.get('href'))
            print(local_href)
Output:
/locator/walgreens-1470+w+northern+lights+blvd-anchorage-ak-99503/id=15092
/locator/walgreens-725+e+northern+lights+blvd-anchorage-ak-99503/id=13656
/locator/walgreens-4353+lake+otis+parkway-anchorage-ak-99508/id=15653
/locator/walgreens-7600+debarr+rd-anchorage-ak-99504/id=12679
/locator/walgreens-2197+w+dimond+blvd-anchorage-ak-99515/id=12680
/locator/walgreens-2550+e+88th+ave-anchorage-ak-99507/id=15654
/locator/walgreens-12405+brandon+st-anchorage-ak-99515/id=13449
/locator/walgreens-12051+old+glenn+hwy-eagle+river-ak-99577/id=15362
/locator/walgreens-1721+e+parks+hwy-wasilla-ak-99654/id=12681
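As a side note, the loop above finds the address divs by comparing the string form of the class list, which only matches when 'address' is the sole class on the div. A CSS selector is one possible alternative. The sketch below is not part of the original answer: the sample markup and the helper name address_links are made up for illustration, and it uses the built-in 'html.parser' so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the rendered page source.
SAMPLE_HTML = """
<div class="address"><a href="/locator/store-a/id=1">Store A</a></div>
<div class="address extra"><a href="/locator/store-b/id=2">Store B</a></div>
<div class="hours"><a href="/hours/id=1">Hours</a></div>
"""


def address_links(html):
    """Return the href of every <a> nested inside a div with class 'address'."""
    soup = BeautifulSoup(html, "html.parser")
    # "div.address a[href]" matches anchors that have an href and sit under
    # any div whose class list contains "address", even if the div carries
    # other classes as well.
    return [a["href"] for a in soup.select("div.address a[href]")]


print(address_links(SAMPLE_HTML))
```

Note that the second div (class "address extra") is matched here, whereas the string comparison in the original loop would skip it. The selector only changes how links are picked out of the HTML; it does not help if the links are missing from the downloaded source in the first place.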
The page uses Ajax to load the store information from an external URL. You can use the requests / json modules to load it:
import re
import json
import requests

url = 'https://www.walgreens.com/storelocator/find.jsp?requestType=locator&state=AK&city=ANCHORAGE&from=localSearch'
ajax_url = 'https://www.walgreens.com/locator/v1/stores/search?requestor=search'

m = re.search(r'"lat":([\d.-]+),"lng":([\d.-]+)',requests.get(url).text)

params = {
    'lat': m.group(1),'lng': m.group(2)
}

data = requests.post(ajax_url,json=params).json()

# uncomment this to print all data:
# print(json.dumps(data,indent=4))

for result in data['results']:
    print(result['store']['address']['street'])
    print('https://www.walgreens.com' + result['storeSeoUrl'])
    print('-' * 80)
Prints:
1470 W NORTHERN LIGHTS BLVD
https://www.walgreens.com/locator/walgreens-1470+w+northern+lights+blvd-anchorage-ak-99503/id=15092
--------------------------------------------------------------------------------
725 E NORTHERN LIGHTS BLVD
https://www.walgreens.com/locator/walgreens-725+e+northern+lights+blvd-anchorage-ak-99503/id=13656
--------------------------------------------------------------------------------
4353 LAKE OTIS PARKWAY
https://www.walgreens.com/locator/walgreens-4353+lake+otis+parkway-anchorage-ak-99508/id=15653
--------------------------------------------------------------------------------
7600 DEBARR RD
https://www.walgreens.com/locator/walgreens-7600+debarr+rd-anchorage-ak-99504/id=12679
--------------------------------------------------------------------------------
2197 W DIMOND BLVD
https://www.walgreens.com/locator/walgreens-2197+w+dimond+blvd-anchorage-ak-99515/id=12680
--------------------------------------------------------------------------------
2550 E 88TH AVE
https://www.walgreens.com/locator/walgreens-2550+e+88th+ave-anchorage-ak-99507/id=15654
--------------------------------------------------------------------------------
12405 BRANDON ST
https://www.walgreens.com/locator/walgreens-12405+brandon+st-anchorage-ak-99515/id=13449
--------------------------------------------------------------------------------
12051 OLD GLENN HWY
https://www.walgreens.com/locator/walgreens-12051+old+glenn+hwy-eagle+river-ak-99577/id=15362
--------------------------------------------------------------------------------
1721 E PARKS HWY
https://www.walgreens.com/locator/walgreens-1721+e+parks+hwy-wasilla-ak-99654/id=12681
--------------------------------------------------------------------------------
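The snippet above calls m.group() directly, which raises an AttributeError if the page ever stops embedding the coordinates. A slightly more defensive version of the two parsing steps might look like the sketch below. It reuses the regex and the 'results' / 'storeSeoUrl' keys from the answer; the helper names and the sample inputs are hypothetical, and no network request is made.

```python
import json
import re
from urllib.parse import urljoin

BASE_URL = "https://www.walgreens.com"

# Same pattern as in the answer above.
LAT_LNG_RE = re.compile(r'"lat":([\d.-]+),"lng":([\d.-]+)')


def extract_lat_lng(page_text):
    """Return (lat, lng) as strings, or None if the page has no coordinates."""
    m = LAT_LNG_RE.search(page_text)
    if m is None:
        return None
    return m.group(1), m.group(2)


def store_urls(data):
    """Build absolute store URLs from a decoded search response."""
    return [urljoin(BASE_URL, r["storeSeoUrl"]) for r in data.get("results", [])]


# Hypothetical inputs shaped like the page text and JSON used above.
sample_page = 'var cfg = {"lat":61.2,"lng":-149.9};'
sample_data = json.loads('{"results": [{"storeSeoUrl": "/locator/example-store/id=1"}]}')

print(extract_lat_lng(sample_page))      # ('61.2', '-149.9')
print(extract_lat_lng("<html></html>"))  # None
print(store_urls(sample_data))           # ['https://www.walgreens.com/locator/example-store/id=1']
```

Returning None instead of raising lets the caller decide what to do when the coordinates are missing, for example falling back to the selenium approach.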