BeautifulSoup: not all href links seem to be extracted


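I'm trying to extract all the href links inside the divs whose class is ['address']. Every time I run my code I get only the first 5, even though I know there should be 9.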

Web page: https://www.walgreens.com/storelocator/find.jsp?requestType=locator&state=AK&city=ANCHORAGE&from=localSearch

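I've read various threads on this and changed my code countless times, including switching between all the parsers (html.parser, html5lib, lxml, xml, lxml-xml), but nothing seems to work. What causes it to stop after the 5th iteration? I'm still very new to Python, so I apologize if this is a rookie mistake I've overlooked. Any help would be greatly appreciated, even sarcastic answers :)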

I used very similar code on the following pages and had no problems scraping the hrefs: https://www.walgreens.com/storelistings/storesbystate.jsp?requestType=locator https://www.walgreens.com/storelistings/storesbycity.jsp?requestType=locator&state=AK

My code:

import requests
from bs4 import BeautifulSoup


local_rg = requests.get('https://www.walgreens.com/storelocator/find.jsp?requestType=locator&state=AK&city=ANCHORAGE&from=localSearch')

local_rg_content = local_rg.content
local_rg_content_src = BeautifulSoup(local_rg_content,'lxml')

for link in local_rg_content_src.find_all('div'):
    local_class = str(link.get('class'))
    if str("['address']") in str(local_class):
        local_a = link.find_all('a')
        for a_link in local_a:
            local_href = str(a_link.get('href'))
            print(local_href)

My results (first 5 only):

  1. /locator/walgreens-1470+w+northern+lights+blvd-anchorage-ak-99503/id=15092
  2. /locator/walgreens-725+e+northern+lights+blvd-anchorage-ak-99503/id=13656
  3. /locator/walgreens-4353+lake+otis+parkway-anchorage-ak-99508/id=15653
  4. /locator/walgreens-7600+debarr+rd-anchorage-ak-99504/id=12679
  5. /locator/walgreens-2197+w+dimond+blvd-anchorage-ak-99515/id=12680

But there should be 9:

  1. /locator/walgreens-1470+w+northern+lights+blvd-anchorage-ak-99503/id=15092
  2. /locator/walgreens-725+e+northern+lights+blvd-anchorage-ak-99503/id=13656
  3. /locator/walgreens-4353+lake+otis+parkway-anchorage-ak-99508/id=15653
  4. /locator/walgreens-7600+debarr+rd-anchorage-ak-99504/id=12679
  5. /locator/walgreens-2197+w+dimond+blvd-anchorage-ak-99515/id=12680
  6. /locator/walgreens-2550+e+88th+ave-anchorage-ak-99507/id=15654
  7. /locator/walgreens-12405+brandon+st-anchorage-ak-99515/id=13449
  8. /locator/walgreens-12051+old+glenn+hwy-eagle+river-ak-99577/id=15362
  9. /locator/walgreens-1721+e+parks+hwy-wasilla-ak-99654/id=12681

Solution

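Try using selenium instead of requests to get the page source, so that the page is rendered by a real browser before you parse it. Here is how: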

from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome()
driver.get('https://www.walgreens.com/storelocator/find.jsp?requestType=locator&state=AK&city=ANCHORAGE&from=localSearch')

local_rg_content = driver.page_source
driver.close()
local_rg_content_src = BeautifulSoup(local_rg_content,'lxml')

The rest of the code stays the same. Here is the full code:

from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome()
driver.get('https://www.walgreens.com/storelocator/find.jsp?requestType=locator&state=AK&city=ANCHORAGE&from=localSearch')

local_rg_content = driver.page_source
driver.close()
local_rg_content_src = BeautifulSoup(local_rg_content,'lxml')

for link in local_rg_content_src.find_all('div'):
    local_class = str(link.get('class'))
    if str("['address']") in str(local_class):
        local_a = link.find_all('a')
        for a_link in local_a:
            local_href = str(a_link.get('href'))
            print(local_href)

Output:

/locator/walgreens-1470+w+northern+lights+blvd-anchorage-ak-99503/id=15092
/locator/walgreens-725+e+northern+lights+blvd-anchorage-ak-99503/id=13656
/locator/walgreens-4353+lake+otis+parkway-anchorage-ak-99508/id=15653
/locator/walgreens-7600+debarr+rd-anchorage-ak-99504/id=12679
/locator/walgreens-2197+w+dimond+blvd-anchorage-ak-99515/id=12680
/locator/walgreens-2550+e+88th+ave-anchorage-ak-99507/id=15654
/locator/walgreens-12405+brandon+st-anchorage-ak-99515/id=13449
/locator/walgreens-12051+old+glenn+hwy-eagle+river-ak-99577/id=15362
/locator/walgreens-1721+e+parks+hwy-wasilla-ak-99654/id=12681
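
If the browser returns page_source before the store list has finished loading, the same five links may still come back. The sketch below is one hedged way to handle that: it waits for at least one matching element before reading the source, and replaces the string comparison on the class list with a CSS selector. The function names, the selector "div.address a" (inferred from the class check in the question), and the 15-second timeout are assumptions, and the sample markup is hypothetical.

```python
from bs4 import BeautifulSoup


def fetch_rendered_html(url, css="div.address a", timeout=15):
    """Load url in Chrome and return the page source once css matches something."""
    # Selenium is imported here so the parsing helper below works without it.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Block until at least one matching element exists;
        # a TimeoutException is raised if none appears in time.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_all_elements_located((By.CSS_SELECTOR, css))
        )
        return driver.page_source
    finally:
        driver.quit()  # always shut the browser down, even on timeout


def extract_hrefs(html, css="div.address a[href]"):
    """Return the href of every link matched by the CSS selector."""
    soup = BeautifulSoup(html, "html.parser")
    return [a["href"] for a in soup.select(css)]


# Hypothetical markup mimicking the structure the loop above filters on.
sample = """
<div class="address"><a href="/locator/a/id=1">Store A</a></div>
<div class="other"><a href="/ignored">x</a></div>
<div class="address"><a href="/locator/b/id=2">Store B</a></div>
"""
print(extract_hrefs(sample))  # ['/locator/a/id=1', '/locator/b/id=2']
```

Note that "div.address" matches any div that has address among its classes, whereas the original check only matches divs whose class list is exactly ['address']. Waiting for the first match also does not guarantee that all nine entries are present, so a stricter wait condition may be needed.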

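The page uses Ajax to load the store information from an external URL, which is why the HTML returned by requests contains only some of the links. You can load the data directly with the requests and json modules: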

import re
import json
import requests


url = 'https://www.walgreens.com/storelocator/find.jsp?requestType=locator&state=AK&city=ANCHORAGE&from=localSearch'
ajax_url = 'https://www.walgreens.com/locator/v1/stores/search?requestor=search'
m = re.search(r'"lat":([\d.-]+),"lng":([\d.-]+)',requests.get(url).text)

params = {
    'lat': m.group(1),'lng': m.group(2)
}

data = requests.post(ajax_url,json=params).json()

# uncomment this to print all data:
# print(json.dumps(data,indent=4))

for result in data['results']:
    print(result['store']['address']['street'])
    print('https://www.walgreens.com' + result['storeSeoUrl'])
    print('-' * 80)

Prints:

1470 W NORTHERN LIGHTS BLVD
https://www.walgreens.com/locator/walgreens-1470+w+northern+lights+blvd-anchorage-ak-99503/id=15092
--------------------------------------------------------------------------------
725 E NORTHERN LIGHTS BLVD
https://www.walgreens.com/locator/walgreens-725+e+northern+lights+blvd-anchorage-ak-99503/id=13656
--------------------------------------------------------------------------------
4353 LAKE OTIS PARKWAY
https://www.walgreens.com/locator/walgreens-4353+lake+otis+parkway-anchorage-ak-99508/id=15653
--------------------------------------------------------------------------------
7600 DEBARR RD
https://www.walgreens.com/locator/walgreens-7600+debarr+rd-anchorage-ak-99504/id=12679
--------------------------------------------------------------------------------
2197 W DIMOND BLVD
https://www.walgreens.com/locator/walgreens-2197+w+dimond+blvd-anchorage-ak-99515/id=12680
--------------------------------------------------------------------------------
2550 E 88TH AVE
https://www.walgreens.com/locator/walgreens-2550+e+88th+ave-anchorage-ak-99507/id=15654
--------------------------------------------------------------------------------
12405 BRANDON ST
https://www.walgreens.com/locator/walgreens-12405+brandon+st-anchorage-ak-99515/id=13449
--------------------------------------------------------------------------------
12051 OLD GLENN HWY
https://www.walgreens.com/locator/walgreens-12051+old+glenn+hwy-eagle+river-ak-99577/id=15362
--------------------------------------------------------------------------------
1721 E PARKS HWY
https://www.walgreens.com/locator/walgreens-1721+e+parks+hwy-wasilla-ak-99654/id=12681
--------------------------------------------------------------------------------
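
The script above assumes that the lat/lng pair is always present in the page and that the response always has the same shape. A slightly more defensive sketch of the two parsing steps, using only the standard library, could look like the following. The helper names are made up for illustration, the coordinates in the sample page text are invented, and the response layout is assumed only from the fields the script above reads (results, store, address, street, storeSeoUrl).

```python
import re
from urllib.parse import urljoin

BASE_URL = "https://www.walgreens.com"
COORD_RE = re.compile(r'"lat":([\d.-]+),"lng":([\d.-]+)')


def extract_coords(html):
    """Pull the lat/lng pair out of the page source, failing loudly if absent."""
    m = COORD_RE.search(html)
    if m is None:
        raise ValueError("lat/lng not found in page source")
    return {"lat": m.group(1), "lng": m.group(2)}


def store_links(data):
    """Return (street, absolute URL) pairs from the decoded JSON response."""
    links = []
    for result in data.get("results", []):
        street = result["store"]["address"]["street"]
        links.append((street, urljoin(BASE_URL, result["storeSeoUrl"])))
    return links


# Hypothetical page fragment; the coordinates are invented for the example.
sample_html = 'var geo = {"lat":61.19,"lng":-149.91};'

# One entry shaped like the response fields used above.
sample_data = {
    "results": [
        {
            "store": {"address": {"street": "1470 W NORTHERN LIGHTS BLVD"}},
            "storeSeoUrl": "/locator/walgreens-1470+w+northern+lights+blvd-anchorage-ak-99503/id=15092",
        }
    ]
}

print(extract_coords(sample_html))
for street, url in store_links(sample_data):
    print(street, url)
```

With these helpers, the network part reduces to passing extract_coords(requests.get(url).text) as the json argument of requests.post and feeding the decoded response to store_links.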
