Struggling to scrape a table with Selenium

I want to scrape the table that appears at this link.

To scrape it, I decided to use Selenium.

My first attempt was:

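from bs4 import BeautifulSoup
import pandas as pd
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager

driver = webdriver.Chrome(ChromeDriverManager().install())
driver.get(url)
html_source = driver.page_source  # read immediately, without waiting for the table to render
driver.quit()
soup = BeautifulSoup(html_source,"html5lib")
table = soup.find('table',{'class': 'heavy-table ncpulse-fav-table ncpulse-sortable compressed-table'})
df = pd.read_html(str(table),flavor='html5lib',header=0,thousands='.',decimal=',')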

But it raised the error:

'no tables found'

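Then I tried the expected_conditions class because, from what I found searching on SO, perhaps "the page source was being pulled before the child elements had fully rendered".

So I tried something like this: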

driver.get(route)
element_present = expected_conditions.presence_of_element_located(
    (By.CLASS_NAME,'heavy-table ncpulse-fav-table ncpulse-sortable compressed-table'))
WebDriverWait(driver,20).until(element_present)
html_source = driver.page_source 
driver.quit()

But this time it raised:

selenium.common.exceptions.TimeoutException: Message

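So my questions are: how can I get the desired output? What is wrong with how I used the expected_conditions class? And what underlying issue or front-end technology makes this page so hard to scrape?

Solutions

The CLASS_NAME locator does not handle compound class names (several classes separated by spaces), but you can locate the element with a CSS selector or XPath instead. CSS_SELECTOR is generally more efficient than XPATH: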

element_present = expected_conditions.presence_of_element_located(
    (By.CSS_SELECTOR,"table[class='heavy-table ncpulse-fav-table ncpulse-sortable compressed-table']"))
# or by XPath
element_present = expected_conditions.presence_of_element_located(
    (By.XPATH,"//table[@class='heavy-table ncpulse-fav-table ncpulse-sortable compressed-table']"))
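
As a small illustration of the point above, a compound class attribute can be turned into a CSS selector by joining the individual class names with dots. This is only a sketch: the helper name compound_class_to_css is hypothetical and is not part of Selenium.

```python
def compound_class_to_css(tag: str, class_attr: str) -> str:
    """Build a CSS selector matching `tag` elements that carry every class in `class_attr`.

    Hypothetical helper for illustration only; it is not a Selenium function.
    """
    classes = class_attr.split()  # split the attribute value on whitespace
    if not classes:
        raise ValueError("class_attr must contain at least one class name")
    return tag + "".join("." + name for name in classes)


selector = compound_class_to_css(
    "table", "heavy-table ncpulse-fav-table ncpulse-sortable compressed-table"
)
print(selector)
```

The resulting string can be passed to By.CSS_SELECTOR. Unlike the [class='...'] form, which compares the whole attribute value literally, the dotted form matches regardless of the order of the classes and still matches if the element has additional classes.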

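The <table> is an Angular-based element, so to extract its contents with Selenium you have to use WebDriverWait with visibility_of_element_located() instead of presence_of_element_located(). You can use either of the following locator strategies: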

  • Using CSS_SELECTOR:

    print(WebDriverWait(driver,20).until(EC.visibility_of_element_located((By.CSS_SELECTOR,"table.heavy-table.ncpulse-fav-table.ncpulse-sortable.compressed-table"))).text)

  • Using XPATH:

    print(WebDriverWait(driver,20).until(EC.visibility_of_element_located((By.XPATH,"//table[@class='heavy-table ncpulse-fav-table ncpulse-sortable compressed-table']"))).text)

  • Note: you have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC

  • Console output:

    AKTIE +/- +/-% SENESTE ÅTD% BUD UDBUD VOLUMEN OMSÆTNING MARKEDSVÆRDI TID
    Abn Amro Bank N.V. -0,32 -4,08% 7,48 -53,90% - - 7,9 mio 59,0 mio - 21:09
    Adyen 81,00 5,62% 1523,00 108% - - 954 082 1,5 mia - 21:09
    Aegon -0,08 -3,49% 2,16 -45,47% - - 17,4 mio 37,5 mio - 21:05
    Ahold Del 0,25 0,98% 25,65 19,74% - - 8,0 mio 204,1 mio - 21:05
    Akzo Nobel 0,14 0,16% 85,86 -3,16% - - 1,1 mio 90,6 mio - 21:06
    Arcelormittal Sa 0,08 0,66% 11,53 -26,26% - - 11,9 mio 137,3 mio - 21:09
    Asm International 0,35 0,29% 119,10 21,23% - - 403 117 48,0 mio - 21:07
    Asml Holding 1,50 0,49% 308,45 17,56% - - 2,3 mio 712,7 mio - 21:05
    Asr Nederland -0,22 -0,73% 29,76 -4,97% - - 740 781 22,0 mio - 21:05
    Dsm Kon 2,25 1,66% 138,20 21,52% - - 680 867 94,1 mio - 21:09
    Galapagos -1,45 -1,22% 117,70 -36,89% - - 475 793 56,0 mio - 21:05
    Heineken 0,74 0,94% 79,10 -15,50% - - 1,1 mio 88,0 mio - 21:05
    Imcd 1,85 1,80% 104,85 36,23% - - 922 391 96,7 mio - 21:05
    Ing Groep N.V. -0,19 -2,80% 6,60 -38,24% - - 43,4 mio 286,2 mio - 21:08
    Just Eat Takeaway 0,09% 91,70 11,56% - - 1,1 mio 100,2 mio - 21:09
    Kpn Kon -0,03 -1,54% 2,11 -15,04% - - 21,4 mio 45,1 mio - 21:05
    Nn Group -0,35 -1,06% 32,80 3,82% - - 2,4 mio 79,6 mio - 21:05
    Philips Kon -0,08 -0,20% 39,42 -9,42% - - 5,2 mio 205,9 mio - 21:05
    Prosus -1,52 -1,89% 78,74 18,35% - - 15,0 mio 1,2 mia - 21:09
    Randstad Nv -0,98 -2,09% 45,93 -15,63% - - 698 496 32,1 mio - 21:05
    Relx 0,00 0,03% 19,64 -10,24% - - 1,9 mio 36,6 mio - 21:06
    Royal Dutch Shella -0,24 -2,07% 11,45 -54,58% - - 21,1 mio 241,2 mio - 21:07
    Unibail-Rodamco-We -3,79 -10,53% 32,20 -75,02% - - 6,6 mio 213,2 mio - 21:07
    Unilever -1,04 -2,00% 50,98 2,00% - - 8,2 mio 417,5 mio - 21:08
    Wolters Kluwer -0,04 -0,05% 72,88 14,19% - - 803 644 58,6 mio - 21:05
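
The values in this output use Danish formatting: a comma as the decimal mark, and the suffixes mio (million) and mia (milliard, i.e. 10^9). If the raw text needs to become numbers, something like the following sketch may help. The function name parse_danish_number is hypothetical, and it assumes that "." only ever appears as a thousands separator and that "-" marks a missing value.

```python
from typing import Optional

# Danish magnitude suffixes that appear in the console output.
_MULTIPLIERS = {"mio": 1_000_000, "mia": 1_000_000_000}


def parse_danish_number(text: str) -> Optional[float]:
    """Convert strings such as '7,9 mio', '-4,08%' or '954 082' to a float.

    Returns None for the '-' placeholder. The percent sign is simply
    dropped, so '-4,08%' becomes -4.08 (not -0.0408).
    """
    text = text.strip()
    if text in ("", "-"):
        return None
    parts = text.split()
    multiplier = 1
    if parts[-1] in _MULTIPLIERS:
        multiplier = _MULTIPLIERS[parts[-1]]
        parts = parts[:-1]
    # Re-join digit groups ('954 082'), drop '%', treat '.' as a thousands
    # separator (assumption) and ',' as the decimal mark.
    number = "".join(parts).rstrip("%").replace(".", "").replace(",", ".")
    return float(number) * multiplier


for raw in ["7,9 mio", "1,5 mia", "954 082", "-4,08%", "-"]:
    print(repr(raw), "->", parse_danish_number(raw))
```

Keeping the percent values as plain numbers rather than fractions is a design choice made for this sketch; divide by 100 if fractions are preferred.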
    

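By.CLASS_NAME accepts only a single class name, so when an element has several classes, pass just one of them, for example the rightmost: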

element_present = WebDriverWait(driver,20).until(EC.presence_of_element_located((By.CLASS_NAME,'compressed-table')))
print(element_present.text)

Output

AKTIE +/- +/-% SENESTE ÅTD% VOLUMEN OMSÆTNING MARKEDSVÆRDI
Abn Amro Bank N.V. -0,90% 7,0 mio -
Adyen 81,00 108% 954 082 1,5 mia -
Aegon -0,47% 17,5 mio -
Ahold Del 0,74% 8,1 mio -
Akzo Nobel 0,16% 1,6 mio -
Arcelormittal Sa 0,26% 11,3 mio -

To have Chrome translate the page into English:

options = Options()
prefs = {
  "translate_whitelists": {"da":"en"},"translate":{"enabled":"true"}
}
options.add_experimental_option("prefs",prefs)
driver = webdriver.Chrome(ChromeDriverManager().install(),options=options)

Output


SHARES +/- + / -% MOST RECENT ÅTD% VOLUME TURNOVER MARKET VALUE
Abn Amro Bank NV -0.32 -4.08% 7.48 -53.90% 7.9 million 59.0 million -
Adyen 81.00 5.62% 1523.00 108% 954 082 1.5 billion -
Aegon -0.08 -3.49% 2.16 -45.47% 17.4 million 37.5 million -
Ahold Del 0.25 0.98% 25.65 19.74% 8.0 million 204.1 million -
Akzo Nobel 0.14 0.16% 85.86 -3.16% 1.1 million 90.6 million -

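Imports

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from webdriver_manager.chrome import ChromeDriverManager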

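To get the table data, use WebDriverWait() with visibility_of_element_located() and the following CSS selector:

from bs4 import BeautifulSoup
import pandas as pd

# driver, By, WebDriverWait and EC are set up as in the snippets above
driver.get("https://borsen.dk/investor/kurser/eur-aktier/?filter=aex25")
# wait up to 20 s (the timeout used elsewhere in this thread) until the table is visible
WebDriverWait(driver,20).until(EC.visibility_of_element_located((By.CSS_SELECTOR,".compressed-table")))
html_source =driver.page_source
driver.quit()
soup=BeautifulSoup(html_source,"html5lib")
table=soup.select_one(".compressed-table")
df = pd.read_html(str(table),flavor='html5lib',header=0,thousands='.',decimal=',')
print(df[0])

Output: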

  Unnamed: 0               Aktie    +/-  ... Markedsværdi  Sektor    Tid
0          NaN  Abn Amro Bank N.V.  -0.32  ...            -     NaN  16:39
1          NaN               Adyen  81.00  ...            -     NaN  16:39
2          NaN               Aegon  -0.08  ...            -     NaN  16:35
3          NaN           Ahold Del   0.25  ...            -     NaN  16:35
4          NaN          Akzo Nobel   0.14  ...            -     NaN  16:36
5          NaN    Arcelormittal Sa   0.08  ...            -     NaN  16:39
6          NaN   Asm International   0.35  ...            -     NaN  16:37
7          NaN        Asml Holding   1.50  ...            -     NaN  16:35
8          NaN       Asr Nederland  -0.22  ...            -     NaN  16:35
9          NaN             Dsm Kon   2.25  ...            -     NaN  16:39
10         NaN           Galapagos  -1.45  ...            -     NaN  16:35
11         NaN            Heineken   0.74  ...            -     NaN  16:35
12         NaN                Imcd   1.85  ...            -     NaN  16:35
13         NaN      Ing Groep N.V.  -0.19  ...            -     NaN  16:38
14         NaN   Just Eat Takeaway   0.08  ...            -     NaN  16:39
15         NaN             Kpn Kon  -0.03  ...            -     NaN  16:35
16         NaN            Nn Group  -0.35  ...            -     NaN  16:35
17         NaN         Philips Kon  -0.08  ...            -     NaN  16:35
18         NaN              Prosus  -1.52  ...            -     NaN  16:39
19         NaN         Randstad Nv  -0.98  ...            -     NaN  16:35
20         NaN                Relx   0.00  ...            -     NaN  16:36
21         NaN  Royal Dutch Shella  -0.24  ...            -     NaN  16:37
22         NaN  Unibail-Rodamco-We  -3.79  ...            -     NaN  16:37
23         NaN            Unilever  -1.04  ...            -     NaN  16:38
24         NaN      Wolters Kluwer  -0.04  ...            -     NaN  16:35

[25 rows x 13 columns]

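You can also use find():

driver.get("https://borsen.dk/investor/kurser/eur-aktier/?filter=aex25")
WebDriverWait(driver,20).until(EC.visibility_of_element_located((By.CSS_SELECTOR,".compressed-table")))
html_source=driver.page_source
driver.quit()
soup=BeautifulSoup(html_source,"html5lib")
df=pd.read_html(str(soup.find('table',class_='heavy-table ncpulse-fav-table ncpulse-sortable compressed-table')))[0]
print(df)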
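
To see why the attempt in the question ended in a TimeoutException, it can help to think of an explicit wait as, roughly, a polling loop: the condition is called again and again until it returns something truthy or the timeout runs out. The sketch below imitates that idea with the standard library only. The names wait_until and WaitTimeout are hypothetical stand-ins, not Selenium's API, and the "page" is simulated.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class WaitTimeout(Exception):
    """Hypothetical stand-in for Selenium's TimeoutException."""


def wait_until(condition: Callable[[], Optional[T]], timeout: float, poll: float = 0.5) -> T:
    """Call `condition` repeatedly until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} s")
        time.sleep(poll)


# Simulated page: the table only "renders" on the third poll.
calls = {"n": 0}


def table_rendered() -> Optional[str]:
    calls["n"] += 1
    return "<table>" if calls["n"] >= 3 else None


print(wait_until(table_rendered, timeout=2, poll=0.01))

# A locator that can never match behaves like a condition that is never true,
# so the wait can only end by timing out.
try:
    wait_until(lambda: None, timeout=0.05, poll=0.01)
except WaitTimeout as exc:
    print("timed out:", exc)
```

Seen this way, passing a space-separated list of classes to By.CLASS_NAME gives a condition that never becomes true, so waiting longer does not help; the locator itself has to change.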
