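How to scrape the names of U.S. Congress members from Ballotpedia
I'm trying to use Python to scrape the names of U.S. Congress members from this Ballotpedia page (https://ballotpedia.org/List_of_current_members_of_the_U.S._Congress). The code I'm using worked fine until recently (as late as last week). Now, instead of the legislators' names, it gives me only the page title: ",List_of_current_members_of_the_U.S._Congress".
Here is my code: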
import requests
from bs4 import BeautifulSoup
import pandas as pd

list = ['https://ballotpedia.org/List_of_current_members_of_the_U.S._Congress']

temp_dict = {}

for page in list:
    r = requests.get(page)
    soup = BeautifulSoup(r.content, 'html.parser')

    temp_dict[page.split('/')[-1]] = [item.text for item in
    soup.select('table.wikitable.sortable.jquery-tablesorter')]

df = pd.DataFrame.from_dict(temp_dict, orient='index').transpose()
df.to_csv('3-New Congressmen.csv')
I think the problem is on line 13:
    temp_dict[page.split('/')[-1]] = [item.text for item in
    soup.select('table.wikitable.sortable.jquery-tablesorter')]
I tried taking out
table.wikitable.sortable.jquery-tablesorter
and replacing it with
bptable gray sortable tablesorter tablesorter-default tablesortera6303b5b2311e jquery-tablesorter
I would also need to add a new line for the U.S. House, because the line above would only give me senators:
bptable gray sortable tablesorter tablesorter-default tablesorter2e5ec79e370a5 jquery-tablesorter
Do you have any suggestions for me? Thank you very much!
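Solution
The best way is simply to match on the style padding-left:10px;text-align:center; and add each matching name to a list. This style is unique to the cells that hold the names of U.S. Senate members.
This should do the trick: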
import requests
import bs4 as bs

# Send a browser-like User-Agent header with the request.
headers = {'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko)'}
url = "https://ballotpedia.org/List_of_current_members_of_the_U.S._Congress"
response = requests.get(url, headers=headers)
soup = bs.BeautifulSoup(response.text, 'lxml')  # the 'lxml' parser needs the lxml package

# Collect every <td> whose style attribute matches this string exactly.
tds = soup.find_all("td", {"style": "padding-left:10px;text-align:center;"})
names = []
for td in tds:
    names.append(td.get_text())
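As a rough sketch of why the selector from the question may have stopped matching, and of one alternative to matching on an inline style, the snippet below runs both the old selector and a shorter one against a small hand-written page. Everything in SAMPLE_HTML is made up for illustration and is not copied from Ballotpedia. The class names on the tables, the "Name" header, the helper name names_from_tables, and the placeholder House row are all assumptions. The only grounded detail is that the class string quoted in the question now starts with bptable rather than wikitable.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only. The real page is much larger
# and its class names may differ from what is assumed here.
SAMPLE_HTML = """
<table class="bptable gray sortable">
  <tr><th>Office</th><th>Name</th><th>Party</th></tr>
  <tr><td>U.S. Senate Kansas</td><td>Jerry Moran</td><td>Republican</td></tr>
  <tr><td>U.S. Senate Kansas</td><td>Roger Marshall</td><td>Republican</td></tr>
</table>
<table class="bptable gray sortable">
  <tr><th>Office</th><th>Name</th><th>Party</th></tr>
  <tr><td>U.S. House (placeholder)</td><td>Example Member A</td><td>(placeholder)</td></tr>
</table>
"""


def names_from_tables(html, selector="table.bptable"):
    """Return the text of the 'Name' column from every table matching selector."""
    soup = BeautifulSoup(html, "html.parser")
    names = []
    for table in soup.select(selector):
        headers = [th.get_text(strip=True) for th in table.find_all("th")]
        if "Name" not in headers:
            continue  # skip tables that have no 'Name' column
        col = headers.index("Name")
        for row in table.find_all("tr"):
            cells = row.find_all("td")
            if len(cells) > col:  # the header row has no <td> cells, so it is skipped
                names.append(cells[col].get_text(strip=True))
    return names


# A selector listing several classes matches only elements that carry all of them.
old = names_from_tables(SAMPLE_HTML, "table.wikitable.sortable.jquery-tablesorter")
new = names_from_tables(SAMPLE_HTML)
print(old)
print(new)
```

Selecting on a single class and then locating the column by its header text avoids depending on tokens such as tablesortera6303b5b2311e, which differ between the two class strings quoted in the question. Whether table.bptable is present in the HTML that requests actually receives should be checked against the live page.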
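If you are only interested in the tables on the page, pandas has a built-in function, read_html()
(it requires the lxml package), that scrapes them and puts them directly into DataFrames: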
import pandas as pd

url = 'https://ballotpedia.org/List_of_current_members_of_the_U.S._Congress'
tables = pd.read_html(url)  # returns a list with one DataFrame per table found at the URL
print(tables[3])  # pick a specific table by its index in the list
Output:
                      Office             Name       Party  Date assumed office
0         U.S. Senate Kansas      Jerry Moran  Republican      January 5, 2011
1         U.S. Senate Kansas   Roger Marshall  Republican      January 3, 2021
2       U.S. Senate Michigan      Gary Peters  Democratic      January 6, 2015
3       U.S. Senate Michigan  Debbie Stabenow  Democratic      January 3, 2001
4       U.S. Senate Virginia        Tim Kaine  Democratic      January 3, 2013
..                       ...              ...         ...                  ...
95      U.S. Senate Missouri      Josh Hawley  Republican      January 3, 2019
96  U.S. Senate Pennsylvania       Pat Toomey  Republican      January 5, 2011
97  U.S. Senate Pennsylvania    Bob Casey Jr.  Democratic      January 4, 2007
98          U.S. Senate Utah         Mike Lee  Republican      January 5, 2011
99          U.S. Senate Utah      Mitt Romney  Republican      January 3, 2019
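Since the question asks for House members as well as senators, hard-coding a single index such as 3 may not be enough, and the index could shift if the page layout changes. A possible sketch: because read_html() returns a list of DataFrames, keep every table that has a "Name" column and concatenate those columns. The helper name collect_names is hypothetical, and the DataFrames below are small stand-ins built by hand so the example runs without network access. The House row is a placeholder, and it is an assumption that the House table on the real page also has a "Name" column.

```python
import pandas as pd


def collect_names(tables, column="Name"):
    """Concatenate `column` from every DataFrame in `tables` that has it."""
    matching = [t[column] for t in tables if column in t.columns]
    if not matching:
        # No table had the column: return an empty Series instead of raising.
        return pd.Series(dtype="object", name=column)
    return pd.concat(matching, ignore_index=True)


# Stand-ins for the list that pd.read_html(url) would return.
unrelated = pd.DataFrame({"Key": ["a"], "Value": [1]})
senate = pd.DataFrame({
    "Office": ["U.S. Senate Kansas", "U.S. Senate Kansas"],
    "Name": ["Jerry Moran", "Roger Marshall"],
    "Party": ["Republican", "Republican"],
})
house = pd.DataFrame({
    "Office": ["U.S. House (placeholder)"],
    "Name": ["Example Member A"],
    "Party": ["(placeholder)"],
})

names = collect_names([unrelated, senate, house])
print(names.tolist())
```

With the real page, the same helper could be applied as collect_names(pd.read_html(url)), and the resulting Series written out with to_csv() as in the question.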