How to fix an IndexError in a Python scraping script
I am learning Python. I am trying to write a script that scrapes key data from certain cells of certain tables on a web page, while ignoring the cells I am not interested in.
The script I have written so far collects the first two rows of the table, but then raises this error:
    Traceback (most recent call last):
      File "/home/Scripts/scraper.py", line 36, in <module>
        mp3 = mp3_container[0]['href']
    IndexError: list index out of range
Here is the code I have written so far:
    from urllib.request import urlopen as uReq
    from bs4 import BeautifulSoup as soup

    my_url = 'https://XXX'

    # opening connection, grabbing page
    uClient = uReq(my_url)
    url_html = uClient.read()
    uClient.close()

    # html parser
    url_soup = soup(url_html, "html.parser")

    # download table
    url_data = {}
    url_table = url_soup.table
    url_table_data = url_table.tbody.find_all("tr")
    url_t_d = url_table_data[0]

    # template for extracting and printing data
    for url_t_d in url_table_data:
        artist_container = url_t_d.find_all("td", {"class": "artist"})
        artist = artist_container[0].text
        title_container = url_t_d.find_all("td", {"class": "title"})
        title = title_container[0].text
        year_container = url_t_d.find_all("td", {"class": "year"})
        year = year_container[0].text
        mp3_container = url_t_d.find_all("a", {"title": "MP3 sample"})
        mp3 = mp3_container[0]['href']
        article_container = url_t_d.find_all("td", {"class": "articleListInfo"})
        article_link = article_container[0].a['href']
        print("Artist: " + artist)
        print("Title: " + title)
        print("year: " + year)
        print("mp3: " + mp3)
        print("link: " + article_link)
Can anyone suggest where I might be going wrong? Thanks.
Solution
The error occurs because find_all() returns an empty list when a row contains no matching element, so indexing [0] on the result raises an IndexError. I solved this by wrapping each line inside the for loop in a try/except, for example:

    try:
        mp3_container = url_t_d.find_all("a", {"title": "MP3 sample"})
        mp3 = mp3_container[0]['href']
    except IndexError:
        mp3 = "none"
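The same guard can also be written without try/except by checking for an empty list before indexing. A minimal sketch, where the helper name first_href and the "none" default are my own and not part of the original script:

    def first_href(tags, default="none"):
        # find_all() returns an empty list when nothing matches,
        # so tags[0] would raise IndexError; guard for that case.
        return tags[0]['href'] if tags else default

    # In the scraping loop this replaces one try/except block, e.g.:
    #   mp3 = first_href(url_t_d.find_all("a", {"title": "MP3 sample"}))

This keeps the loop body flat and avoids a bare except, which would also silently swallow unrelated errors such as typos in attribute names.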