How to skip to the next URL when an element is not found or a TimeoutException occurs during a Selenium wait
I am trying to scrape daily observation tables from weather stations. I have the following code for fetching a specific table:
# Iterate the request over each weather station and date
# (product needs one iterable per loop variable)
for station, month, year in product(weather_station, months, years):
    areacode = weather_station[station]['areacode']
    # Set link according to data need
    driver.get('https://www.wunderground.com/history/monthly/' + countrycode + '/' + station + '/' + areacode + '/date/' + str(year) + '-' + str(month))
    # Wait for the webpage to fully load the necessary tables
    wait = WebDriverWait(driver, 15)
    # Update the XPath in case the webpage HTML format changes
    xpath_html_loc = '//*[@id="inner-content"]/div[2]/div[1]/div[5]/div[1]/div/lib-city-history-observation/div/div[2]/table'
    tables = wait.until(EC.presence_of_all_elements_located((By.XPATH, xpath_html_loc)))
    # Save only the necessary table from the loaded webpage
    for table in tables:
        histo_table = pd.read_html(table.get_attribute('outerHTML'))
        histo_weather = histo_table[2].fillna('')
        print("Weather observations for", str(month), "-", str(year), "from station", station, "are ready\n")
This code iterates over all the necessary pages on the site and works fine when the desired table is present, but when the table does not exist on a page, or the link is unavailable, it raises this error: TimeoutException
I read about the try/except option, but I cannot seem to make it work in this case. Can you suggest a better solution? The code below with try and except still raises the TimeoutException. If the table element does not exist or the link is unavailable, I would like the code to skip the current URL and move on to the next one (i.e. return to the top of the for loop to iterate the next URL).
try:
    # Set link according to data need
    driver.get('https://www.wunderground.com/history/monthly/' + countrycode + '/' + station + '/' + areacode + '/date/' + str(year) + '-' + str(month))
    # Wait for the webpage to fully load the necessary tables
    wait = WebDriverWait(driver, 15)
    # Update the XPath in case the webpage HTML format changes
    xpath_html_loc = '//*[@id="inner-content"]/div[2]/div[1]/div[5]/div[1]/div/lib-city-history-observation/div/div[2]/table'
    tables = driver.find_elements(By.XPATH, xpath_html_loc)
    print(tables)
except TimeoutException as exception:
    raise exception
Solution
You can achieve the same thing with the following approach.
if len(driver.find_elements(By.XPATH, xpath_html_loc)) > 0:
    # Do something
else:
    # Do something
Updated code with the complete solution.
# Set link according to data need
driver.get('https://www.wunderground.com/weather/us/pa/indiana/date/2020-09')
# Wait for the webpage to fully load the necessary tables
wait = WebDriverWait(driver, 15)
wait.until(EC.element_to_be_clickable((By.XPATH, "//lib-city-header//lib-subnav//div[@class='subnav-contain']//span[contains(text(),'History')]")))
# driver.find_element_by_xpath("//lib-city-header//lib-subnav//div[@class='subnav-contain']//span[contains(text(),'History')]").click()
tables = []
try:
    xpath_html_loc = '//lib-city-history-observation//table'
    wait.until(EC.element_to_be_clickable((By.XPATH, xpath_html_loc)))
    tables = driver.find_elements(By.XPATH, xpath_html_loc)
    print(len(tables))
except TimeoutException:
    pass
if len(tables) > 0:
    print('IF')
else:
    print('Else')
I was able to use the following workaround:
for link in links:
    try:
        print("Trying for", link)
        # Set link according to data need
        driver.get(link)
        # Wait for the webpage to fully load the necessary tables
        wait = WebDriverWait(driver, 15)
        # Update the XPath in case the webpage HTML format changes
        xpath_html_loc = '//*[@id="inner-content"]/div[2]/div[1]/div[5]/div[1]/div/lib-city-history-observation/div/div[2]/table'
        wait.until(EC.presence_of_all_elements_located((By.XPATH, xpath_html_loc)))
        tables = driver.find_elements(By.XPATH, xpath_html_loc)
    except TimeoutException:
        # If the loading took too long, print a message and skip to the next link
        print("Loading took too long! Data unavailable")
        continue
    if len(tables) > 0:
        # Process the tables here
        pass
    else:
        print("data is unavailable")
        continue
Even if a link is unavailable or the tables fail to load, the loop still continues thanks to the try/except block (which avoids the timeout exception). I used wait.until with an expected condition (the required webpage and tables fully loaded) and find_elements (to locate the specific table). If the table cannot be found on the page, or is still unavailable even after the page loads, the if-else code suggested by @Dilip continues the for loop.
Thanks for all the help!
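The skip-and-continue pattern described above does not depend on Selenium itself, so it can be sketched without a browser. In this minimal sketch, fetch_tables is a hypothetical stand-in for the driver.get plus wait.until step: it raises TimeoutError for a dead link, just as WebDriverWait raises TimeoutException when the tables never appear.

```python
# Sketch of the skip-on-failure loop, with a hypothetical fetch_tables
# standing in for driver.get + wait.until + find_elements.

def fetch_tables(link):
    # Fake page data: one good page, one page whose table is missing.
    data = {
        "link-ok": ["table-1", "table-2"],
        "link-empty": [],
    }
    if link not in data:
        # Unknown link: behaves like a Selenium timeout
        raise TimeoutError("loading took too long")
    return data[link]

def scrape(links):
    results = {}
    for link in links:
        try:
            tables = fetch_tables(link)
        except TimeoutError:
            # Link unavailable: skip this URL and move to the next one
            continue
        if len(tables) > 0:
            results[link] = tables
        else:
            # Page loaded but the table is missing: also skip
            continue
    return results

print(scrape(["link-ok", "link-empty", "link-dead"]))
# → {'link-ok': ['table-1', 'table-2']}
```

Both failure modes (timeout and missing table) fall through to continue, so one bad URL never stops the whole run.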