There are already plenty of good resources on Stack Overflow, but I'm still having trouble. I have looked at these sources:
> how to submit query to .aspx page in python
> Submitting a post request to an aspx page
> Scrapping aspx webpage with Python using BeautifulSoup
> http://www.pythonforbeginners.com/cheatsheet/python-mechanize-cheat-sheet
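I'm trying to access http://www.latax.state.la.us/Menu_ParishTaxRolls/TaxRolls.aspx and select a parish. I believe this forces a POST, which then lets me select a year, POST again, and make further selections. I have written my script in several different ways following the sources above, but none of them has managed to submit the form so that the site lets me enter a year.
My current code: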
```python
from bs4 import BeautifulSoup
import mechanize

headers = [
    ('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'),
    ('Origin', 'http://www.latax.state.la.us'),  # must match the site being scraped
    ('User-Agent', 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.57 Safari/537.17'),
    ('Content-Type', 'application/x-www-form-urlencoded'),
    ('Referer', 'http://www.latax.state.la.us/Menu_ParishTaxRolls/TaxRolls.aspx'),
    # No 'Accept-Encoding: gzip': mechanize does not decompress responses by default
    ('Accept-Language', 'en-US,en;q=0.8'),
]

br = mechanize.Browser()
br.addheaders = headers

url = 'http://www.latax.state.la.us/Menu_ParishTaxRolls/TaxRolls.aspx'
response = br.open(url)  # first HTTP request, without form data
html = response.read()   # read the body once so it can be both parsed and saved

# Parse and retrieve the two vital ASP.NET form values
soup = BeautifulSoup(html, 'html.parser')
viewstate = soup.find_all('input', {'type': 'hidden', 'name': '__VIEWSTATE'})
eventvalidation = soup.find_all('input', {'type': 'hidden', 'name': '__EVENTVALIDATION'})

# Collected for the postback; this script does not submit it yet
formData = (
    ('__EVENTVALIDATION', eventvalidation[0]['value']),
    ('__VIEWSTATE', viewstate[0]['value']),
    ('__VIEWSTATEENCRYPTED', ''),
)

try:
    fout = open('C:\\GIS\\tmp.htm', 'wb')
except IOError:
    print('Could not open output file\n')
else:
    fout.write(html)
    fout.close()
```
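I also tried this in the shell; what I typed and what I got back (trimmed to reduce the bulk) is at http://pastebin.com/KAW5VtXp
Either way, when I try changing the value of the Parish dropdown and POSTing, I am taken to the webmaster login page.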
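For context: on ASP.NET WebForms pages, a dropdown that posts back on change does not submit the form by itself. Its `onchange` handler calls `__doPostBack(name, '')`, which copies the control's name into the hidden `__EVENTTARGET` field and then submits the whole form, including `__VIEWSTATE`, `__EVENTVALIDATION`, and the dropdown's newly selected value. A POST that carries only the view-state fields leaves out `__EVENTTARGET` and the selected parish, which may be why the server rejects it. Below is a minimal stdlib-only sketch of building such a payload, assuming the parish dropdown is named `ctl00$ContentPlaceHolderMain$ddParish` (the name used in the Selenium answer). The helpers `HiddenFieldParser`, `build_postback`, and `post_form` are hypothetical, as is the option value `"TERREBONNE"`; the real value is the `value` attribute of the matching `<option>`. Only hidden inputs are collected automatically, so any other visible fields must be passed in `control_values`.

```python
import http.cookiejar
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs from every <input type="hidden"> on the page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attrs = dict(attrs)
        if (attrs.get("type") or "").lower() == "hidden" and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""


def build_postback(html, event_target, control_values):
    """Return the fields a browser would send for __doPostBack(event_target, '')."""
    parser = HiddenFieldParser()
    parser.feed(html)
    fields = dict(parser.fields)            # __VIEWSTATE, __EVENTVALIDATION, ...
    fields["__EVENTTARGET"] = event_target  # which control triggered the postback
    fields["__EVENTARGUMENT"] = ""
    fields.update(control_values)           # e.g. the dropdown's selected value
    return fields


def post_form(opener, url, fields):
    """POST fields as application/x-www-form-urlencoded and return the page HTML."""
    data = urlencode(fields).encode("ascii")
    with opener.open(urllib.request.Request(url, data=data)) as resp:
        return resp.read().decode("utf-8", errors="replace")


# A cookie-aware opener keeps the ASP.NET session cookie between requests.
opener = urllib.request.build_opener(
    urllib.request.HTTPCookieProcessor(http.cookiejar.CookieJar()))

# Hypothetical usage (performs real network requests, so it is commented out):
# URL = "http://www.latax.state.la.us/Menu_ParishTaxRolls/TaxRolls.aspx"
# page = opener.open(URL).read().decode("utf-8", errors="replace")
# fields = build_postback(page, "ctl00$ContentPlaceHolderMain$ddParish",
#                         {"ctl00$ContentPlaceHolderMain$ddParish": "TERREBONNE"})
# page = post_form(opener, URL, fields)
```

If the postback is accepted, the returned page should contain the year dropdown, and the same pattern (fresh hidden fields from the latest page, new `__EVENTTARGET`) would repeat for each later step.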
Am I approaching this the right way? Any ideas would be very helpful.
Thanks!
Solution
I ended up using Selenium.
```python
from selenium import webdriver
# find_element(By..., ...) replaces the find_element_by_* helpers removed in Selenium 4
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

driver = webdriver.Firefox()
driver.get("http://www.latax.state.la.us/Menu_ParishTaxRolls/TaxRolls.aspx")

# Select the parish by typing its visible text, then press RETURN
elem = driver.find_element(By.NAME, "ctl00$ContentPlaceHolderMain$ddParish")
elem.send_keys("TERREBONNE PARISH")
elem.send_keys(Keys.RETURN)

# Select the tax year the same way
elem = driver.find_element(By.NAME, "ctl00$ContentPlaceHolderMain$ddYear")
elem.send_keys("2013")
elem.send_keys(Keys.RETURN)

# Choose the second search-field radio button, then search for the APN
elem = driver.find_element(By.ID, "ctl00_ContentPlaceHolderMain_rbSearchField_1")
elem.click()

APN = 'APN # here'
elem = driver.find_element(By.NAME, "ctl00$ContentPlaceHolderMain$txtSearch")
elem.send_keys(APN)
elem.send_keys(Keys.RETURN)

# Access the PDF
elem = driver.find_element(By.LINK_TEXT, 'Generate Report')
elem.click()
elements = driver.find_elements(By.TAG_NAME, 'a')
elements[1].click()
```