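How to resolve HTTP Error 403: Forbidden when scraping a reCAPTCHA-protected site with Python on AWS Lambda
The target site is protected by reCAPTCHA. The script below runs without any errors on my local machines (Windows, and Ubuntu in Docker under WSL), but on AWS Lambda it fails with a 403 error.
Code: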
from urllib.request import Request, urlopen
from bs4 import BeautifulSoup
url = 'https://www.avvo.com/attorneys/84025-ut-jason-hunter-284784.html#client_reviews'
# Browser-like request headers sent with the request
headers = {
    'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
    'Accept-Encoding': 'none',
    'Accept-Language': 'en-US,en;q=0.8',
    'Connection': 'keep-alive',
}
# Fetch the page and print its <title>
with urlopen(Request(url, headers=headers)) as response:
    soup = BeautifulSoup(response.read(), features="html.parser")
    print(soup.title.text)
Error:
{
"errorMessage": "HTTP Error 403: Forbidden","errorType": "HTTPError","stackTrace": [
" File \"/var/task/lambda_function.py\",line 28,in lambda_handler\n with urlopen(Request(url,headers=headers)) as response:\n"," File \"/var/lang/lib/python3.8/urllib/request.py\",line 222,in urlopen\n return opener.open(url,data,timeout)\n",line 531,in open\n response = meth(req,response)\n",line 640,in http_response\n response = self.parent.error(\n",line 569,in error\n return self._call_chain(*args)\n",line 502,in _call_chain\n result = func(*args)\n",line 649,in http_error_default\n raise HTTPError(req.full_url,code,msg,hdrs,fp)\n"
]
}
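The traceback only reports the status code, so it may help to look at what the server actually sent back with the 403. The following is a minimal diagnostic sketch, not a fix: it catches the `HTTPError` raised by `urlopen` and summarizes its status, headers, and the start of the body. The helper names `describe_http_error` and `fetch_or_describe` and the `body_limit` parameter are hypothetical, introduced here for illustration only.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def describe_http_error(err, body_limit=500):
    """Summarize an HTTPError: status code, reason, headers, start of the body."""
    # HTTPError doubles as a response object, so the error body can be read from it.
    body = err.read(body_limit) if err.fp is not None else b""
    return {
        "code": err.code,
        "reason": err.reason,
        "headers": dict(err.headers.items()) if err.headers is not None else {},
        # The body may be truncated or non-UTF-8, so decode leniently.
        "body_start": body.decode("utf-8", errors="replace"),
    }


def fetch_or_describe(url, headers):
    """Return (html_bytes, None) on success, or (None, summary) on an HTTP error."""
    try:
        with urlopen(Request(url, headers=headers)) as response:
            return response.read(), None
    except HTTPError as err:
        return None, describe_http_error(err)
```

Inside the Lambda handler, one could call `fetch_or_describe(url, headers)` and log the returned summary. Comparing those headers and the body with a local run might show whether the rejection depends on where the request originates.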