How to fix requests.get returning an HTTPSConnectionPool error in Python
import requests

url1 = 'https://www.pontofrio.com.br/'
headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) '
                  'AppleWebKit/537.11 (KHTML, like Gecko) '
                  'Chrome/23.0.1271.64 Safari/537.11',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
    'Accept-Encoding': 'none',
    'Accept-Language': 'en-US,en;q=0.8',
    'Connection': 'keep-alive'
}
response = requests.get(url1, headers, timeout=10)
print(response.status_code)
This returns:
Traceback (most recent call last):
  File "C:\python34\lib\site-packages\urllib3\connectionpool.py", line 384, in _make_request
    six.raise_from(e, None)
  File "<string>", line 2, in raise_from
  File "C:\python34\lib\site-packages\urllib3\connectionpool.py", line 380, in _make_request
    httplib_response = conn.getresponse()
  File "C:\python34\lib\http\client.py", line 1148, in getresponse
    response.begin()
  File "C:\python34\lib\http\client.py", line 352, in begin
    version, status, reason = self._read_status()
  File "C:\python34\lib\http\client.py", line 314, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "C:\python34\lib\socket.py", line 371, in readinto
    return self._sock.recv_into(b)
  File "C:\python34\lib\site-packages\urllib3\contrib\pyopenssl.py", line 309, in recv_into
    return self.recv_into(*args, **kwargs)
  File "C:\python34\lib\site-packages\urllib3\contrib\pyopenssl.py", line 307, in recv_into
    raise timeout('The read operation timed out')
socket.timeout: The read operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\python34\lib\site-packages\requests\adapters.py", line 449, in send
    timeout=timeout
  File "C:\python34\lib\site-packages\urllib3\connectionpool.py", line 638, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "C:\python34\lib\site-packages\urllib3\util\retry.py", line 367, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "C:\python34\lib\site-packages\urllib3\packages\six.py", line 686, in reraise
    raise value
  File "C:\python34\lib\site-packages\urllib3\connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "C:\python34\lib\site-packages\urllib3\connectionpool.py", line 386, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "C:\python34\lib\site-packages\urllib3\connectionpool.py", line 306, in _raise_timeout
    raise ReadTimeoutError(self, url, "Read timed out. (read timeout=%s)" % timeout_value)
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='www.pontofrio.com.br', port=443): Read timed out. (read timeout=10)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:/teste.py", line 219, in <module>
    url = montaurl(dominio)
  File "c:/teste.py", line 81, in montaurl
    response = requests.get(url1, timeout=10)
  File "C:\python34\lib\site-packages\requests\api.py", line 75, in get
    return request('get', params=params, **kwargs)
  File "C:\python34\lib\site-packages\requests\api.py", line 60, in request
    return session.request(method=method, **kwargs)
  File "C:\python34\lib\site-packages\requests\sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\python34\lib\site-packages\requests\sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)
  File "C:\python34\lib\site-packages\requests\adapters.py", line 529, in send
    raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='www.pontofrio.com.br', port=443): Read timed out. (read timeout=10)
Domains that work:

Domains that don't work:
- casasbahia.com.br
- extra.com.br
- boticario.com.br
I think this is some kind of blocking on the pontofrio server. How can I work around it?
Solution

There seem to be a couple of issues. The first is how the headers are set: the following does not actually pass the custom headers to the requests.get call.
response = requests.get(url1, headers, timeout=10)

Here headers is passed as the second positional argument, which requests.get binds to params, not headers. This can be tested against httpbin:

import requests

url1 = 'https://httpbin.org/headers'
headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) '
                  'AppleWebKit/537.11 (KHTML, like Gecko) '
                  'Chrome/23.0.1271.64 Safari/537.11',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
    'Accept-Encoding': 'none',
    'Accept-Language': 'en-US,en;q=0.8',
    'Connection': 'keep-alive'
}
response = requests.get(url1, headers, timeout=10)
print(response.text)
print(response.status_code)

Output (note the default python-requests User-Agent — the custom headers were never sent):

{
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.25.1",
    "X-Amzn-Trace-Id": "Root=1-608a0391-3f1cfa79444ac04865ad9111"
  }
}
200
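The positional mix-up can also be shown offline by preparing the request without sending it. This is an illustrative sketch, not part of the original answer; it relies only on the documented requests.get(url, params=None, **kwargs) signature:

```python
# requests.get(url, x) binds x to `params`, so a dict passed positionally
# is appended to the query string instead of being sent as headers.
import requests

headers = {"User-Agent": "custom-agent"}
with requests.Session() as session:
    prepared = session.prepare_request(
        requests.Request("GET", "https://httpbin.org/headers", params=headers)
    )

# The "headers" ended up in the URL as query parameters...
print(prepared.url)  # https://httpbin.org/headers?User-Agent=custom-agent
# ...while the actual User-Agent header is still the requests default.
print(prepared.headers["User-Agent"])
```

Nothing touches the network here: preparing the request is enough to see where the dict went.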
Setting the headers parameter correctly:

response = requests.get(url1, headers=headers, timeout=10)

Let's test again, with headers passed as a keyword argument:

import requests

url1 = 'https://httpbin.org/headers'
headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) '
                  'AppleWebKit/537.11 (KHTML, like Gecko) '
                  'Chrome/23.0.1271.64 Safari/537.11',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
    'Accept-Encoding': 'none',
    'Accept-Language': 'en-US,en;q=0.8',
    'Connection': 'keep-alive'
}
response = requests.get(url1, headers=headers, timeout=10)
print(response.text)
print(response.status_code)

Output:

{
  "headers": {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Charset": "ISO-8859-1,utf-8;q=0.7,*;q=0.3",
    "Accept-Encoding": "none",
    "Accept-Language": "en-US,en;q=0.8",
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11",
    "X-Amzn-Trace-Id": "Root=1-608a0533-40c8281f5faa85d1050c6b6a"
  }
}
200

Finally, the order of the headers and especially the Connection header caused the problem. Once I reordered them and removed the 'Connection': 'keep-alive' header, it started working for all the urls. This is the code I used to test:

import requests

urls = [
    'https://www.pontofrio.com.br/',
    'https://www.casasbahia.com.br',
    'https://www.extra.com.br',
    'https://www.boticario.com.br',
]
headers = {
    'Accept': 'text/html,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'en-US,en;q=0.9',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4491.0 Safari/537.36',
}
for url1 in urls:
    print("Trying url: %s" % url1)
    response = requests.get(url1, headers=headers, timeout=10)
    print(response.status_code)
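When looping over many domains like this, a single timing-out url otherwise aborts the whole script with the ReadTimeout traceback shown in the question. A small wrapper avoids that; this is a hedged sketch, and check_url is a hypothetical helper name, not part of the original answer:

```python
import requests

def check_url(url, headers=None, timeout=10):
    """Return the HTTP status code, or None if the request timed out."""
    try:
        response = requests.get(url, headers=headers, timeout=timeout)
        return response.status_code
    except requests.exceptions.Timeout:
        # requests.exceptions.Timeout covers both ConnectTimeout and ReadTimeout
        return None
```

Usage: `for url1 in urls: print(url1, check_url(url1, headers=headers))` — domains that time out print None instead of raising.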
I also tested accessing the page with wget, without success. The problem seems to be that the server only responds to HTTP/2 requests.

Testing with curl:
This times out:
$ curl --http1.1 -A "Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/81.0" "https://www.pontofrio.com.br/"
# times out
This succeeds (note the --http2 argument):
$ curl --http2 -A "Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/81.0" "https://www.pontofrio.com.br/"
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:og="http://ogp.me/ns#" xmlns:fb="http://www.facebook.com/2008/fbml">
...
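You can check from Python which protocol a server is willing to negotiate, using only the standard library's ALPN support. This is an illustrative sketch, not part of the original answer; negotiated_protocol is a hypothetical helper name:

```python
import socket
import ssl

def negotiated_protocol(host, protocols=("h2", "http/1.1"), timeout=10):
    """Open a TLS connection and return the ALPN protocol the server selected."""
    context = ssl.create_default_context()
    # Offer both HTTP/2 ("h2") and HTTP/1.1 during the TLS handshake.
    context.set_alpn_protocols(list(protocols))
    with socket.create_connection((host, 443), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.selected_alpn_protocol()

# Example (needs network): negotiated_protocol("www.pontofrio.com.br")
# An HTTP/2-only server would pick "h2"; requests can only speak HTTP/1.1.
```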
Unfortunately, the requests module does not support HTTP/2. However, you can use the httpx module, which has experimental HTTP/2 support:
import httpx
import asyncio

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:88.0) Gecko/20100101 Firefox/88.0",
}

async def get_text(url):
    async with httpx.AsyncClient(http2=True, headers=headers) as client:
        r = await client.get(url)
        return r.text

txt = asyncio.run(get_text("https://www.pontofrio.com.br/"))
print(txt)
This prints:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:og="http://ogp.me/ns#" xmlns:fb="http://www.facebook.com/2008/fbml">
...
To install the httpx module with HTTP/2 support, run e.g. pip install httpx[http2]