How to fix a 502 Bad Gateway error in Python caused by an API limit
I am trying to retrieve data from a CKAN API URL:
    import urllib.request
    import json
    import pandas as pd
    url = 'https://data.gov.il/api/3/action/datastore_search?resource_id=dcf999c1-d394-4b57-a5e0-9d014a62e046&limit=1000000'
    with urllib.request.urlopen(url) as response:
        html = response.read()
        result = json.loads(html)
        df = pd.DataFrame(result['result']['records'])
But I get the following error:
    ---------------------------------------------------------------------------
    HTTPError                                 Traceback (most recent call last)
    <ipython-input-44-8484123eecdc> in <module>
          2 import pandas as pd
          3 url = 'https://data.gov.il/api/3/action/datastore_search?resource_id=dcf999c1-d394-4b57-a5e0-9d014a62e046&limit=1000000'
    ----> 4 with urllib.request.urlopen(url) as response:
          5     html = response.read()
          6     result = json.loads(html)

    ~\miniconda3\lib\urllib\request.py in urlopen(url, data, timeout, cafile, capath, cadefault, context)
        220     else:
        221         opener = _opener
    --> 222     return opener.open(url, timeout)
        223
        224 def install_opener(opener):

    ~\miniconda3\lib\urllib\request.py in open(self, fullurl, timeout)
        529         for processor in self.process_response.get(protocol, []):
        530             meth = getattr(processor, meth_name)
    --> 531             response = meth(req, response)
        532
        533         return response

    ~\miniconda3\lib\urllib\request.py in http_response(self, request, response)
        638         # request was successfully received, understood, and accepted.
        639         if not (200 <= code < 300):
    --> 640             response = self.parent.error(
        641                 'http', response, code, msg, hdrs)
        642

    ~\miniconda3\lib\urllib\request.py in error(self, proto, *args)
        567         if http_err:
        568             args = (dict, 'default', 'http_error_default') + orig_args
    --> 569             return self._call_chain(*args)
        570
        571 # XXX probably also want an abstract factory that knows when it makes

    ~\miniconda3\lib\urllib\request.py in _call_chain(self, chain, kind, meth_name, *args)
        500         for handler in handlers:
        501             func = getattr(handler, meth_name)
    --> 502             result = func(*args)
        503             if result is not None:
        504                 return result

    ~\miniconda3\lib\urllib\request.py in http_error_default(self, req, fp, hdrs)
        647 class HTTPDefaultErrorHandler(BaseHandler):
        648     def http_error_default(self, hdrs):
    --> 649         raise HTTPError(req.full_url, hdrs, fp)
        650
        651 class HTTPRedirectHandler(BaseHandler):

    HTTPError: HTTP Error 502: Bad Gateway
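Interestingly, if I use a lower limit in the URL, for example:
    url = 'https://...&limit=10000'
everything works. If I specify no limit at all, only the first 100 records are retrieved.
Can anyone explain why this happens? Is it a server-side limit? How can I work around it so that I get the whole dataset no matter how many records it contains (it is updated frequently with new records)?
Also, is this the right way to pull data from a CKAN API? If not, I would be glad to see how it should be done.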
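Solution
The CKAN API limits how many records a single request returns. Without a limit parameter, datastore_search returns only the first 100 records, and a very large value such as limit=1000000 apparently asks for more than the server (or the gateway in front of it) can deliver in one response, hence the 502. To fetch more records than one request can return, set the offset parameter and issue multiple queries, i.e. paginate through the results.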