Python: Bad Gateway error caused by an API limit

How to resolve a Bad Gateway error caused by an API limit in Python

I am trying to retrieve data from a CKAN API URL:

import urllib.request
import json
import pandas as pd
url = 'https://data.gov.il/api/3/action/datastore_search?resource_id=dcf999c1-d394-4b57-a5e0-9d014a62e046&limit=1000000'
with urllib.request.urlopen(url) as response:
    html = response.read()
    result = json.loads(html)
    df = pd.DataFrame(result['result']['records'])

But I get the following error:

---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
<ipython-input-44-8484123eecdc> in <module>
      2 import pandas as pd
      3 url = 'https://data.gov.il/api/3/action/datastore_search?resource_id=dcf999c1-d394-4b57-a5e0-9d014a62e046&limit=1000000'
----> 4 with urllib.request.urlopen(url) as response:
      5     html = response.read()
      6     result = json.loads(html)

~\miniconda3\lib\urllib\request.py in urlopen(url,data,timeout,cafile,capath,cadefault,context)
    220     else:
    221         opener = _opener
--> 222     return opener.open(url,timeout)
    223 
    224 def install_opener(opener):

~\miniconda3\lib\urllib\request.py in open(self,fullurl,timeout)
    529         for processor in self.process_response.get(protocol,[]):
    530             meth = getattr(processor,meth_name)
--> 531             response = meth(req,response)
    532 
    533         return response

~\miniconda3\lib\urllib\request.py in http_response(self,request,response)
    638         # request was successfully received,understood,and accepted.
    639         if not (200 <= code < 300):
--> 640             response = self.parent.error(
    641                 'http',response,code,msg,hdrs)
    642 

~\miniconda3\lib\urllib\request.py in error(self,proto,*args)
    567         if http_err:
    568             args = (dict,'default','http_error_default') + orig_args
--> 569             return self._call_chain(*args)
    570 
    571 # XXX probably also want an abstract factory that knows when it makes

~\miniconda3\lib\urllib\request.py in _call_chain(self,chain,kind,meth_name,*args)
    500         for handler in handlers:
    501             func = getattr(handler,meth_name)
--> 502             result = func(*args)
    503             if result is not None:
    504                 return result

~\miniconda3\lib\urllib\request.py in http_error_default(self,req,fp,code,msg,hdrs)
    647 class HTTPDefaultErrorHandler(BaseHandler):
    648     def http_error_default(self,req,fp,code,msg,hdrs):
--> 649         raise HTTPError(req.full_url,code,msg,hdrs,fp)
    650 
    651 class HTTPRedirectHandler(BaseHandler):

HTTPError: HTTP Error 502: Bad Gateway

Interestingly, if I use a lower limit in the URL, for example:

url = 'https://...&limit=10000'

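then everything works fine. If I omit the limit entirely, only the first 100 records are retrieved.
Can anyone explain why this happens? Is it a server-side limit? How can I work around it so that I get the whole dataset no matter how many records it contains (it is updated frequently with more records)? Also, is this the right way to pull data from the CKAN API? If not, I would be glad to see how it should be done.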

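Solution

The CKAN API limits how many records a single request returns: datastore_search returns 100 records by default when no limit is given, and a very large limit can fail on the server side, as the 502 above shows. To retrieve more records than one request can deliver, set the offset parameter and issue multiple queries, i.e. paginate through the results.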
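The pagination described above might look roughly like the sketch below. It assumes that datastore_search accepts limit and offset query parameters and returns the rows under result['records'], as in the question's code. The helper names fetch_page and fetch_all_records, and the page size of 10000 (a value the question reports as working), are illustrative choices, not part of CKAN.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the question; adjust for other CKAN sites.
BASE_URL = 'https://data.gov.il/api/3/action/datastore_search'


def fetch_page(resource_id, limit, offset):
    """Fetch one page and return the 'result' dict of the JSON response."""
    query = urllib.parse.urlencode(
        {'resource_id': resource_id, 'limit': limit, 'offset': offset}
    )
    with urllib.request.urlopen(f'{BASE_URL}?{query}') as response:
        return json.loads(response.read())['result']


def fetch_all_records(resource_id, page_size=10000, fetch=fetch_page):
    """Collect every record by requesting pages until an empty one comes back.

    `fetch` is injectable (hypothetical design choice) so the loop can be
    exercised without network access.
    """
    records = []
    offset = 0
    while True:
        page = fetch(resource_id, page_size, offset)['records']
        if not page:
            break  # no more rows: the previous page was the last one
        records.extend(page)
        # Advance by what was actually returned, in case the server
        # caps the page size below the requested limit.
        offset += len(page)
    return records
```

With this in place, something like pd.DataFrame(fetch_all_records('dcf999c1-d394-4b57-a5e0-9d014a62e046')) should build the full table regardless of how many records the resource holds, at the cost of one extra request that returns the empty final page.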
