How to handle a RecursionError when downloading SEC data
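I am currently trying to download S-1 filings from SEC EDGAR using the sec_edgar_downloader library. I have a Pandas DataFrame of CIK values, and for each value I want to download the associated S-1 when one is available. To track which companies have none, I added a new column that equals 1 when a filing is found and downloaded and 0 otherwise. The code I am running is: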
df["s1"] = df["s1"].apply(lambda x : tryconvert(CIK_check(x)))
def tryconvert(x):
    try:
        CIK_check(x)
    except RecursionError:
        return "0"
and CIK_check() is a function defined as:
def CIK_check(x):
    time.sleep(0.3)
    if dl.get("S-1", x) == 1:
        return "1"
    else:
        return "0"
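CIK_check downloads the filing when one is available and returns a binary value indicating whether the download succeeded. I had to add tryconvert() to try to work around an error that eventually appears when I run the code. The following error is raised: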
RecursionError Traceback (most recent call last)
<ipython-input-243-a8a327555f29> in <module>
----> 1 df["s1"] = df["s1"].apply(lambda x : tryconvert(CIK_check(x)))
~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/series.py in apply(self,func,convert_dtype,args,**kwds)
3846 else:
3847 values = self.astype(object).values
-> 3848 mapped = lib.map_infer(values,f,convert=convert_dtype)
3849
3850 if len(mapped) and isinstance(mapped[0],Series):
pandas/_libs/lib.pyx in pandas._libs.lib.map_infer()
<ipython-input-243-a8a327555f29> in <lambda>(x)
----> 1 df["s1"] = df["s1"].apply(lambda x : tryconvert(CIK_check(x)))
<ipython-input-241-62c62b553142> in CIK_check(x)
1 def CIK_check(x):
2 time.sleep(0.3)
----> 3 if dl.get("S-1",x) == 1:
4 return "1"
5 else:
~/opt/anaconda3/lib/python3.8/site-packages/sec_edgar_downloader/Downloader.py in get(self,filing,ticker_or_cik,amount,after,before,include_amends,download_details,query)
167 )
168
--> 169 download_filings(
170 self.download_folder,
171 ticker_or_cik,
~/opt/anaconda3/lib/python3.8/site-packages/sec_edgar_downloader/_utils.py in download_filings(download_folder,filing_type,filings_to_fetch,include_filing_details)
261 if include_filing_details:
262 try:
--> 263 download_and_save_filing(
264 download_folder,
265 ticker_or_cik,
~/opt/anaconda3/lib/python3.8/site-packages/sec_edgar_downloader/_utils.py in download_and_save_filing(download_folder,accession_number,download_url,save_filename,resolve_urls)
218 if resolve_urls and Path(save_filename).suffix == ".html":
219 base_url = f"{download_url.rsplit('/',1)[0]}/"
--> 220 filing_text = resolve_relative_urls_in_filing(filing_text,base_url)
221
222 # Create all parent directories as needed and write content to file
~/opt/anaconda3/lib/python3.8/site-packages/sec_edgar_downloader/_utils.py in resolve_relative_urls_in_filing(filing_text,base_url)
198 return soup
199
--> 200 return soup.encode(soup.original_encoding)
201
202
~/opt/anaconda3/lib/python3.8/site-packages/bs4/element.py in encode(self,encoding,indent_level,formatter,errors)
1526 # Turn the data structure into Unicode,then encode the
1527 # Unicode.
-> 1528 u = self.decode(indent_level,formatter)
1529 return u.encode(encoding,errors)
1530
~/opt/anaconda3/lib/python3.8/site-packages/bs4/__init__.py in decode(self,pretty_print,eventual_encoding,formatter)
742 else:
743 indent_level = 0
--> 744 return prefix + super(BeautifulSoup,self).decode(
745 indent_level,formatter)
746
~/opt/anaconda3/lib/python3.8/site-packages/bs4/element.py in decode(self,formatter)
1596 else:
1597 indent_contents = None
-> 1598 contents = self.decode_contents(
1599 indent_contents,formatter
1600 )
~/opt/anaconda3/lib/python3.8/site-packages/bs4/element.py in decode_contents(self,formatter)
1690 text = c.output_ready(formatter)
1691 elif isinstance(c,Tag):
-> 1692 s.append(c.decode(indent_level,
1693 formatter))
1694 preserve_whitespace = (
... last 2 frames repeated,from the frame below ...
~/opt/anaconda3/lib/python3.8/site-packages/bs4/element.py in decode(self,formatter)
1600 )
RecursionError: maximum recursion depth exceeded
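However, this does not work: I still get the error, which prevents me from completing the task. What could be the cause of the error? (Unfortunately, since the function is applied over a Pandas DataFrame, it is not clear at which entry the error is raised.) Is there any other way to get past the RecursionError without halting the computation, simply treating it as a failed download marked 0?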
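
One detail that may explain why the `except` clause never fires: in `tryconvert(CIK_check(x))`, Python evaluates the argument `CIK_check(x)` before `tryconvert` is entered, so the exception is raised outside the `try` block. The wrapper as posted also returns nothing when the call succeeds. Below is a minimal sketch of a wrapper that makes the call inside the `try` and records which inputs failed. The names `flaky_check`, `safe_check`, and `failed` are hypothetical, and `flaky_check` is only a stand-in for `CIK_check` that recurses without bound for one input; it does not touch the network or the real library.

```python
def flaky_check(cik):
    """Hypothetical stand-in for CIK_check: recurses without bound for one input."""
    if cik == "bad":
        return flaky_check(cik)  # eventually raises RecursionError
    return "1"


failed = []  # inputs whose check raised RecursionError


def safe_check(cik, check=flaky_check):
    """Call check(cik) inside the try block so the exception can be caught."""
    try:
        return check(cik)
    except RecursionError:
        failed.append(cik)  # remember which entry failed
        return "0"


# Pass the raw value to the wrapper; do not call the checker in the argument list.
ciks = ["a", "bad", "c"]
results = [safe_check(cik) for cik in ciks]
print(results)  # ['1', '0', '1']
print(failed)   # ['bad']
```

With a DataFrame, the same wrapper could be passed directly, for example `df["s1"].apply(safe_check)`, with `check` set to the real `CIK_check`. The `failed` list then shows which CIK values triggered the error.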