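How to count the daily frequency of tweets containing a specific word over one year
I am trying to count how many tweets contain a given word over one year, recording the number of tweets for each day and storing the result in a CSV file with "Date" and "Frequency" columns. Here is my code, but after it has been running for a while it keeps failing with an error.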
import pandas as pd
import twint
import nest_asyncio
from datetime import datetime,timedelta
bugun = '2020-01-01'
yarin = '2020-01-02'
df = pd.DataFrame(columns=("Data","Frequency"))
for i in range(365):
    file = open("Test.csv", "w")
    file.close()
    bugun = (datetime.strptime(bugun, '%Y-%m-%d') + timedelta(days=1)).strftime('%Y-%m-%d')
    yarin = (datetime.strptime(yarin, '%Y-%m-%d') + timedelta(days=1)).strftime('%Y-%m-%d')
    nest_asyncio.apply()
    c = twint.Config()
    c.Search = "Chainlink"
    #c.Hide_output=True
    c.Since = bugun
    c.Until = yarin
    c.Store_csv = True
    c.Output = "Test.csv"
    c.Count = True
    twint.run.Search(c)
    data = pd.read_csv("Test.csv")
    frequency = str(len(data))
    #d = {"Data": [bugun],"Frequency": [frequency]}
    #d_f = pd.DataFrame(data=d)
    #df = df.append(d_f,ignore_index=True)
    df.loc[i] = [bugun] + [frequency]
df.to_csv(r'C:\Users\serap\Desktop\CRYPTO 100\Chainlink.csv', index=False, header=False)
This is the error I get:
  File "C:\Users\serap\Desktop\CRYPTO 100\CODES\Binance_Coin\Binance Coin.py", line 47, in <module>
    data = pd.read_csv("Test.csv")
  File "C:\Users\serap\AppData\Local\Programs\Python\python38\lib\site-packages\pandas\io\parsers.py", line 605, in read_csv
    return _read(filepath_or_buffer,kwds)
  File "C:\Users\serap\AppData\Local\Programs\Python\python38\lib\site-packages\pandas\io\parsers.py", line 457, in _read
    parser = TextFileReader(filepath_or_buffer,**kwds)
  File "C:\Users\serap\AppData\Local\Programs\Python\python38\lib\site-packages\pandas\io\parsers.py", line 814, in __init__
    self._engine = self._make_engine(self.engine)
  File "C:\Users\serap\AppData\Local\Programs\Python\python38\lib\site-packages\pandas\io\parsers.py", line 1045, in _make_engine
    return mapping[engine](self.f,**self.options) # type: ignore[call-arg]
  File "C:\Users\serap\AppData\Local\Programs\Python\python38\lib\site-packages\pandas\io\parsers.py", line 1893, in __init__
    self._reader = parsers.TextReader(self.handles.handle,**kwds)
  File "pandas\_libs\parsers.pyx", line 521, in pandas._libs.parsers.TextReader.__cinit__
EmptyDataError: No columns to parse from file
Thanks for your help :)
Solution
After reading the tutorial "How to Scrape Tweets from Twitter with Python Twint" by Andika Pratama (Analytics Vidhya, Medium), I think you would be better off letting Twint do the iteration:
c = twint.Config()
c.Search = "Chainlink"
# Use plain ASCII hyphens, matching the '%Y-%m-%d' format used in the question.
c.Since = "2020-01-01"
c.Until = "2021-01-01"
c.Store_csv = True
c.Output = "Test.csv"
c.Count = True
twint.run.Search(c)
Now you can iterate over the CSV output:
data = pd.read_csv("Test.csv")
# ...
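One way to fill in the elided step is to group the rows by day. The following is only a sketch under assumptions: the helper name `daily_counts` is made up for illustration, the export is assumed to have a column called `date` holding the tweet date (check the header of your own file), and `end` is treated as exclusive.

```python
import pandas as pd


def daily_counts(csv_path, start, end, date_column="date"):
    """Count rows per calendar day in a CSV export.

    `date_column` is an assumption about the file's header; change it to
    whichever column holds the tweet date. `end` is treated as exclusive.
    """
    data = pd.read_csv(csv_path, usecols=[date_column])
    # Normalise to midnight so every tweet of a day lands in the same bucket.
    days = pd.to_datetime(data[date_column]).dt.normalize()
    counts = days.value_counts()
    # Reindex over the whole range so days without tweets show up as 0.
    last_day = pd.Timestamp(end) - pd.Timedelta(days=1)
    all_days = pd.date_range(start=start, end=last_day, freq="D")
    counts = counts.reindex(all_days, fill_value=0)
    return pd.DataFrame(
        {"Date": all_days.strftime("%Y-%m-%d"), "Frequency": counts.to_numpy()}
    )
```

For the search above this could be called as `daily_counts("Test.csv", "2020-01-01", "2021-01-01")`, and the result written out with `to_csv(..., index=False)`. Days with no matching tweets then appear with a frequency of 0 instead of being skipped.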
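So far I have not found detailed documentation on the CSV output, but the twint source code (master/twint/storage/write.py, line 58 ff.) indicates that for CSV, output is appended if the file already exists. So you will probably have to truncate or delete the existing file first. One option that should work is
open("Test.csv", "w").close()
...which is essentially what you are already doing, but without introducing another variable.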
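The traceback in the question ends in `EmptyDataError`, which pandas raises when a file contains nothing to parse, not even a header line. A plausible cause (an assumption, the source does not confirm it) is a day for which nothing was written to the freshly truncated file. If you keep the day-by-day loop, a guard along these lines would treat that case as a count of zero; `count_rows` is a hypothetical helper name.

```python
import pandas as pd


def count_rows(csv_path):
    """Return the number of data rows in a CSV, or 0 if the file is empty."""
    try:
        return len(pd.read_csv(csv_path))
    except pd.errors.EmptyDataError:
        # The file has no header and no rows, e.g. it was truncated and
        # nothing was written to it afterwards.
        return 0
```

In the loop from the question, `frequency = count_rows("Test.csv")` would then replace the direct `pd.read_csv` call.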