How to scrape all tweets in a given time period using Tweepy
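Hi everyone! I ran into a problem while trying to use "tweepy" to scrape all tweets that contain the keyword "covid19". My code logic is very simple: I put some filters in "text_query" to exclude retweets and replies.
The problem is that I have tried several approaches:
- Adding "Since...Then..." to text_query to scrape the tweets from a given time period, but it doesn't work!
- Adding "Since...Then..." to the Cursor call, but that doesn't work either!
- Changing the date format to datetime, which doesn't work
No matter which approach I use, the result is unsatisfactory, because it only returns tweets from THE MOST RECENT TWO DAYS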
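For reference, here is a minimal sketch of how a date-bounded query string could be assembled, assuming the standard search operators `since:YYYY-MM-DD` (tweets sent since that date) and `until:YYYY-MM-DD` (tweets sent before that date). The helper name `build_query` and the example dates are hypothetical and only serve as illustration; the two filters are the ones already used in the post.

```python
from datetime import date


def build_query(keyword, since=None, until=None):
    """Build a search query string with optional date operators.

    `since` and `until` are datetime.date objects; each one is appended
    as a `since:` / `until:` operator only when it is provided.
    """
    # Same filters as in the post: drop retweets and replies.
    parts = [keyword, "-filter:retweets", "-filter:replies"]
    if since is not None:
        parts.append(f"since:{since.isoformat()}")
    if until is not None:
        parts.append(f"until:{until.isoformat()}")
    return " ".join(parts)


# Hypothetical dates, chosen only to show the resulting string.
query = build_query("covid", since=date(2020, 11, 1), until=date(2020, 11, 5))
print(query)
```

The resulting string would be passed as `q` to the search call. Note that the standard search endpoint is documented as covering only roughly the last seven days of tweets, so operators that point further back would likely return nothing regardless of how they are written.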
```python
import time

import pandas as pd
import tweepy

consumerKey = "..."
consumerSecret = "..."
accessToken = "..."
accessTokenSecret = "..."

auth = tweepy.OAuthHandler(consumerKey, consumerSecret)
auth.set_access_token(accessToken, accessTokenSecret)
# wait_on_rate_limit_notify and api.search are Tweepy 3.x names
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)

search = "covid"
text_query = search + " -filter:retweets" + " -filter:replies"
# 'from:@username + since:2019-01-01 until:2020-11-05 -filter:links -filter:replies'

data = {"Tweet ID": [], "Date": [], "Tweet Text": [], "favorite_count": [],
        "retweet_count": [], "geoInf": [], "Location": []}
tic1 = time.perf_counter()
try:
    # Build the search cursor from the query parameters
    tweets = tweepy.Cursor(api.search, q=text_query, lang="en").items()
    # tweets_list = [[tweet.created_at,tweet.id,tweet.text,tweet.geo,tweet.favorites,tweet.hashtags] for tweet in tweets if tweet.text[0:2] != "RT" and tweet.created_at > startDate]
    # tweets_list = [[tweet.created_at,tweet.favorite_count,tweet.retweet_count,tweet.coordinates,tweet.place] for tweet in tweets]
    for tweet in tweets:
        data["Tweet Text"].append(tweet.text)
        data["Tweet ID"].append(tweet.id)
        data["Date"].append(tweet.created_at)
        data["favorite_count"].append(tweet.favorite_count)
        data["retweet_count"].append(tweet.retweet_count)
        data["geoInf"].append(tweet.coordinates)
        data["Location"].append(tweet.place)
except BaseException as e:
    # print('failed on_status,', str(e))
    # time.sleep(3)
    pass

df_data = pd.DataFrame(data=data)

# Replace each Place object with its full name and keep the first
# corner of its bounding box as the coordinates
location = df_data.dropna(subset=["Location"])
df_data["coordinates"] = ["None"] * df_data.shape[0]
for i in location.index:
    df_data.at[i, 'Location'] = location["Location"][i].full_name
    df_data.at[i, 'coordinates'] = list(location['Location'][i].bounding_box.coordinates[0])[0]

toc2 = time.perf_counter()
print(f"Total consumed: {toc2 - tic1:0.4f} seconds")
print("Total tweets: ", df_data.shape[0])
df_data.to_csv(search + ".csv")
```
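The commented-out list comprehension in the code compares `tweet.created_at` against a `startDate`. A minimal sketch of that kind of client-side date filter is shown below, under some assumptions: `filter_by_date` is a hypothetical helper name, the window is half-open (`start` included, `end` excluded), and `created_at` is a naive `datetime` as returned by Tweepy 3.x (newer versions return timezone-aware values, in which case the bounds would need to be timezone-aware too). Stand-in objects replace real tweets so the sketch runs without credentials.

```python
from datetime import datetime
from types import SimpleNamespace


def filter_by_date(tweets, start, end):
    """Yield tweets whose created_at falls in the half-open window [start, end)."""
    for tweet in tweets:
        if start <= tweet.created_at < end:
            yield tweet


# Stand-in objects (not real tweets) so the sketch needs no network access.
sample = [
    SimpleNamespace(id=1, created_at=datetime(2020, 11, 1, 9, 30)),
    SimpleNamespace(id=2, created_at=datetime(2020, 11, 3, 18, 0)),
    SimpleNamespace(id=3, created_at=datetime(2020, 11, 5, 0, 0)),
]

kept = list(filter_by_date(sample, datetime(2020, 11, 2), datetime(2020, 11, 5)))
print([t.id for t in kept])
```

A filter like this can only narrow down what the search call has already returned; it cannot bring back tweets that the endpoint itself does not serve.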