How to collect more than 200 tweets for multiple Twitter handles/users?
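For a given list of users, I am trying to collect more tweets than the 200 that Twitter returns per request.
However, my code only populates the DataFrame for users with fewer than 200 tweets; it fails to append the values for users with more than 200 tweets.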
Full code IN:
import tweepy
import pandas as pd
import numpy as np
from datetime import timedelta

handles = ['@MrML16419203','@d00tn00t']
# Credentials redacted
consumerKey,consumerSecret,accessToken,accessTokenSecret = 'x','x','x','x'
authenticate = tweepy.OAuthHandler(consumerKey,consumerSecret)
authenticate.set_access_token(accessToken,accessTokenSecret)
api_twitter = tweepy.API(authenticate,wait_on_rate_limit=True)

total_tweets = []

def get_tweets(handle):
    batch_count_for_tweet_downloads = 200
    try:
        alltweets = []
        # First page: up to 200 of the user's most recent tweets
        tweets = api_twitter.user_timeline(screen_name=handle,count=batch_count_for_tweet_downloads,exclude_replies=True,include_rts=False,lang="en",tweet_mode="extended")
        alltweets.extend(tweets)
        oldest = alltweets[-1].id - 1
        oldest_datetime = pd.to_datetime(str(pd.to_datetime(oldest))[:-10]).strftime("%Y-%m-%d %H:%M:%S")
        print(f"Getting Tweets For " + handle + ",After: " + oldest_datetime)
        # Later pages: only tweets older than the oldest one collected so far
        while len(tweets) > 0:
            tweets = api_twitter.user_timeline(screen_name=handle,max_id=oldest)
            alltweets.extend(tweets)
            if len(alltweets) > 0:
                oldest = alltweets[-1].id - 1
            else:
                pass
            print("Count: " + f"...{len(alltweets)} " + handle + " Tweets Downloaded")
        print('---Total Downloaded: ' + str(len(alltweets)) + ' for ' + handle + '---')
        # Build one DataFrame column per tweet attribute
        df = pd.DataFrame(data=[tweets.user.screen_name for tweets in alltweets],columns=['Handle'])
        df['Tweets'] = np.array([tweets.full_text for tweets in alltweets])
        df['Date'] = np.array([tweets.created_at - timedelta(hours=4) for tweets in alltweets])
        df['Len'] = np.array([len(tweets.full_text) for tweets in alltweets])
        df['Like_count'] = np.array([tweets.favorite_count for tweets in alltweets])
        df['RT_count'] = np.array([tweets.retweet_count for tweets in alltweets])
        total_tweets.extend(alltweets)
        print("----------Total Tweets Extracted: {}".format(df.shape[0]) + "----------")
    except:
        pass
    return df

df = pd.DataFrame()
for handle in handles:
    df_new = get_tweets(handle)
    df = pd.concat((df,df_new))
print(df)
OUT:
Handle Tweets Date Len Like_count RT_count
0 MrML16419203 132716 2020-09-02 02:18:28 6.0 0.0 0.0
1 MrML16419203 432881 2020-09-02 02:04:23 6.0 0.0 0.0
2 MrML16419203 973625 2020-09-02 02:04:09 6.0 0.0 0.0
3 MrML16419203 1234567 2020-09-02 01:55:10 7.0 0.0 0.0
4 MrML16419203 225865 2020-09-02 01:27:11 6.0 0.0 0.0
.. ... ... ... ... ... ...
536 d00tn00t NaN NaT NaN NaN NaN
537 d00tn00t NaN NaT NaN NaN NaN
538 d00tn00t NaN NaT NaN NaN NaN
539 d00tn00t NaN NaT NaN NaN NaN
540 d00tn00t NaN NaT NaN NaN NaN
As you can see, any user with more than 200 tweets still returns NaN and NaT values, even though my console shows the while loop downloading these data points.
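The NaN/NaT pattern is what pd.concat produces when one of the frames being combined has only a Handle column: every column missing from a frame is filled with missing values (NaT for datetime columns), and an integer column such as Len is upcast to float, which is consistent with the 6.0 values in the output above. A minimal illustration with made-up handles and values (hypothetical, no Twitter access involved):

```python
import pandas as pd

# Hypothetical stand-ins: one fully built frame, and one frame that only
# got its 'Handle' column before the remaining columns were assigned.
complete = pd.DataFrame({
    "Handle": ["userA", "userA"],
    "Tweets": ["hello", "world"],
    "Date": pd.to_datetime(["2020-09-02 02:18:28", "2020-09-02 02:04:23"]),
    "Len": [5, 5],
})
handle_only = pd.DataFrame({"Handle": ["userB", "userB", "userB"]})

# Columns absent from handle_only are filled with NaN/NaT in its rows.
combined = pd.concat((complete, handle_only), ignore_index=True)
print(combined)
print(combined.dtypes)  # Len becomes float64 because it now holds NaN
```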
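I have tried several solutions (e.g. Cursor), but none of them worked, and when I try to extract tweets only from the users with more than 200 tweets, I get a length mismatch error. This is because the returned DataFrame is empty apart from the 'Handle' column, which can be seen when exporting it to CSV.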
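A DataFrame that has only a Handle column is what get_tweets returns if an exception is raised after the Handle column is created but before the next column is assigned: the bare except: pass hides the error, and return df hands back the partial frame. One likely trigger (an assumption, not confirmed in the question) is that the user_timeline call inside the while loop omits tweet_mode="extended", so tweets from the later pages carry text rather than full_text, and tweets.full_text raises AttributeError. The sketch reproduces that pattern with hypothetical stand-in objects (build_frame, first_page and later_page are illustrative names) and prints the error instead of discarding it:

```python
import pandas as pd
from types import SimpleNamespace


def build_frame(statuses):
    # Same structure as the question: columns are assigned one at a time
    # inside a try block, so a failure part-way leaves a partial frame.
    df = pd.DataFrame()  # defined up front so `return df` can never fail
    try:
        df = pd.DataFrame(data=[s.handle for s in statuses], columns=["Handle"])
        df["Tweets"] = [s.full_text for s in statuses]
        df["Len"] = [len(s.full_text) for s in statuses]
    except AttributeError as exc:
        # Reporting the error, unlike `except: pass`, makes the failure visible.
        print(f"Stopped early: {exc}")
    return df


# Hypothetical objects: the first page has full_text, a later page only text.
first_page = [SimpleNamespace(handle="userB", full_text="a long tweet")]
later_page = [SimpleNamespace(handle="userB", text="a short tweet")]

result = build_frame(first_page + later_page)
print(result.columns.tolist())  # ['Handle']
```

Catching a specific exception and printing it (or letting it propagate) shows the real cause immediately, whereas the bare except turns any error into a silently truncated DataFrame.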
Any help would be greatly appreciated. Thanks.
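Paging past the 200-tweets-per-request maximum works by repeatedly asking for tweets whose id is at or below max_id, as the question's while loop does. The difference in this sketch is that every page is requested through one function, so options such as tweet_mode="extended" cannot be dropped after the first page (in the question's loop, the later calls pass only screen_name and max_id). fetch_all, make_fake_timeline and the fake ids are hypothetical; the fake stands in for api_twitter.user_timeline so the paging logic can be checked offline:

```python
from types import SimpleNamespace


def fetch_all(fetch_page, page_size=200):
    """Collect every page of a newest-first timeline by walking max_id back.

    `fetch_page` is a hypothetical stand-in for the real API call with all
    other options already bound, e.g. (not run here):
        functools.partial(api_twitter.user_timeline, screen_name=handle,
                          tweet_mode="extended", exclude_replies=True,
                          include_rts=False)
    """
    all_items = []
    max_id = None
    while True:
        kwargs = {"count": page_size}
        if max_id is not None:
            kwargs["max_id"] = max_id  # only items with id <= max_id
        page = fetch_page(**kwargs)
        if not page:
            break  # assumption: an empty page means nothing older is left
        all_items.extend(page)
        # Next request: strictly older than the oldest item on this page.
        max_id = page[-1].id - 1
    return all_items


def make_fake_timeline(n):
    """Hypothetical offline fake: n items with ids n..1, newest first."""
    items = [SimpleNamespace(id=i) for i in range(n, 0, -1)]

    def fetch_page(count, max_id=None):
        eligible = [t for t in items if max_id is None or t.id <= max_id]
        return eligible[:count]

    return fetch_page


collected = fetch_all(make_fake_timeline(450))
print(len(collected))  # 450
```

In real use, fetch_page would be the user_timeline call with screen_name, tweet_mode="extended" and the other options bound (for example with functools.partial, as in the docstring), and the DataFrame would then be built once from the returned list. Two caveats from Twitter's v1.1 documentation for this endpoint: it returns only up to 3,200 of a user's most recent tweets, and with exclude_replies or include_rts=False the filtering is applied after count, so a page can come back shorter than requested.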