How to collect tweets for a list of screen names with Tweepy
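I have a list of Twitter screen names (about a hundred) and want to collect 3,200 tweets for each of them. With the code below, however, I only manage to collect 3,200 tweets in total: when I pass in all 100 screen names, it hits the tweet collection limit. Can anyone suggest how to collect 3,200 tweets per screen name? Any advice would be much appreciated. Thanks in advance!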
import csv

import tweepy


def get_all_tweets(screen_name):
    # Twitter API credentials (redacted)
    consumer_key = "****"
    consumer_secret = "****"
    access_key = "****"
    access_secret = "****"

    # authorize twitter, initialize tweepy
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_key, access_secret)
    api = tweepy.API(auth, wait_on_rate_limit=True)

    # initialize a list to hold all the tweepy Tweets & list with no retweets
    alltweets = []
    noRT = []

    # make initial request for most recent tweets with extended mode enabled to get full tweets
    new_tweets = api.user_timeline(screen_name=screen_name, tweet_mode='extended',
                                   count=200, include_rts=False)

    # keep grabbing tweets until a request comes back empty
    while new_tweets:
        # save most recent tweets
        alltweets.extend(new_tweets)
        print("...{} tweets downloaded so far".format(len(alltweets)))

        # save the id of the oldest tweet less one
        oldest = alltweets[-1].id - 1
        print("getting tweets before {}".format(oldest))

        # all subsequent requests use the max_id param to prevent duplicates;
        # tweet_mode and count must be repeated so full_text is available
        new_tweets = api.user_timeline(screen_name=screen_name, tweet_mode='extended',
                                       count=200, max_id=oldest, include_rts=False)

    # removes any remaining retweets (their text starts with "RT @")
    for tweet in alltweets:
        if not tweet.full_text.startswith('RT @'):
            noRT.append([tweet.id_str, tweet.created_at, tweet.full_text])

    # write to csv
    with open('{}_tweets.csv'.format(screen_name), 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(["id", "created_at", "text"])
        writer.writerows(noRT)
    print('{}_tweets.csv was successfully created.'.format(screen_name))


if __name__ == '__main__':
    # pass in the username of the account you want to download. I have a hundred usernames in the list
    usernames = ["JLo", "ABC", 'Trump']
    for x in usernames:
        get_all_tweets(x)
Solution
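First, to walk through a timeline you have to use pagination. I suggest using Cursor in tweepy, because it is much easier than handling max_id and the like yourself.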
for page in tweepy.Cursor(api.user_timeline, screen_name=screen_name, tweet_mode="extended", include_rts=False, count=100).pages(32):
    for status in page:
        # do your processing on status here
        print(status.full_text)
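As a rough sketch of how the Cursor loop could be applied to every screen name in the list, the helper below writes one CSV per account. The function names `tweet_rows` and `save_user_timeline` are made up for illustration, and the sketch assumes an `api` object created with `tweepy.API(auth, wait_on_rate_limit=True)` as in the question. It uses `include_rts`, the parameter name documented for the v1.1 timeline endpoint, and requests 200 tweets per page over 16 pages. Pages may hold fewer tweets than requested because retweets are filtered out after the count is applied.

```python
import csv
from itertools import chain


def tweet_rows(pages):
    """Flatten an iterable of timeline pages into CSV rows, skipping retweets."""
    rows = []
    for status in chain.from_iterable(pages):
        # Retweets carry a `retweeted_status` attribute in the v1.1 payload.
        if hasattr(status, "retweeted_status"):
            continue
        rows.append([status.id_str, status.created_at, status.full_text])
    return rows


def save_user_timeline(api, screen_name, max_pages=16):
    """Hypothetical helper: page through one timeline and write it to CSV."""
    import tweepy  # imported lazily so tweet_rows() can be used without tweepy

    cursor = tweepy.Cursor(
        api.user_timeline,
        screen_name=screen_name,
        tweet_mode="extended",
        include_rts=False,
        count=200,
    )
    rows = tweet_rows(cursor.pages(max_pages))
    path = "{}_tweets.csv".format(screen_name)
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "created_at", "text"])
        writer.writerows(rows)
    return len(rows)


# Usage, assuming `api` and `usernames` are defined as in the question:
#     for name in usernames:
#         save_user_timeline(api, name)
```

Because each call builds a fresh Cursor, the per-user limit of roughly 3,200 tweets applies to every account separately rather than to the run as a whole.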
Second, the rate limits are documented here, so it is not unusual to get a warning that you have reached them: https://developer.twitter.com/en/docs/twitter-api/v1/tweets/timelines/faq
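To get a feel for why the limit is hit with a hundred accounts, here is a small back-of-the-envelope calculation. The figure of 900 timeline requests per 15-minute window is an assumption for user-context authentication and should be checked against the page linked above; the function names are illustrative only.

```python
import math


def requests_needed(num_users, tweets_per_user=3200, per_request=200):
    """Number of timeline requests needed to page through every account."""
    return num_users * math.ceil(tweets_per_user / per_request)


def windows_needed(total_requests, per_window=900):
    """Number of rate-limit windows the requests span (per_window is an assumption)."""
    return math.ceil(total_requests / per_window)


total = requests_needed(100)   # 100 accounts * 16 requests each
windows = windows_needed(total)
print(total, windows)          # prints: 1600 2
```

Under these assumptions, a full run needs more requests than one window allows, so with `wait_on_rate_limit=True` the script would be expected to pause once until the window resets and then carry on.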