
Tweepy double scraping

How to fix Tweepy double scraping

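I have been scraping Twitter with tweepy for about 9 months. Last Friday my scraper stopped working because it started doing two things: 1) it returns an empty list instead of the previous tweets even though tweets exist on the user's profile, and 2) it scrapes old tweets when only the most recent tweet should be scraped. Has anyone run into the same problem? Any suggested fixes are appreciated!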

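import re

import tweepy

# consumer_key, consumer_secret, access_key and access_secret are assumed
# to be defined elsewhere (they are not shown in the post).


def get_tweets(username):
    # Authorization to consumer key and consumer secret
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    # Access to user's access key and access secret
    auth.set_access_token(access_key, access_secret)

    # Calling api. wait_on_rate_limit_notify is a Tweepy 3.x argument;
    # it was removed in Tweepy 4.0, so drop it on newer versions.
    api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)
    text_of_tweet = None
    tweet_id = None

    number_of_tweets = 1
    # Scrape the most recent tweet on the user's timeline
    tweets = api.user_timeline(screen_name=username, count=number_of_tweets, include_rts=False)

    # Keep the text and id of the returned tweet (the last one, if several)
    for item in tweets:
        text_of_tweet = item.text
        tweet_id = item.id

    # An empty result leaves text_of_tweet as None; return early instead of
    # raising a TypeError in the checks below
    if text_of_tweet is None:
        return None, None

    # Strip non-ASCII characters if the string is not all ASCII
    if not all(ord(c) < 128 for c in text_of_tweet):
        text_of_tweet = conv_true_ascii(text_of_tweet)

    # Keep only the first sentence, then only its first line
    list_of_sentences = re.split(r'(?<=[^A-Z].[.?]) +(?=[A-Z])', text_of_tweet)
    text_of_tweet = list_of_sentences[0]
    text_of_tweet = text_of_tweet.split('\n')[0]

    # Write to CSV
    # csvWriter.writerow([text_of_tweet,tweet_time,tweet_id])

    # Return tweet
    return text_of_tweet, tweet_id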

def conv_true_ascii(single_tweet):
    # Drop every non-ASCII character and decode back to str, which avoids
    # the b'...' repr that str(bytes) produces
    edited_tweet = single_tweet.encode('ascii', errors='ignore').decode('ascii')

    return edited_tweet
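
Both symptoms might be related to how the timeline endpoint treats count: Twitter's documentation says retweets are included in the count even when include_rts is false, so count=1 can come back empty when the newest item is a retweet. The post does not confirm that this is the cause here, so the following is only a minimal sketch of a defensive selection step. The helper name pick_latest and the sample data are hypothetical; the sketch assumes that retweets carry a retweeted_status attribute (as Tweepy Status objects do) and that a larger id means a newer tweet.

```python
from types import SimpleNamespace


def pick_latest(statuses):
    """Return (text, id) of the newest non-retweet status, or (None, None)."""
    # Filter retweets client-side; only retweets have `retweeted_status`.
    originals = [s for s in statuses if not hasattr(s, "retweeted_status")]
    if not originals:
        # Empty timeline page, or a page that contained only retweets.
        return None, None
    # Assumption: tweet ids increase over time, so the max id is the newest.
    newest = max(originals, key=lambda s: s.id)
    return newest.text, newest.id


# Stand-in objects (hypothetical data) so the sketch runs without network access.
sample = [
    SimpleNamespace(id=101, text="older tweet"),
    SimpleNamespace(id=105, text="RT @someone: hi", retweeted_status=object()),
    SimpleNamespace(id=103, text="newest original tweet"),
]
print(pick_latest(sample))
print(pick_latest([]))
```

With a real client, one option would be to request a larger page, for example api.user_timeline(screen_name=username, count=20, include_rts=True), and pass the result to pick_latest; the page size of 20 is an arbitrary choice for illustration.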

