Adding values from multiple data frames to a main data frame

This is my main data:

API             Close        Date     Dividends    High      Low
@3MIndia       25981.7    2018-09-03     0.0      26662.0   25807.8
@Adani_Gas     25454.0    2018-09-04     0.0      26000.0   25010.0
@AdaniOnline   25307.1    2018-09-05     0.0      25519.2   25015.0 
Affle          25790.3    2018-09-06     0.0      25950.0   25200.0
AGC            25383.9    2018-09-07     0.0      25747.8   25200.0

Now I want to compute the Twitter sentiment for each value in the API column, using the matching value in the Date column.

import os
import re
from datetime import datetime, timedelta

import pandas as pd
import tweepy
from nltk.tokenize import WordPunctTokenizer
from textblob import TextBlob


def mytweet(tweet_text, number, days):
    """Return a one-row DataFrame with the mean polarity (PO) and subjectivity (SU) of matching tweets."""
    os.chdir("C:\\Users\\HP\\UntitledFolder2")

    dataf1 = pd.DataFrame()
    

    cons_key = 'xxxxxxxxx'
    cons_secret = 'xxxxxxx'
    acc_token = 'xxxxxxxx'
    acc_secret = 'xxxxxxx'

    keyword = tweet_text + ' -filter:retweets'  # leading space keeps the operator separate from the search term
    total_tweets = number

    auth = tweepy.OAuthHandler(cons_key,cons_secret)
    auth.set_access_token(acc_token,acc_secret)
    api = tweepy.API(auth)

    today_datetime = datetime.now()
    yesterday_datetime = today_datetime - timedelta(days=days)
    today_date = today_datetime.strftime('%Y-%m-%d')
    yesterday_date = yesterday_datetime.strftime('%Y-%m-%d')
    #api = authentication(cons_key,cons_secret,acc_token,acc_secret)
    # Tweepy 3.x call; Tweepy 4.x renamed API.search to API.search_tweets.
    search_result = api.search(q=keyword, since=yesterday_date, until=today_date, count=total_tweets, tweet_mode='extended', lang='en')
    tok = WordPunctTokenizer()
    for tweet in search_result:
        # Collect the tweet's metadata in a one-row frame.
        row = {
            'Screen Name': tweet.user.screen_name,
            'User Name': tweet.user.name,
            'Tweet Created At': str(tweet.created_at),
            'Tweet_Text': str(tweet.full_text),
            'User Location': str(tweet.user.location),
            'Tweet Coordinates': str(tweet.coordinates),
            'Retweet Count': str(tweet.retweet_count),
            'Retweeted': str(tweet.retweeted),
            'Phone Type': str(tweet.source),
            'Favorite Count': str(tweet.favorite_count),
            'Favorited': str(tweet.favorited),
            'Replied': str(tweet.in_reply_to_status_id_str),
        }
        dataf = pd.DataFrame([row])
        # Strip @mentions, links and non-letter characters, then lowercase.
        user_removed = re.sub(r'@[A-Za-z0-9]+', '', str(tweet.full_text))
        link_removed = re.sub(r'https?://[A-Za-z0-9./]+', '', user_removed)
        number_removed = re.sub(r'[^a-zA-Z]', ' ', link_removed)
        lower_case_tweet = number_removed.lower()
        words = tok.tokenize(lower_case_tweet)
        clean_tweet = ' '.join(words).strip()
        dataf['Clean Tweet'] = clean_tweet
        # Score the cleaned text (not the Series repr) with TextBlob.
        sentiment = TextBlob(clean_tweet).sentiment
        dataf['PO1'] = sentiment.polarity
        dataf['SU1'] = sentiment.subjectivity
        dataf1 = pd.concat([dataf1, dataf], ignore_index=True)
    filename = tweet_text + ".xlsx"

    #dataf1.to_excel(filename)    
    P1 = dataf1.PO1.mean()
    S1 = dataf1.SU1.mean()
    d1 = {'PO': [P1],'SU': [S1]}
    return pd.DataFrame(d1)
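
The cleaning steps inside the loop can be looked at in isolation. The following is a minimal sketch, not the asker's code: `clean_tweet_text` is a hypothetical helper name, it reuses the same three regular expressions, and it swaps `WordPunctTokenizer` for `str.split`, which should give the same words here because only letters and spaces remain after the third substitution.

```python
import re


def clean_tweet_text(text):
    """Strip @mentions, links and non-letters, then lowercase and collapse whitespace."""
    text = re.sub(r'@[A-Za-z0-9]+', '', text)            # drop @mentions
    text = re.sub(r'https?://[A-Za-z0-9./]+', '', text)  # drop links
    text = re.sub(r'[^a-zA-Z]', ' ', text)               # keep letters only
    return ' '.join(text.lower().split())


# Made-up sample text, for illustration only.
print(clean_tweet_text("@3MIndia shares up 2% today! https://t.co/abc123"))
```

Passing the cleaned string itself to `TextBlob`, rather than `str()` of a pandas column, keeps the index and dtype text of the Series out of the sentiment score.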

I wrote this code to compute polarity and subjectivity. I am now facing just two problems in taking it further:

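1. I want to compute the polarity and subjectivity for each distinct value in the API column and add the results to the main data frame as two columns, PO and SU.

2. I want to pass a date (from the Date column of my main data) to the function above, so that it returns the polarity and subjectivity for that API on the corresponding date.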
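
One possible way to approach the first point is to call a scoring function once per row and attach the results as new columns. The sketch below assumes pandas is available; `add_sentiment_columns`, `score_fn` and `fake_score` are hypothetical names, and `fake_score` returns fixed numbers so the example runs without Twitter access.

```python
import pandas as pd


def add_sentiment_columns(main_df, score_fn):
    """Return a copy of main_df with PO and SU columns from score_fn(api, date)."""
    # score_fn is assumed to return a (polarity, subjectivity) pair.
    scores = [score_fn(api, date)
              for api, date in zip(main_df['API'], main_df['Date'])]
    result = main_df.copy()
    result['PO'] = [po for po, _ in scores]
    result['SU'] = [su for _, su in scores]
    return result


def fake_score(api, date):
    """Hypothetical stand-in for the real sentiment lookup."""
    return (0.1, 0.5)


main_df = pd.DataFrame({
    'API': ['@3MIndia', '@Adani_Gas'],
    'Date': ['2018-09-03', '2018-09-04'],
    'Close': [25981.7, 25454.0],
})
out = add_sentiment_columns(main_df, fake_score)
print(out)
```

Since `mytweet` returns a one-row frame with columns `PO` and `SU`, a wrapper around it could return `(df['PO'].iloc[0], df['SU'].iloc[0])` and be passed as `score_fn`; that wiring is an assumption and is not shown in the question.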

Any help would be greatly appreciated.
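
For the second point, the search window that the function currently derives from today's date could instead be derived from a date taken from the Date column. This is a small sketch using only the standard library; `search_window` is a hypothetical helper, and it assumes the search's `until` value is exclusive, so a one-day window ends on the following day.

```python
from datetime import datetime, timedelta


def search_window(date_str, days=1):
    """Return (since, until) strings covering `days` days starting at date_str."""
    start = datetime.strptime(date_str, '%Y-%m-%d')
    end = start + timedelta(days=days)
    return start.strftime('%Y-%m-%d'), end.strftime('%Y-%m-%d')


since, until = search_window('2018-09-03')
print(since, until)
```

The two strings could then replace `yesterday_date` and `today_date` in the search call. Note that Twitter's standard search endpoint only covers roughly the most recent week of tweets, so dates as old as those in the table would probably return no results through it.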
