How to add values from multiple DataFrames to a main DataFrame
This is my main data:
API Close Date Dividends High Low
@3MIndia 25981.7 2018-09-03 0.0 26662.0 25807.8
@Adani_Gas 25454.0 2018-09-04 0.0 26000.0 25010.0
@AdaniOnline 25307.1 2018-09-05 0.0 25519.2 25015.0
Affle 25790.3 2018-09-06 0.0 25950.0 25200.0
AGC 25383.9 2018-09-07 0.0 25747.8 25200.0
Now I want to compute the Twitter sentiment for each value in the API column, based on the corresponding value in the Date column.
import os
import re
from datetime import datetime, timedelta

import pandas as pd
import tweepy
from nltk.tokenize import WordPunctTokenizer
from textblob import TextBlob


def mytweet(tweet_text, number, days):
    os.chdir("C:\\Users\\HP\\UntitledFolder2")

    # Twitter API credentials (placeholders).
    cons_key = 'xxxxxxxxx'
    cons_secret = 'xxxxxxx'
    acc_token = 'xxxxxxxx'
    acc_secret = 'xxxxxxx'

    # Exclude retweets; search operators are separated by a space.
    keyword = tweet_text + ' -filter:retweets'
    total_tweets = number

    auth = tweepy.OAuthHandler(cons_key, cons_secret)
    auth.set_access_token(acc_token, acc_secret)
    api = tweepy.API(auth)

    # Search window: the last `days` days, ending today.
    today_datetime = datetime.today()
    yesterday_datetime = today_datetime - timedelta(days=days)
    today_date = today_datetime.strftime('%Y-%m-%d')
    yesterday_date = yesterday_datetime.strftime('%Y-%m-%d')

    # tweepy 3.x call; the method was renamed search_tweets in tweepy 4.x.
    search_result = api.search(q=keyword, since=yesterday_date, until=today_date,
                               count=total_tweets, tweet_mode='extended', lang='en')

    tok = WordPunctTokenizer()
    rows = []
    for tweet in search_result:
        row = {
            'Screen Name': tweet.user.screen_name,
            'User Name': tweet.user.name,
            'Tweet Created At': str(tweet.created_at),
            'Tweet_Text': str(tweet.full_text),
            'User Location': str(tweet.user.location),
            'Tweet Coordinates': str(tweet.coordinates),
            'Retweet Count': str(tweet.retweet_count),
            'Retweeted': str(tweet.retweeted),
            'Phone Type': str(tweet.source),
            'Favorite Count': str(tweet.favorite_count),
            'Favorited': str(tweet.favorited),
            'Replied': str(tweet.in_reply_to_status_id_str),
        }

        # Clean the tweet text: drop @mentions and links, keep letters only.
        user_removed = re.sub(r'@[A-Za-z0-9]+', '', row['Tweet_Text'])
        link_removed = re.sub(r'https?://[A-Za-z0-9./]+', '', user_removed)
        number_removed = re.sub(r'[^a-zA-Z]', ' ', link_removed)
        lower_case_tweet = number_removed.lower()
        words = tok.tokenize(lower_case_tweet)
        clean_tweet = ' '.join(words).strip()
        #clean_tweet = p.clean(status['Tweet_Text'])
        row['Clean Tweet'] = clean_tweet

        # Score the cleaned text with TextBlob.
        sentiment = TextBlob(clean_tweet).sentiment
        row['PO1'] = sentiment.polarity
        row['SU1'] = sentiment.subjectivity
        rows.append(row)

    # One row per tweet.
    dataf1 = pd.DataFrame(rows)
    filename = tweet_text + ".xlsx"
    #dataf1.to_excel(filename)

    # No tweets found: return NaN scores.
    if dataf1.empty:
        return pd.DataFrame({'PO': [float('nan')], 'SU': [float('nan')]})

    # Average polarity and subjectivity over all tweets found.
    P1 = dataf1['PO1'].mean()
    S1 = dataf1['SU1'].mean()
    d1 = {'PO': [P1], 'SU': [S1]}
    return pd.DataFrame(d1)
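The search window in the function above always ends today. As a rough sketch of how the same window could instead end on a given date string, here is a small helper using only the standard library. The name search_window and its days parameter are hypothetical and not part of the original code; the date format is assumed to be the YYYY-MM-DD form shown in the Date column.

```python
from datetime import datetime, timedelta


def search_window(date_str, days=1):
    """Return (since, until) strings for a window of `days` days ending on date_str.

    Hypothetical helper: `date_str` is assumed to be in 'YYYY-MM-DD' form.
    """
    until = datetime.strptime(date_str, '%Y-%m-%d')
    since = until - timedelta(days=days)
    return since.strftime('%Y-%m-%d'), until.strftime('%Y-%m-%d')


since, until = search_window('2018-09-03', days=1)
print(since, until)  # 2018-09-02 2018-09-03
```

The two strings could then be passed as the since and until arguments of the search call. As far as I know, the standard Twitter search endpoint only covers roughly the last seven days, so dates as old as those in the table would likely return no tweets.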
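I wrote this code to compute polarity and subjectivity. I am now stuck on two things:
1. I want to compute the polarity and subjectivity for each distinct value in the API column and add them to the main DataFrame as two columns, PO and SU.
2. I want to pass a date (from the Date column of my main data) to the function above, so that it gives me the polarity and subjectivity for that API on the corresponding date.
Any help would be greatly appreciated.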
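To make the two points above concrete, here is a minimal sketch of one possible approach: call a scoring function once per row with that row's API and Date values, stack the one-row results, and attach them to the main data as PO and SU columns. The function fake_sentiment is a hypothetical stand-in for mytweet that returns dummy values, so the example runs without Twitter access; it assumes the real function would accept a date argument and return a one-row frame with PO and SU.

```python
import pandas as pd

# A small copy of the main data from the question.
main = pd.DataFrame({
    'API': ['@3MIndia', '@Adani_Gas', '@AdaniOnline'],
    'Close': [25981.7, 25454.0, 25307.1],
    'Date': ['2018-09-03', '2018-09-04', '2018-09-05'],
})


def fake_sentiment(api, date):
    """Hypothetical stand-in for mytweet: returns a one-row PO/SU frame."""
    # Deterministic dummy values, not real sentiment scores.
    return pd.DataFrame({'PO': [len(api) / 100], 'SU': [int(date[-2:]) / 10]})


# Call the function once per row, then stack the one-row results.
parts = [fake_sentiment(api, date) for api, date in zip(main['API'], main['Date'])]
scores = pd.concat(parts, ignore_index=True)

# Align by position: both frames carry a 0..n-1 index at this point.
result = pd.concat([main.reset_index(drop=True), scores], axis=1)
print(result)
```

Resetting the index before the column-wise concat matters because pd.concat with axis=1 aligns on index labels, not on row order.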