Confusing speaker-label timestamps from Watson STT show the wrong number of seconds spoken by each speaker

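I am writing a script to transcribe the words in a wav file. Looking at the from and to columns, I suspect the problem is that they hold the timestamps of individual words rather than the timestamps of the speaker's whole sentence. How can this be corrected in Python? It does not seem to be an API problem.

My script: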

#### RUN THIS PART FIRST ####
import json
from os.path import join, dirname

import pandas as pd
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
from ibm_watson import SpeechToTextV1

# Authenticate with the service API key
authenticator = IAMAuthenticator('vv-xx')

service = SpeechToTextV1(authenticator=authenticator)
service.set_service_url('https://api.us-east.speech-to-text.watson.cloud.ibm.com')

models = service.list_models().get_result()
# print(json.dumps(models, indent=2))

model = service.get_model('en-US_BroadbandModel').get_result()
# print(json.dumps(model, indent=2))

# This is the name of the file you need to change below
with open(join(dirname('__file__'), '1003392536_1003392531_e5d4f4210c818cab20715d61.wav'), 'rb') as audio_file:
    output = service.recognize(
        audio=audio_file,
        content_type='audio/wav',
        model='en-US_NarrowbandModel',
        speaker_labels=True,  # required for the speaker_labels list used in the second part
        smart_formatting=True,
        inactivity_timeout=-1,
    ).get_result()
# print(json.dumps(output, indent=2))
#### END ####

# Get the data into a CSV
#### RUN THIS PART SECOND ####
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

# One row per transcript alternative
df0 = pd.DataFrame([alt for res in output['results'] for alt in res['alternatives']])

# One row per speaker label; this key is only present when
# recognize() is called with speaker_labels=True
df1 = pd.DataFrame(output['speaker_labels'])

print(list(df0.columns))
print(list(df1.columns))
df0 = df0.drop(['timestamps'], axis=1)
df1 = df1.drop(['final', 'confidence'], axis=1)
test3 = pd.concat([df0, df1], axis=1)

# Sentiment: score every non-empty transcript with VADER
transcript = test3['transcript'].dropna()
analyzer = SentimentIntensityAnalyzer()
scores = [analyzer.polarity_scores(txt) for txt in transcript]
sentiment = pd.DataFrame(scores, index=transcript.index)
test4 = pd.concat([test3, sentiment], axis=1)
test4 = test4.rename(columns={'neg': 'Negative', 'pos': 'Positive', 'neu': 'Neutral'})

# This is the name of the output csv file you need to change below
test4.to_csv("1003392536_1003392531_e5d4f4210c818cab20715d61.csv")

The output DataFrame looks like this:

[Screenshot of the output DataFrame; image not available]

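However, the from and to timestamps do not make sense: each pair marks the start and the end of a single word, rather than running continuously from the start to the end of the whole sentence. What can I change in my script?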
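
One possible direction, sketched under assumptions rather than taken from the original script: if each row of the speaker-label table describes a single word, consecutive rows with the same speaker can be collapsed into one row per turn. The function name `collapse_speaker_turns`, the output column names and the sample values below are hypothetical; only the `from`, `to` and `speaker` columns come from the question.

```python
import pandas as pd


def collapse_speaker_turns(labels: pd.DataFrame) -> pd.DataFrame:
    """Merge consecutive word-level rows of the same speaker into one turn."""
    labels = labels.sort_values("from").reset_index(drop=True)
    # A new turn starts whenever the speaker differs from the previous row.
    turn_id = (labels["speaker"] != labels["speaker"].shift()).cumsum().rename("turn")
    turns = labels.groupby(turn_id).agg(
        speaker=("speaker", "first"),
        start=("from", "min"),   # start of the first word in the turn
        end=("to", "max"),       # end of the last word in the turn
    )
    turns["seconds_spoken"] = turns["end"] - turns["start"]
    return turns.reset_index(drop=True)


# Hypothetical sample shaped like the from/to/speaker columns in the question.
sample = pd.DataFrame({
    "from": [0.0, 0.5, 1.0, 2.0, 2.5, 4.0],
    "to": [0.5, 1.0, 1.5, 2.5, 3.5, 4.5],
    "speaker": [0, 0, 0, 1, 1, 0],
})
turns = collapse_speaker_turns(sample)
per_speaker = turns.groupby("speaker")["seconds_spoken"].sum()
print(turns)
print(per_speaker)
```

Here `turns` has one row per uninterrupted stretch of speech, and `per_speaker` sums those stretches for each speaker. Note that a turn measured this way includes any pauses between the speaker's own words.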

You can test my code with any audio file in wav format. I am new here and do not know how to upload my wav file.
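
Without an audio file at hand, the grouping logic can also be tried on a small hand-written dictionary. The sketch below assumes the response has a `results` list whose alternatives carry `timestamps` as `[word, start, end]` entries, plus a `speaker_labels` list with `from`, `to` and `speaker` keys, and that each label's times match one word's times exactly. The function name `speaker_turns` and all sample values are hypothetical.

```python
def speaker_turns(response):
    """Build one entry per speaker turn, with its text and start/end times."""
    # Index every recognized word by its (start, end) pair.
    words = {}
    for result in response.get("results", []):
        best = result["alternatives"][0]
        for word, start, end in best.get("timestamps", []):
            words[(start, end)] = word

    turns = []
    labels = sorted(response.get("speaker_labels", []), key=lambda l: l["from"])
    for label in labels:
        word = words.get((label["from"], label["to"]), "")
        if turns and turns[-1]["speaker"] == label["speaker"]:
            # Same speaker keeps talking: extend the current turn.
            turns[-1]["to"] = label["to"]
            if word:
                turns[-1]["words"].append(word)
        else:
            # Speaker changed: open a new turn.
            turns.append({
                "speaker": label["speaker"],
                "from": label["from"],
                "to": label["to"],
                "words": [word] if word else [],
            })
    for turn in turns:
        turn["transcript"] = " ".join(turn.pop("words"))
    return turns


# Hypothetical response fragment, written by hand for illustration.
sample_response = {
    "results": [
        {"alternatives": [{"transcript": "hello how are you ",
                           "timestamps": [["hello", 0.0, 0.5], ["how", 0.5, 0.9],
                                          ["are", 0.9, 1.1], ["you", 1.1, 1.4]]}]},
        {"alternatives": [{"transcript": "fine thanks ",
                           "timestamps": [["fine", 2.0, 2.4], ["thanks", 2.4, 3.0]]}]},
    ],
    "speaker_labels": [
        {"from": 0.0, "to": 0.5, "speaker": 0},
        {"from": 0.5, "to": 0.9, "speaker": 0},
        {"from": 0.9, "to": 1.1, "speaker": 0},
        {"from": 1.1, "to": 1.4, "speaker": 0},
        {"from": 2.0, "to": 2.4, "speaker": 1},
        {"from": 2.4, "to": 3.0, "speaker": 1},
    ],
}
result_turns = speaker_turns(sample_response)
for turn in result_turns:
    print(turn)
```

The resulting list can be passed to `pd.DataFrame` to get one row per turn, which keeps each sentence next to the speaker and the time span it belongs to.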
