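How do I perform speaker diarization with IBM Speech to Text?
I am trying to perform speaker diarization with IBM Speech to Text. I send my audio file through the API and get the result back as JSON, as shown below.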
{
  "results": [
    {
      "alternatives": [
        {
          "timestamps": [
            ["hello", 0.68, 1.19],
            ["yeah", 1.47, 1.91],
            ["yeah", 1.96, 2.12],
            ["how's", 2.12, 2.59],
            ["Billy", 2.59, 3.17],
            ["good", 4.01, 4.30]
          ],
          "confidence": 0.82,
          "transcript": "hello yeah yeah how's Billy good "
        }
      ],
      "final": true
    }
  ],
  "result_index": 0,
  "speaker_labels": [
    {"from": 0.68, "to": 1.19, "speaker": 2, "confidence": 0.52, "final": false},
    {"from": 1.47, "to": 1.93, "speaker": 1, "confidence": 0.62},
    {"from": 1.96, "to": 2.12, "confidence": 0.51},
    {"from": 2.12, "to": 2.59},
    {"from": 2.59, "to": 3.17},
    {"from": 4.01, "to": 4.30, "confidence": 0.63, "final": true}
  ]
}
But I want the output in this format:
Speaker 2 - "Hello?"
Speaker 1 - "Yeah?"
Speaker 2 - "Yeah, how's Billy?"
Speaker 1 - "Good."
Is there a way to get the results in this format directly, or do I have to write my own code? Here is my code:
import json

# speech_to_text is assumed to be an already-authenticated SpeechToTextV1 client.
with open('/content/test.mp3', 'rb') as audio_file:
    speech_recognition_results = speech_to_text.recognize(
        audio=audio_file, content_type='audio/mp3',
        word_alternatives_threshold=0.9, speaker_labels=True
    ).get_result()
print(json.dumps(speech_recognition_results, indent=2))
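If the response does not come back already grouped by speaker, one possible approach is to post-process it: look up each word's start time in "speaker_labels" and merge consecutive words that share a speaker. The sketch below is only an illustration under stated assumptions. The function name `format_dialogue` is hypothetical, the speaker values in `sample` are assumed from the desired output above, and the code assumes every word's start time matches the "from" value of exactly one speaker label.

```python
def format_dialogue(response):
    """Group recognized words into speaker turns (hypothetical helper).

    Assumes each word's start time equals the "from" value of one entry
    in "speaker_labels". The end times are not compared, because they can
    differ slightly between the two lists.
    """
    # Map each label's start time to its speaker number.
    speaker_at = {
        label["from"]: label["speaker"]
        for label in response.get("speaker_labels", [])
    }

    turns = []  # each item is [speaker, [words]]
    for result in response.get("results", []):
        # Use the first (best) alternative of every result.
        for word, start, _end in result["alternatives"][0]["timestamps"]:
            speaker = speaker_at.get(start)  # None if no label matches
            if turns and turns[-1][0] == speaker:
                turns[-1][1].append(word)  # same speaker: extend the turn
            else:
                turns.append([speaker, [word]])  # speaker changed: new turn

    lines = []
    for speaker, words in turns:
        text = " ".join(words)
        lines.append(f'Speaker {speaker} - "{text}"')
    return lines


# Trimmed-down sample; the speaker values are assumed from the desired output.
sample = {
    "results": [{
        "alternatives": [{
            "timestamps": [
                ["hello", 0.68, 1.19], ["yeah", 1.47, 1.91],
                ["yeah", 1.96, 2.12], ["how's", 2.12, 2.59],
                ["Billy", 2.59, 3.17], ["good", 4.01, 4.30],
            ],
        }],
    }],
    "speaker_labels": [
        {"from": 0.68, "to": 1.19, "speaker": 2},
        {"from": 1.47, "to": 1.93, "speaker": 1},
        {"from": 1.96, "to": 2.12, "speaker": 2},
        {"from": 2.12, "to": 2.59, "speaker": 2},
        {"from": 2.59, "to": 3.17, "speaker": 2},
        {"from": 4.01, "to": 4.30, "speaker": 1},
    ],
}

lines = format_dialogue(sample)
print("\n".join(lines))
# Speaker 2 - "hello"
# Speaker 1 - "yeah"
# Speaker 2 - "yeah how's Billy"
# Speaker 1 - "good"
```

This grouping does not add the capitalization or punctuation shown in the desired output; that would need a separate step.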