How do I perform speaker diarization with IBM Speech to Text?

I am trying to perform speaker diarization with IBM Speech to Text. I send my audio file through the API and get the result back as JSON, as shown below.

{
  "results": [
    {
      "alternatives": [
        {
          "timestamps": [
            ["hello", 0.68, 1.19],
            ["yeah", 1.47, 1.91],
            ["yeah", 1.96, 2.12],
            ["how's", 2.12, 2.59],
            ["Billy", 2.59, 3.17],
            ["good", 4.01, 4.30]
          ],
          "confidence": 0.82,
          "transcript": "hello yeah yeah how's Billy good "
        }
      ],
      "final": true
    }
  ],
  "result_index": 0,
  "speaker_labels": [
    {"from": 0.68, "to": 1.19, "speaker": 2, "confidence": 0.52, "final": false},
    {"from": 1.47, "to": 1.93, "speaker": 1, "confidence": 0.62},
    {"from": 1.96, "to": 2.12, "speaker": 2, "confidence": 0.51},
    {"from": 2.12, "to": 2.59, "speaker": 2},
    {"from": 2.59, "to": 3.17, "speaker": 2},
    {"from": 4.01, "to": 4.30, "speaker": 1, "confidence": 0.63, "final": true}
  ]
}

But I want this format:

Speaker 2 - "Hello?"
Speaker 1 - "Yeah?"
Speaker 2 - "Yeah, how's Billy?"
Speaker 1 - "Good."

Is there a way to get the result in this format, or do I have to write my own code? Here is my code:

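import json
# `speech_to_text` is assumed to be an already authenticated Speech to Text client.
with open('/content/test.mp3', 'rb') as audio_file:
    speech_recognition_results = speech_to_text.recognize(
        audio=audio_file, content_type='audio/mp3',
        word_alternatives_threshold=0.9, speaker_labels=True,
    ).get_result()
# Pretty-print the raw JSON response
print(json.dumps(speech_recognition_results, indent=2))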
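If it does come down to writing your own code, one possible approach is to match each word in `timestamps` to the `speaker_labels` entry with the same start time, then merge consecutive words from the same speaker into one turn. The sketch below is only an illustration, not an official IBM method. The helper name `group_by_speaker` is hypothetical, it assumes every word's start time appears as a `from` value in `speaker_labels`, and the speaker numbers for the last four labels in the sample are inferred from the desired output in the question.

```python
def group_by_speaker(response):
    """Merge consecutive words with the same speaker label into turns."""
    # Map each label's start time to its speaker number.
    speaker_at = {
        label["from"]: label["speaker"]
        for label in response.get("speaker_labels", [])
    }
    turns = []  # each item is [speaker, [word, word, ...]]
    for result in response.get("results", []):
        # Assumption: only the first alternative is of interest.
        timestamps = result["alternatives"][0].get("timestamps", [])
        for word, start, _end in timestamps:
            speaker = speaker_at.get(start)  # None if no label matches
            if turns and turns[-1][0] == speaker:
                turns[-1][1].append(word)
            else:
                turns.append([speaker, [word]])
    return [f'Speaker {speaker} - "{" ".join(words)}"' for speaker, words in turns]


# Trimmed-down sample with only the fields the helper reads.
sample = {
    "results": [{"alternatives": [{"timestamps": [
        ["hello", 0.68, 1.19], ["yeah", 1.47, 1.91], ["yeah", 1.96, 2.12],
        ["how's", 2.12, 2.59], ["Billy", 2.59, 3.17], ["good", 4.01, 4.30],
    ]}]}],
    "speaker_labels": [
        {"from": 0.68, "to": 1.19, "speaker": 2},
        {"from": 1.47, "to": 1.93, "speaker": 1},
        {"from": 1.96, "to": 2.12, "speaker": 2},  # speaker inferred
        {"from": 2.12, "to": 2.59, "speaker": 2},  # speaker inferred
        {"from": 2.59, "to": 3.17, "speaker": 2},  # speaker inferred
        {"from": 4.01, "to": 4.30, "speaker": 1},  # speaker inferred
    ],
}

for line in group_by_speaker(sample):
    print(line)
# Speaker 2 - "hello"
# Speaker 1 - "yeah"
# Speaker 2 - "yeah how's Billy"
# Speaker 1 - "good"
```

With a real response you would pass `speech_recognition_results` instead of `sample`. The words come out exactly as the transcript has them, so the capitalization and punctuation in the desired output ("Hello?", "Good.") would still need a separate step.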
