
How to identify the speaker when using the Azure Cognitive Speech Translation API with the Python SDK?

I am doing speech-to-speech translation using a modified version of the event-based synthesis code sample from the Azure documentation. During this process I would also like to identify the speakers (speaker 1, speaker 2), but I do not see anything in the Python SDK that would let me identify the speaker as part of the speech-to-text translation. Can anyone suggest a way to identify the speaker during speech-to-text translation? The code snippet is below:

import time

import azure.cognitiveservices.speech as speechsdk

# speech_key, service_region, from_language, to_language and filename
# are assumed to be defined elsewhere in the script.
def translate_speech_to_text():

    translation_config = speechsdk.translation.SpeechTranslationConfig(subscription=speech_key, region=service_region)
    translation_config.speech_recognition_language = from_language
    translation_config.add_target_language(to_language)
    translation_config.voice_name = "en-GB-Susan"

    translation_config.request_word_level_timestamps()
    translation_config.output_format = speechsdk.OutputFormat(0)

    audio_input = speechsdk.AudioConfig(filename=filename)
    recognizer = speechsdk.translation.TranslationRecognizer(translation_config=translation_config, audio_config=audio_input)

    done = False

    def stop_cb(evt):
        """Callback that stops continuous recognition upon receiving an event `evt`."""
        # print('CLOSING on {}'.format(evt))
        recognizer.stop_continuous_recognition()
        nonlocal done
        done = True

    all_results = []

    def handle_final_result(evt):
        # all_results.append(evt.result.text)
        # all_results.append(evt.result.translations['en'])
        all_results.append(evt.result.json)

    recognizer.recognized.connect(handle_final_result)
    # Connect callbacks to the events fired by the speech recognizer
    recognizer.recognizing.connect(lambda evt: print('RECOGNIZING: {}'.format(evt)))
    recognizer.recognized.connect(lambda evt: print('RECOGNIZED: {}'.format(evt)))
    recognizer.session_started.connect(lambda evt: print('SESSION STARTED: {}'.format(evt)))
    recognizer.session_stopped.connect(lambda evt: print('SESSION STOPPED {}'.format(evt)))
    # recognizer.canceled.connect(lambda evt: print('CANCELED {}'.format(evt)))
    recognizer.session_stopped.connect(stop_cb)
    recognizer.canceled.connect(stop_cb)

    def synthesis_callback(evt):
        # Note: each synthesizing event overwrites out.wav with the latest audio chunk.
        print('Audio: {}'.format(len(evt.result.audio)))
        print('Reason: {}'.format(evt.result.reason))
        with open('out.wav', 'wb') as wavfile:
            wavfile.write(evt.result.audio)

    recognizer.synthesizing.connect(synthesis_callback)
    recognizer.start_continuous_recognition()

    # Wait until the session is stopped or canceled.
    while not done:
        time.sleep(.5)

    print("Printing all results:")
    print(all_results)

translate_speech_to_text()

Solution

If you want to identify speakers, you should use the Speaker Recognition feature of the Speech Service.

The REST API is recommended.

Text Independent - Identify Single Speaker

The Speech Services SDKs for C#, C++, and JavaScript, as well as the REST API, can perform Speaker Recognition. (I searched the Python SDK and did not find a method that performs identification directly.)
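
To show what the REST flow looks like, here is a minimal sketch of creating and enrolling a text-independent identification profile with the requests library. The helper names (create_profile, enroll_profile), the endpoint paths, the API version, and the response fields are assumptions based on the Speaker Recognition REST reference and should be checked against the current documentation; SPEAKER_KEY, SPEAKER_REGION, and the audio file names are placeholders.

import requests

# Placeholders -- substitute your own Speech resource key, region and audio files.
SPEAKER_KEY = "<your-speech-key>"
SPEAKER_REGION = "<your-region>"   # e.g. "westus"
# Assumed endpoint layout; verify the path and version against the REST reference.
BASE_URL = ("https://{}.api.cognitive.microsoft.com"
            "/speaker/identification/v2.0/text-independent/profiles".format(SPEAKER_REGION))
HEADERS = {"Ocp-Apim-Subscription-Key": SPEAKER_KEY}

def create_profile(locale="en-us"):
    """Create an identification profile and return its profileId (assumed response field)."""
    resp = requests.post(BASE_URL, headers=HEADERS, json={"locale": locale})
    resp.raise_for_status()
    return resp.json()["profileId"]

def enroll_profile(profile_id, wav_path):
    """Upload enrollment audio for a profile (mono 16 kHz WAV is expected by the service)."""
    url = "{}/{}/enrollments".format(BASE_URL, profile_id)
    with open(wav_path, "rb") as f:
        resp = requests.post(url, headers={**HEADERS, "Content-Type": "audio/wav"}, data=f)
    resp.raise_for_status()
    return resp.json()

# One profile per speaker, each enrolled with that speaker's own audio, e.g.:
# speaker1_id = create_profile()
# enroll_profile(speaker1_id, "speaker1_enrollment.wav")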

Suggestions

1. It is recommended to read the Speech service documentation carefully to learn how to use this service.

2. It is recommended to use the requests library to send HTTP POST requests (see the identification sketch after this list).
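
Continuing the sketch above (reusing the requests import, BASE_URL, and HEADERS from the enrollment example), an identification request could then be sent against the enrolled profiles. The identifySingleSpeaker path and the shape of the JSON response are again assumptions to verify against the Speaker Recognition REST reference.

def identify_speaker(wav_path, profile_ids):
    """Ask the service which enrolled profile best matches the audio.

    profile_ids: list of profileId strings created and enrolled earlier.
    The path, query parameter, and response fields below are assumed from the REST reference.
    """
    url = "{}/identifySingleSpeaker".format(BASE_URL)
    params = {"profileIds": ",".join(profile_ids)}
    with open(wav_path, "rb") as f:
        resp = requests.post(url, headers={**HEADERS, "Content-Type": "audio/wav"},
                             params=params, data=f)
    resp.raise_for_status()
    # Expected (assumed) fields: identifiedProfile.profileId and identifiedProfile.score
    return resp.json()

# identify_speaker("segment.wav", [speaker1_id, speaker2_id])

In the translation scenario from the question, one possible approach is to split the input audio into per-utterance segments (for example, using the offsets from the word-level timestamps) and send each segment to this identification call, since the TranslationRecognizer itself does not report speaker labels.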
