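How to identify speakers with the Python SDK when using the Azure Cognitive Services speech translation API?
I'm trying to do speech-to-speech translation using a modified version of the event-based synthesis code sample from the Azure documentation. Along the way, I would also like to identify the speakers (Speaker 1, Speaker 2), but I don't see any feature in the Python SDK that lets me identify speakers as part of speech-to-text translation. Can anyone suggest a way to identify speakers during speech-to-text translation? Below is the code snippet: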
import time

import azure.cognitiveservices.speech as speechsdk

# Placeholders (example values): fill in your own key, region, languages and input file.
speech_key, service_region = "YourSubscriptionKey", "YourServiceRegion"
from_language, to_language = "de-DE", "en"
filename = "input.wav"


def translate_speech_to_text():
    translation_config = speechsdk.translation.SpeechTranslationConfig(
        subscription=speech_key, region=service_region)
    translation_config.speech_recognition_language = from_language
    translation_config.add_target_language(to_language)
    translation_config.voice_name = "en-GB-Susan"
    translation_config.request_word_level_timestamps()
    translation_config.output_format = speechsdk.OutputFormat.Simple  # same as OutputFormat(0)
    audio_input = speechsdk.audio.AudioConfig(filename=filename)
    recognizer = speechsdk.translation.TranslationRecognizer(
        translation_config=translation_config, audio_config=audio_input)

    done = False

    def stop_cb(evt):
        """Callback that signals the main loop to stop upon receiving an event `evt`."""
        nonlocal done
        #print('CLOSING on {}'.format(evt))
        done = True

    all_results = []

    def handle_final_result(evt):
        #all_results.append(evt.result.text)
        #all_results.append(evt.result.translations['en'])
        all_results.append(evt.result.json)

    recognizer.recognized.connect(handle_final_result)
    # Connect callbacks to the events fired by the speech recognizer
    recognizer.recognizing.connect(lambda evt: print('RECOGNIZING: {}'.format(evt)))
    recognizer.recognized.connect(lambda evt: print('RECOGNIZED: {}'.format(evt)))
    recognizer.session_started.connect(lambda evt: print('SESSION STARTED: {}'.format(evt)))
    recognizer.session_stopped.connect(lambda evt: print('SESSION STOPPED {}'.format(evt)))
    #recognizer.canceled.connect(lambda evt: print('CANCELED {}'.format(evt)))
    recognizer.session_stopped.connect(stop_cb)
    recognizer.canceled.connect(stop_cb)

    synthesis_count = 0

    def synthesis_callback(evt):
        nonlocal synthesis_count
        audio = evt.result.audio
        print('Audio: {}'.format(len(audio)))
        print('Reason: {}'.format(evt.result.reason))
        # An empty payload signals that synthesis has completed; skip it so it does not
        # overwrite real audio. Each non-empty payload goes to its own numbered file.
        if audio:
            synthesis_count += 1
            with open('out_{}.wav'.format(synthesis_count), 'wb') as wavfile:
                wavfile.write(audio)

    recognizer.synthesizing.connect(synthesis_callback)
    recognizer.start_continuous_recognition()
    while not done:
        time.sleep(.5)
    # Stop from the main thread rather than from inside an event callback.
    recognizer.stop_continuous_recognition()
    print("Printing all results:")
    print(all_results)


translate_speech_to_text()
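Because the translation recognizer itself does not label speakers, a per-utterance workaround needs the audio of each recognized result so it can be passed to a separate identification step. Each result exposes `offset` and `duration` in ticks of 100 ns. The sketch below uses only the standard-library `wave` module to cut that span out of the source WAV; `extract_segment` is a hypothetical helper name, and it assumes uncompressed PCM input and an offset measured from the start of the file, which should be the case for a file-based `AudioConfig` like the one above.

```python
import io
import wave

TICKS_PER_SECOND = 10_000_000  # one SDK tick = 100 ns


def extract_segment(wav_bytes: bytes, offset_ticks: int, duration_ticks: int) -> bytes:
    """Return a standalone WAV (as bytes) covering [offset, offset + duration) of wav_bytes."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        # Convert ticks to frame indices at the file's sample rate.
        start = offset_ticks * rate // TICKS_PER_SECOND
        count = duration_ticks * rate // TICKS_PER_SECOND
        # Clamp so an offset past the end yields an empty segment instead of an error.
        src.setpos(min(start, src.getnframes()))
        frames = src.readframes(count)
    out = io.BytesIO()
    with wave.open(out, "wb") as dst:
        dst.setparams(params)   # same channels, sample width and rate as the source
        dst.writeframes(frames)  # the header's frame count is patched to the real length
    return out.getvalue()
```

Inside `handle_final_result`, a call such as `extract_segment(source_bytes, evt.result.offset, evt.result.duration)` would yield one WAV per translated utterance, where `source_bytes` is the input file read into memory.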
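Solution
If you want to identify speakers, you should use the Speaker Recognition feature of the Speech Service; the REST API is recommended (see "Text Independent - Identify Single Speaker"). Speech Services supports Speaker Recognition through the full SDKs for C#, C++ and JavaScript, as well as through the REST API. (I searched the Python SDK and did not find a method that performs it directly.)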
Suggestions
1. Read the Speech Service documentation carefully to understand how to use this service.
2. Use the `requests` library to send the HTTP POST requests.
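Once each translated utterance has a profile ID from an identification call, turning the output into "Speaker 1", "Speaker 2" labels is plain bookkeeping. A small sketch, assuming the input is a list of `(profile_id, translated_text)` pairs with `None` for utterances that could not be identified; `label_speakers` is a hypothetical helper, and speakers are numbered in order of first appearance.

```python
from typing import Iterable, List, Optional, Tuple


def label_speakers(segments: Iterable[Tuple[Optional[str], str]]) -> List[str]:
    """Prefix each translated text with 'Speaker N', numbered by first appearance."""
    labels = {}  # profile ID -> "Speaker N"
    lines = []
    for profile_id, text in segments:
        if profile_id is None:
            name = "Unknown"
        else:
            name = labels.setdefault(profile_id, f"Speaker {len(labels) + 1}")
        lines.append(f"{name}: {text}")
    return lines
```

Numbering by first appearance keeps labels stable across the whole session, regardless of the opaque profile IDs the service returns.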