Azure Speech Recognition - using binary/hex data instead of a WAV file path

I am looking for a way to use the Azure Speech Recognition API by passing binary/hex data as the argument instead of a WAV file path.

"raw_data" is the binary data (printed as hex) of a small WAV file:

raw_data = self.audio.get_wav_data()

Saving it to disk as a WAV file works, but this is what I want to avoid:

main_dir = os.path.dirname(__file__)
wav_file = os.path.join(main_dir,'output.wav')
with open(wav_file,'wb') as f:
    f.write(raw_data)

Setting up the configuration and running speech recognition with the Azure API:

speech_config = speechsdk.SpeechConfig(subscription="<subscription>",region="westeurope")
speech_config.speech_recognition_language="pt-BR"
audio_config = speechsdk.AudioConfig(filename=wav_file)
speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config,audio_config=audio_config)
result = speech_recognizer.recognize_once_async().get()
user_request = result.text

Solution

You can use a push stream.

An example is here:

https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/python/console/speech_sample.py

This is where the stream used for recognition is configured:

stream = speechsdk.audio.PushAudioInputStream()
audio_config = speechsdk.audio.AudioConfig(stream=stream)

The sample reads a wave file and converts it to binary data. Instead, you can call stream.write() to load your own binary data into the stream configured above:

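n_bytes = 3200  # chunk size in bytes
wav_fh = wave.open(weatherfilename)  # needs "import wave"; WAV path variable from the sample
while True:
    # 16-bit mono audio: one frame is 2 bytes
    frames = wav_fh.readframes(n_bytes // 2)
    print('read {} bytes'.format(len(frames)))
    if not frames:
        break
    stream.write(frames)
wav_fh.close()
stream.close()  # tell the recognizer that no more audio is coming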
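
Applied to the question, a minimal sketch might look like the following. It assumes raw_data is a complete WAV file in memory (header plus PCM samples), as the question describes. By default the push stream expects raw 16 kHz, 16-bit, mono PCM, so the sketch strips the WAV header with the standard wave module and reports the actual format. split_wav_bytes is a hypothetical helper name, not part of the SDK, and the Azure calls are left as untested comments because they need a subscription.

```python
import io
import wave


def split_wav_bytes(raw_data):
    """Return ((sample_rate, bits_per_sample, channels), pcm_bytes).

    raw_data is assumed to be a full WAV file held in memory.
    """
    with wave.open(io.BytesIO(raw_data), 'rb') as wav_fh:
        fmt = (wav_fh.getframerate(),
               wav_fh.getsampwidth() * 8,
               wav_fh.getnchannels())
        # Read every frame: this is the PCM payload without the header
        pcm = wav_fh.readframes(wav_fh.getnframes())
    return fmt, pcm


# Sketch of the Azure side (assumes speech_config is set up as in the question):
#
# import azure.cognitiveservices.speech as speechsdk
# (rate, bits, channels), pcm = split_wav_bytes(raw_data)
# stream_format = speechsdk.audio.AudioStreamFormat(
#     samples_per_second=rate, bits_per_sample=bits, channels=channels)
# stream = speechsdk.audio.PushAudioInputStream(stream_format=stream_format)
# audio_config = speechsdk.audio.AudioConfig(stream=stream)
# speech_recognizer = speechsdk.SpeechRecognizer(
#     speech_config=speech_config, audio_config=audio_config)
# stream.write(pcm)
# stream.close()  # signals end of audio
# result = speech_recognizer.recognize_once_async().get()
```

Passing the format explicitly avoids relying on the default; if the audio is already 16 kHz, 16-bit, mono, PushAudioInputStream() without arguments should behave the same.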