Speech-to-text directly from an audio data stream, without saving to a .wav file

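I am using PyAudio and the speech_recognition module to convert audio to text. My question: can I convert audio recorded with PyAudio to text without creating an audio.wav file? I ask because the speech_recognition module apparently works on an audio file, which I open using with sr.AudioFile(chunk_name) as source: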

Recording code

import threading
import wave

import pyaudio
import speech_recognition as sr

def start_recording(duration, chunk_th):
    '''
    Takes the duration in seconds and the chunk threshold (in 1024-sample buffers), then records sound
    from the microphone and displays the text in real time after saving the recordings into the folder using multithreading.
    '''
    audio_format = pyaudio.paInt16
    rate = 44100

    p = pyaudio.PyAudio()
    r = sr.Recognizer()

    stream = p.open(format=audio_format,channels=1,rate=rate,input=True,frames_per_buffer=1024)

    print("*** Recording Started ***")

    frames = []

    chunk_no = 1
    for i in range(1,int(rate / 1024 * duration)+1):
        data = stream.read(1024)
        frames.append(data)
        if i % chunk_th == 0 or i == int(rate / 1024 * duration):
            # start a thread with frames
            fm_copy = frames.copy()
            t = threading.Thread(target=save_chunk,args=(fm_copy,chunk_no,rate,p,r))
            t.start()
            chunk_no += 1
            frames.clear()
#     print(len(frames))
    print("* done recording")

    stream.stop_stream()
    stream.close()
    p.terminate()

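This is the code used for recording. After each interval I pass the collected frames to a thread, so that recording can continue while the audio is processed in real time.
This is the function I call in the thread: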

def save_chunk(fr, chunk_no, rate, p, r):
    '''
    Takes the frames for one chunk and saves them as a .wav file in the Chunks folder,
    then extracts the spoken text from the file and displays it.
    Question: can I extract the spoken text from the frames directly, using the same methodology,
    without having to create audio files in the folder?
    '''
#     print("this has received {} frames".format(len(fr)))
    try:
         
        chunk_name = "Chunks/chunk_{}.wav".format(chunk_no)
        wf = wave.open(chunk_name,'wb')
        wf.setnchannels(1)
        wf.setsampwidth(p.get_sample_size(pyaudio.paInt16))
        wf.setframerate(rate)
        wf.writeframes(b''.join(fr))
        wf.close()
        
        
        with sr.AudioFile(chunk_name) as source:
            audio = r.listen(source)
            text = r.recognize_google(audio)
            print(text)
            
    except Exception as e:
        print("No Audio Detected",e)
