Google Speech-to-Text only recognizes the first few seconds of audio

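I'm using Google's Speech-to-Text API in Node.js. It returns a transcription of the first few words and then ignores the rest of the audio file. For every file I upload, the cutoff is at around 5-7 seconds.

I tried synchronous speech recognition for shorter audio files (example using an MP3 file shown below):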

    const fs = require('fs');
    const speech = require('@google-cloud/speech');

    const filename = './TEST/test.mp3';

    const client = new speech.SpeechClient();

    // Configure the request
    const config = {
        enableWordTimeOffsets: true,
        sampleRateHertz: 44100,
        encoding: 'MP3',
        languageCode: 'en-US',
    };
    const audio = {
        // Inline audio is sent as a base64-encoded string
        content: fs.readFileSync(filename).toString('base64'),
    };
    const request = {
        config: config,
        audio: audio,
    };

    // Detect speech in the audio file (must run inside an async function)
    const [response] = await client.recognize(request);

I also tried asynchronous recognition for longer audio files (example using a WAV file shown below):

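    const fs = require('fs');
    const speech = require('@google-cloud/speech');

    const filename = './TEST/test.wav';

    const client = new speech.SpeechClient();

    // Configure the request
    const config = {
        enableWordTimeOffsets: true,
        languageCode: 'en-US', // required by the API; assumed to match the MP3 example
    };
    const audio = {
        content: fs.readFileSync(filename).toString('base64'),
    };
    const request = {
        config: config,
        audio: audio, // the audio must be included in the request
    };

    // Start a longRunningRecognize operation and wait for it to finish
    // (must run inside an async function)
    const [operation] = await client.longRunningRecognize(request);
    const [response] = await operation.promise();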

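I have tried each of these implementations with both WAV and MP3 files. The result is always exactly the same: the first 5 seconds are recognized well, and then nothing.

Any help would be greatly appreciated!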

Solution

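@Ricco D was absolutely right: I was printing the results incorrectly.

When you transcribe a longer file, Google Speech-to-Text splits the transcription into segments based on where it detects pauses in speech.

Your response.results[] array will therefore contain multiple entries, and you need to iterate over all of them to print the full transcript. Printing only response.results[0] shows just the first segment, which is why the output appeared to stop after a few seconds.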
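
As a minimal sketch of that loop, the snippet below joins the top alternative of every entry in response.results into one string. The helper name fullTranscript and the mockResponse object are hypothetical, made up for illustration. The mock only imitates the results[].alternatives[].transcript shape described in the documentation linked below, so the snippet runs without calling the API.

```javascript
// Hypothetical helper (not part of the client library): concatenates the
// top-ranked alternative of every result into a single transcript string.
function fullTranscript(response) {
  return (response.results || [])
    .map((result) =>
      result.alternatives && result.alternatives.length > 0
        ? result.alternatives[0].transcript.trim()
        : ''
    )
    .filter((text) => text.length > 0)
    .join(' ');
}

// Mock response with two segments, standing in for what recognize() or
// longRunningRecognize() would return for audio containing a pause.
const mockResponse = {
  results: [
    { alternatives: [{ transcript: 'hello world', confidence: 0.9 }] },
    { alternatives: [{ transcript: ' this is the second segment', confidence: 0.8 }] },
  ],
};

console.log(fullTranscript(mockResponse));
```

With a real request, the same helper would be called on the response object returned by client.recognize(request) or operation.promise().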

For more details, see the documentation: https://cloud.google.com/speech-to-text/docs/basics#responses