How do I detect an intent in Dialogflow CX with the Java API and get the agent's response as both audio and text?

I am trying to build a simple voice bot using the Java API and Dialogflow CX.

These are the dependencies in my Spring Boot 2.4.3 project:

...
        <dependency>
            <groupId>com.google.cloud</groupId>
            <artifactId>spring-cloud-gcp-starter</artifactId>
        </dependency>
        <dependency>
            <groupId>com.google.cloud</groupId>
            <artifactId>google-cloud-dialogflow-cx</artifactId>
            <version>0.6.1</version>
        </dependency>
...

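I used https://github.com/googleapis/java-dialogflow-cx as a starting point, and so far everything seems to work well, except for the most important part.

When I send text or an event to the agent, it detects the intent and returns a response, but there is no audio output. So it seems that Text-to-Speech is not being run.

This is how I make the request, following the documentation sample: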

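// QueryInput.setText expects a TextInput message, not a plain String
QueryInput queryInput = QueryInput
                .newBuilder()
                .setLanguageCode("es-ES")
                .setText(TextInput.newBuilder().setText("hola").build())
                .build();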

DetectIntentRequest request = DetectIntentRequest.newBuilder()
                .setSession(sessionName.toString())
                .setQueryInput(queryInput)
                .build();

DetectIntentResponse response = sessionsClient.detectIntent(request);

Response:

{
  "detectIntentResponse": {
    "text": "hola","languageCode": "es","responseMessages": [
      {
        "text": {
          "text": [
            "¡Buenos días!"
          ]
        }
      },{
        
      }
    ],"currentPage": {
      "name": "projects/test-project/locations/global/agents/9effb8aa-6b62-4fe6-9fd5-2f5e87265ee7/flows/00000000-0000-0000-0000-000000000000/pages/START_PAGE","displayName": "Start Page"
    },"intent": {
      "name": "projects/test-project/locations/global/agents/9effb8aa-6b62-4fe6-9fd5-2f5e87265ee7/intents/00000000-0000-0000-0000-000000000000","displayName": "Default Welcome Intent"
    },"intentDetectionConfidence": 1.0,"diagnosticInfo": {
      "Execution Sequence": [
        {
          "Step 1": {
            "InitialState": {
              "FlowState": {
                "Version": 0.0,"PageState": {
                  "Status": "ENTERING_PAGE","Name": "Start Page"
                },"Name": "Default Start Flow"
              },"MatchedIntent": {
                "Score": 1.0,"Type": "NLU","Active": true,"DisplayName": "Default Welcome Intent","Id": "00000000-0000-0000-0000-000000000000"
              }
            },"Type": "INITIAL_STATE"
          }
        },{
          "Step 2": {
            "Type": "STATE_MACHINE","StateMachine": {
              "FlowState": {
                "Version": 0.0,"Name": "Default Start Flow","PageState": {
                  "Name": "Start Page","Status": "TRANSITION_ROUTING"
                }
              },"TriggeredIntent": "Default Welcome Intent"
            }
          }
        },{
          "Step 3": {
            "FunctionExecution": {
              "Responses": [
                {
                  "text": {
                    "redactedText": [
                      "¡Buenos días!"
                    ],"text": [
                      "¡Buenos días!"
                    ]
                  },"responseType": "HANDLER_PROMPT","source": "VIRTUAL_AGENT"
                }
              ]
            },"Type": "FUNCTION_EXECUTION"
          }
        },{
          "Step 4": {
            "Type": "STATE_MACHINE","StateMachine": {
              "FlowState": {
                "Version": 0.0,"PageState": {
                  "Name": "Start Page","Status": "TRANSITION_ROUTING"
                },"Name": "Default Start Flow"
              }
            }
          }
        }
      ],"Alternative Matched Intents": [
        {
          "Active": true,"Id": "00000000-0000-0000-0000-000000000000","Score": 1.0
        }
      ],"Transition Targets Chain": [
        
      ],"Triggered Transition Names": [
        "9db835de-3e94-4a2a-9b8d-4eda03039e5a"
      ]
    },"match": {
      "intent": {
        "name": "projects/test-project/locations/global/agents/9effb8aa-6b62-4fe6-9fd5-2f5e87265ee7/intents/00000000-0000-0000-0000-000000000000","displayName": "Default Welcome Intent"
      },"resolvedInput": "hola","matchType": "INTENT","confidence": 1.0
    }
  }
}
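
The resource names in this response and the `sessionName` passed in the request follow the same hierarchical path pattern. As a minimal sketch, assuming the standard CX session format `projects/<project>/locations/<location>/agents/<agent>/sessions/<session>`, the session string could be assembled as below. The class `SessionPaths` and its method are hypothetical helpers written for illustration; the client library's `SessionName.of(project, location, agent, session)` should yield the same string when its `toString()` is called.

```java
import java.util.UUID;

final class SessionPaths {

    /** Builds a Dialogflow CX session resource name from its four parts. */
    static String sessionPath(String project, String location, String agent, String session) {
        return String.format("projects/%s/locations/%s/agents/%s/sessions/%s",
                project, location, agent, session);
    }

    public static void main(String[] args) {
        // A random UUID is one simple way to get a unique session ID per conversation.
        String sessionId = UUID.randomUUID().toString();
        // Project, location and agent IDs are taken from the response shown above.
        System.out.println(sessionPath("test-project", "global",
                "9effb8aa-6b62-4fe6-9fd5-2f5e87265ee7", sessionId));
    }
}
```

Reusing the same session ID across requests keeps the conversation state on the agent side; a new ID starts a fresh conversation.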

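Dialogflow ES has an option to enable automatic text-to-speech, so the output audio is included in the DetectIntentResponse, but I cannot find a similar option in CX.

I searched Google several times but could not find anything useful.

So the question is: how do I detect an intent in Dialogflow CX with the Java API and get the agent's response as both audio and text?

Sample code would be great!

Thanks in advance!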

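Solution

According to the documentation:

"If the client wants to receive an audio response, it should also contain output_audio_config."

Even though I am not using StreamingDetectIntent, an OutputAudioConfig must be added to the request in order to receive audio in the response.

The code should look like this: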

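// Audio is only synthesized when the request carries an OutputAudioConfig
DetectIntentRequest request = DetectIntentRequest.newBuilder()
                .setSession(sessionName.toString())
                .setQueryInput(queryInput)
                .setOutputAudioConfig(
                   OutputAudioConfig.newBuilder()
                        .setAudioEncoding(
                           OutputAudioEncoding.OUTPUT_AUDIO_ENCODING_MP3)
                        .build())
                .build();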

DetectIntentResponse response = sessionsClient.detectIntent(request);

The response will then also contain the outputAudio I was looking for.

{
  "outputAudio": "//NExAAAAANIAAAAALYwEAA......THE AUDIO ...ngAUYYAP/","outputAudioConfig": {
    "audioEncoding": "OUTPUT_AUDIO_ENCODING_MP3"
  }
}
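
In the JSON rendering above, `outputAudio` is a base64 string, because that is how protobuf `bytes` fields are printed as JSON. Here is a minimal sketch of turning that string back into raw MP3 bytes and saving them, using only the Java standard library. The class name `AudioSaver` and the temp-file name are hypothetical. When working with the `DetectIntentResponse` object directly, no decoding should be needed: `response.getOutputAudio().toByteArray()` returns the raw bytes, which can be passed straight to `saveAudio`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

final class AudioSaver {

    /** Decodes a base64 string, as printed in the JSON "outputAudio" field, to raw audio bytes. */
    static byte[] decodeOutputAudio(String base64Audio) {
        return Base64.getDecoder().decode(base64Audio);
    }

    /** Writes the audio bytes to the target file, creating or overwriting it. */
    static Path saveAudio(byte[] audio, Path target) throws IOException {
        return Files.write(target, audio);
    }

    public static void main(String[] args) throws IOException {
        // "//NE" is only the first four characters of the sample above, used as a stand-in
        // because the full audio payload is elided there.
        byte[] audio = decodeOutputAudio("//NE");
        Path file = saveAudio(audio, Files.createTempFile("agent-reply", ".mp3"));
        System.out.println("Wrote " + audio.length + " bytes to " + file);
    }
}
```

The text part of the reply is still available in the response messages, so the same response can be shown on screen and played back.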

I hope this is useful to someone.

Thanks!
