
Hudi: timeline server timeout when running in embedded mode


I'm testing Hudi 0.5.3 (the version backed by AWS Athena) by running it with Spark in embedded mode, i.e., in unit tests. The tests succeeded at first, but now fail with a timeout when accessing Hudi's timeline server.

The following is based on the Hudi: Getting Started guide.

Spark session setup:

private val spark = addSparkConfigs(SparkSession.builder()
    .appName("spark testing")
    .master("local"))
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .config("spark.ui.port", "4041")
    .enableHiveSupport()
    .getOrCreate()

The code that triggers the timeout exception:

    val inserts = convertToStringList(dataGen.generateInserts(10))
    var df = spark.read.json(spark.sparkContext.parallelize(inserts,2))
    df.write.format("hudi").
      options(getQuickstartWriteConfigs).
      option(PRECOMBINE_FIELD_OPT_KEY,"ts").
      option(RECORDKEY_FIELD_OPT_KEY,"uuid").
      option(PARTITIONPATH_FIELD_OPT_KEY,"partitionpath").
      option(TABLE_NAME,tableName).
      mode(Overwrite).
      save(basePath)

The timeout and the exception thrown:

170762 [Executor task launch worker for task 47] INFO  org.apache.hudi.common.table.view.FileSystemViewManager  - Creating remote view for basePath /var/folders/z9/_9mf84p97hz1n45b0gnpxlj40000gp/T/HudiQuickStartSpec-hudi_trips_cow2193648737745630661. Server=xxx:59520
170766 [Executor task launch worker for task 47] INFO  org.apache.hudi.common.table.view.FileSystemViewManager  - Creating InMemory based view for basePath /var/folders/z9/_9mf84p97hz1n45b0gnpxlj40000gp/T/HudiQuickStartSpec-hudi_trips_cow2193648737745630661
170769 [Executor task launch worker for task 47] INFO  org.apache.hudi.common.table.view.RemoteHoodieTableFileSystemView  - Sending request : (http://xxx:59520/v1/hoodie/view/datafiles/beforeoron/latest/?partition=americas%2Funited_states%2Fsan_francisco&maxinstant=20201221180946&basepath=%2Fvar%2Ffolders%2Fz9%2F_9mf84p97hz1n45b0gnpxlj40000gp%2FT%2FHudiQuickStartSpec-hudi_trips_cow2193648737745630661&lastinstantts=20201221180946&timelinehash=70f7aa073fa3d86033278a59cbda71c6488f4883570d826663ebb51934a25abf)
246649 [Executor task launch worker for task 47] ERROR org.apache.hudi.common.table.view.PriorityBasedFileSystemView  - Got error running preferred function. Trying secondary
org.apache.hudi.exception.HoodieRemoteException: Connect to xxx:59520 [/xxx] failed: Operation timed out (Connection timed out)
    at org.apache.hudi.common.table.view.RemoteHoodieTableFileSystemView.getLatestBaseFilesFromParams(RemoteHoodieTableFileSystemView.java:223)
    at org.apache.hudi.common.table.view.RemoteHoodieTableFileSystemView.getLatestBaseFilesBeforeOrOn(RemoteHoodieTableFileSystemView.java:230)
    at org.apache.hudi.common.table.view.PriorityBasedFileSystemView.execute(PriorityBasedFileSystemView.java:97)
    at org.apache.hudi.common.table.view.PriorityBasedFileSystemView.getLatestBaseFilesBeforeOrOn(PriorityBasedFileSystemView.java:134)
    at org.apache.hudi.index.bloom.HoodieBloomIndex.lambda$loadInvolvedFiles$19c2c1bb$1(HoodieBloomIndex.java:201)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$1$1.apply(JavaRDDLike.scala:125)

I wasn't able to try a different port for Hudi's timeline server, because I couldn't find a configuration setting that controls the port.

Any idea why the calls to the timeline server time out?

Solution

The problem lies in how Hudi resolves the Spark driver host. It appears that although the timeline server starts up and binds its web server to localhost, Hudi's client then calls the server it just started using the machine's IP address.

5240 [pool-1-thread-1-ScalaTest-running-HudiSimpleCdcSpec] INFO  io.javalin.Javalin  - Starting Javalin ...
5348 [pool-1-thread-1-ScalaTest-running-HudiSimpleCdcSpec] INFO  io.javalin.Javalin  - Listening on http://localhost:59520/
...
org.apache.hudi.exception.HoodieRemoteException: Connect to xxx:59520 [/xxx] failed: Operation timed out (Connection timed out)
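The mismatch can be reproduced outside Spark and Hudi with a few lines of JVM code (an illustrative sketch, not Hudi source; `DriverHostDemo` is a made-up name): the local hostname often resolves to a LAN address rather than to loopback, so a server bound to `localhost` is unreachable at the address the client dials.

```scala
import java.net.InetAddress

// Illustration only: the address a JVM resolves for the machine's own
// hostname can differ from the loopback address. A server bound to
// localhost cannot be reached via the LAN address, which matches the
// connection timeout in the stack trace above.
object DriverHostDemo extends App {
  val machineAddr = InetAddress.getLocalHost.getHostAddress           // often a LAN IP
  val loopback = InetAddress.getByName("localhost").getHostAddress    // 127.0.0.1
  println(s"machine=$machineAddr loopback=$loopback")
}
```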

The solution is to explicitly configure the `spark.driver.host` setting. The following worked for me:

private val spark = addSparkConfigs(SparkSession.builder()
    .appName("spark testing")
    .master("local"))
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .config("spark.driver.host", "localhost")
    .config("spark.ui.port", "4041")
    .enableHiveSupport()
    .getOrCreate()
