
Py4JJavaError: An error occurred while calling o158.save on Zeppelin

How to fix "Py4JJavaError: An error occurred while calling o158.save" on Zeppelin

Hello, my EMR Zeppelin notebook throws the error below when writing Hudi data with PySpark. I read several Stack Overflow posts and, following their instructions, tried libfb303-0.9.3.jar with libthrift-0.9.3.jar, and separately libfb303-0.9.2.jar with libthrift-0.9.2.jar, but I still get the error below. I am following this AWS Hudi example and added the Apache Hudi JARs to the cluster as described there.

%pyspark
inputDF = spark.createDataFrame(
    [
        ("100", "2015-01-01", "2015-01-01T13:51:39.340396Z"),
        ("101", "2015-01-01", "2015-01-01T12:14:58.597216Z"),
        ("102", "2015-01-01", "2015-01-01T13:51:40.417052Z"),
        ("103", "2015-01-01", "2015-01-01T13:51:40.519832Z"),
        ("104", "2015-01-02", "2015-01-01T12:15:00.512679Z"),
        ("105", "2015-01-01", "2015-01-01T13:51:42.248818Z"),
    ],
    ["id", "creation_date", "last_update_time"],
)
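The `hudioptions` dictionary passed to `.options(**hudioptions)` in the write call is never shown in the post. In the AWS EMR Hudi example the question links to, the write options typically look like the following; the table name, key fields, and partition settings here are illustrative assumptions, not values taken from the post:

```python
# Illustrative Hudi write options modeled on the AWS EMR Hudi example.
# Every value below is an assumption; substitute your own table name,
# record key, partition field, and Hive sync settings.
hudioptions = {
    'hoodie.table.name': 'my_hudi_table',                       # hypothetical name
    'hoodie.datasource.write.recordkey.field': 'id',            # unique row key
    'hoodie.datasource.write.partitionpath.field': 'creation_date',
    'hoodie.datasource.write.precombine.field': 'last_update_time',
    'hoodie.datasource.hive_sync.enable': 'true',               # triggers HiveSyncTool
    'hoodie.datasource.hive_sync.table': 'my_hudi_table',       # hypothetical name
    'hoodie.datasource.hive_sync.partition_fields': 'creation_date',
    'hoodie.datasource.hive_sync.partition_extractor_class':
        'org.apache.hudi.hive.MultiPartKeysValueExtractor',
}
```

Note that with `hive_sync.enable` set to `'true'`, every save also runs Hudi's Hive metastore sync, which is exactly the code path that fails in the stack trace below.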

External Spark dependencies in my session:

%pyspark
from pyspark.sql import SparkSession
import calendar

app_name = "soundbar"
spark = SparkSession.builder.appName(app_name).getOrCreate()
sc = spark.sparkContext

print(spark.sparkContext._jsc.sc().listJars())


ArrayBuffer(spark://ip-172-30-5-107.ec2.internal:38487/jars/libfb303-0.9.3.jar,spark://ip-172-30-5-107.ec2.internal:38487/jars/spark-interpreter-0.8.1.jar,spark://ip-172-30-5-107.ec2.internal:38487/jars/spark-avro_2.11-2.4.7.jar,spark://ip-172-30-5-107.ec2.internal:38487/jars/libthrift-0.9.3.jar,spark://ip-172-30-5-107.ec2.internal:38487/jars/hudi-spark-bundle_2.11-0.6.0.jar,spark://ip-172-30-5-107.ec2.internal:38487/jars/hudi-hive-sync-bundle-0.6.0.jar)

%pyspark
from pyspark.sql import SparkSession
from pyspark.sql.functions import *
from pyspark.sql.types import *
import os
import re

inputDF.write.format('hudi') \
    .option('hoodie.datasource.write.operation', 'insert') \
    .options(**hudioptions) \
    .mode('overwrite') \
    .save('s3://vd-dev-smarttvdata-di2-testing-virginia/anup-di2-dev/data/hudi/')
Py4JJavaError: An error occurred while calling o158.save.
: java.lang.NoSuchMethodError: com.facebook.fb303.FacebookService$Client.sendBaseOneway(Ljava/lang/String;Lorg/apache/thrift/TBase;)V
    at com.facebook.fb303.FacebookService$Client.send_shutdown(FacebookService.java:436)
    at com.facebook.fb303.FacebookService$Client.shutdown(FacebookService.java:430)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.close(HiveMetaStoreClient.java:492)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:152)
    at com.sun.proxy.$Proxy86.close(Unknown Source)
    at org.apache.hadoop.hive.ql.metadata.Hive.close(Hive.java:292)
    at org.apache.hadoop.hive.ql.metadata.Hive.access$000(Hive.java:138)
    at org.apache.hadoop.hive.ql.metadata.Hive$1.remove(Hive.java:158)
    at org.apache.hadoop.hive.ql.metadata.Hive.closeCurrent(Hive.java:262)
    at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:232)
    at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:209)
    at org.apache.hudi.hive.HoodieHiveClient.<init>(HoodieHiveClient.java:98)
    at org.apache.hudi.hive.HiveSyncTool.<init>(HiveSyncTool.java:66)
    at org.apache.hudi.HoodieSparkSqlWriter$.org$apache$hudi$HoodieSparkSqlWriter$$syncHive(HoodieSparkSqlWriter.scala:321)
    at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$metaSync$2.apply(HoodieSparkSqlWriter.scala:363)
    at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$metaSync$2.apply(HoodieSparkSqlWriter.scala:359)
    at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
    at org.apache.hudi.HoodieSparkSqlWriter$.metaSync(HoodieSparkSqlWriter.scala:359)
    at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:417)
    at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:205)
    at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:125)
    at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:156)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
    at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
    at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
    at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
    at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)

(<class 'py4j.protocol.Py4JJavaError'>,Py4JJavaError(u'An error occurred while calling o158.save.\n',JavaObject id=o159),<traceback object at 0x7fd5cb633830>)
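A `NoSuchMethodError` on `sendBaseOneway` is typically a classpath conflict rather than a bug in the write itself: Hudi's `HiveSyncTool` opens a Hive metastore client against a `libthrift` version different from the one `libfb303` was compiled against. One way to confirm this is to disable Hudi's Hive sync for a test write, which skips `HiveSyncTool` (and hence the failing thrift call) entirely. The sketch below uses standard Hudi option names, but the table name and key fields are assumptions; treat it as a diagnostic step, not a definitive fix:

```python
# Diagnostic sketch: retry the same write with Hive sync disabled.
# If the save then succeeds, the failure is confined to the Hive/thrift
# classpath used by HiveSyncTool, not to the Hudi write path itself.
no_sync_options = {
    'hoodie.table.name': 'my_hudi_table',              # hypothetical name
    'hoodie.datasource.write.recordkey.field': 'id',
    'hoodie.datasource.write.precombine.field': 'last_update_time',
    'hoodie.datasource.hive_sync.enable': 'false',     # skip HiveSyncTool
}

# Then, in the notebook (requires a live Spark session and inputDF):
# inputDF.write.format('hudi') \
#     .option('hoodie.datasource.write.operation', 'insert') \
#     .options(**no_sync_options) \
#     .mode('overwrite') \
#     .save('s3://.../data/hudi/')   # your own S3 path
```

If this write succeeds, the next step is to align the `libthrift`/`libfb303` JARs with the versions the cluster's Hive metastore client was built against (for Hive 2.3.x that is usually the 0.9.3 line), or to rely on the Hudi bundle JARs that EMR ships under `/usr/lib/hudi` instead of hand-added copies, so that only one thrift version ends up on the interpreter's classpath.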
