How to fix an error when saving data from PySpark to HBase
I am trying to write a Spark DataFrame to HBase using PySpark. I added the Spark HBase dependency and I am running the code from a Jupyter notebook. I also created a table in the default namespace in HBase.
I started pyspark with the following command. My Spark version is Spark 3.x and my HBase version is hbase-2.2.6.
pyspark --packages com.hortonworks:shc:1.0.0-1.6-s_2.10 --repositories http://repo.hortonworks.com/content/groups/public/ --files /home/vijee/hbase-2.2.6-bin/conf/hbase-site.xml
The dependency was added successfully.
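As an aside, the `--packages` coordinate above encodes the Spark and Scala versions the artifact was built against in its version suffix. A small, hypothetical snippet (not part of the original session) to make that suffix explicit:

```python
# Decompose an SHC Maven coordinate. By SHC's versioning convention,
# "1.0.0-1.6-s_2.10" is SHC 1.0.0 built for Spark 1.6 and Scala 2.10,
# which can be compared against the Spark/Scala versions actually running.
coord = "com.hortonworks:shc:1.0.0-1.6-s_2.10"
group, artifact, version = coord.split(":")
shc_ver, spark_ver, scala_tag = version.split("-")
print(spark_ver)   # Spark version the jar targets: '1.6'
print(scala_tag)   # Scala build tag: 's_2.10'
```

A mismatch between the jar's target versions and the running cluster is a common source of binary-incompatibility errors such as `java.lang.AbstractMethodError`.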
df = sc.parallelize([('a','def'),('b','abc')]).toDF(schema=['col0','col1'])
catalog = ''.join("""{
"table":{"namespace":"default","name":"smTable"},"rowkey":"c1","columns":{
"col0":{"cf":"rowkey","col":"c1","type":"string"},"col1":{"cf":"t1","col":"c2","type":"string"}
}
}""".split())
df.write.options(catalog=catalog).format('org.apache.spark.sql.execution.datasources.hbase').save()
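The catalog built above is just minified JSON; parsing it back with the standard library is a cheap sanity check (not part of the original code) that catches typos in the column mapping before the string ever reaches the connector:

```python
import json

# Same catalog string as above: split() strips all whitespace, join()
# re-concatenates, leaving a single-line JSON document.
catalog = ''.join("""{
"table":{"namespace":"default","name":"smTable"},"rowkey":"c1","columns":{
"col0":{"cf":"rowkey","col":"c1","type":"string"},"col1":{"cf":"t1","col":"c2","type":"string"}
}
}""".split())

# If this raises, the catalog is malformed JSON and the connector
# would fail regardless of any classpath issues.
parsed = json.loads(catalog)
print(parsed["table"]["name"])      # 'smTable'
print(sorted(parsed["columns"]))    # ['col0', 'col1']
```

This rules out a malformed catalog as the cause, which narrows the problem down to the connector jar itself.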
When I run the above statement, I get the error below. Since I am new to Spark, I cannot make sense of it.
At first I tried it with my own CSV file and hit the same ": java.lang.AbstractMethodError". Now I am using sample data and still get the same error.
Py4JJavaError Traceback (most recent call last)
<ipython-input-9-cfcf107b1f03> in <module>
----> 1 df.write.options(catalog=catalog,newtable=5).format('org.apache.spark.sql.execution.datasources.hbase').save()
~/spark-3.0.1-bin-hadoop2.7/python/pyspark/sql/readwriter.py in save(self,path,format,mode,partitionBy,**options)
823 self.format(format)
824 if path is None:
--> 825 self._jwrite.save()
826 else:
827 self._jwrite.save(path)
~/spark-3.0.1-bin-hadoop2.7/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py in __call__(self,*args)
1302
1303 answer = self.gateway_client.send_command(command)
-> 1304 return_value = get_return_value(
1305 answer,self.gateway_client,self.target_id,self.name)
1306
~/spark-3.0.1-bin-hadoop2.7/python/pyspark/sql/utils.py in deco(*a,**kw)
126 def deco(*a,**kw):
127 try:
--> 128 return f(*a,**kw)
129 except py4j.protocol.Py4JJavaError as e:
130 converted = convert_exception(e.java_exception)
~/spark-3.0.1-bin-hadoop2.7/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py in get_return_value(answer,gateway_client,target_id,name)
324 value = OUTPUT_CONVERTER[type](answer[2:],gateway_client)
325 if answer[1] == REFERENCE_TYPE:
--> 326 raise Py4JJavaError(
327 "An error occurred while calling {0}{1}{2}.\n".
328 format(target_id,".",name),value)
Py4JJavaError: An error occurred while calling o114.save.
: java.lang.AbstractMethodError: org.apache.spark.sql.execution.datasources.hbase.DefaultSource.createRelation(Lorg/apache/spark/sql/SQLContext;Lorg/apache/spark/sql/SaveMode;Lscala/collection/immutable/Map;Lorg/apache/spark/sql/Dataset;)Lorg/apache/spark/sql/sources/BaseRelation;
at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)