
How do I execute a Spark SQL MERGE statement on an Iceberg table in Databricks?

I am trying to set up Apache Iceberg in our Databricks environment, but I get an error when executing a MERGE statement in Spark SQL.

Code

CREATE TABLE iceberg.db.table (id bigint,data string) USING iceberg;

INSERT INTO iceberg.db.table VALUES (1,'a'),(2,'b'),(3,'c');

INSERT INTO iceberg.db.table SELECT id,data FROM (select * from iceberg.db.table) t WHERE length(data) = 1;

MERGE INTO iceberg.db.table t USING (SELECT * FROM iceberg.db.table) u ON t.id = u.id
WHEN NOT MATCHED THEN INSERT *

which produces this error:

Error in sql statement: AnalysisException: MERGE destination only supports Delta sources.
Some(RelationV2[id#116L,data#117] iceberg.db.table

I believe the root of the problem is that MERGE is also a keyword in the Delta Lake SQL engine. As far as I can tell, the issue stems from the order in which Spark applies its analyzer rules: MERGE triggers the Delta rules, which then throw an error because the target is not a Delta table. I can read, append to, and overwrite Iceberg tables without any problem.

Main question: how can I get Spark to recognize this as an Iceberg query rather than a Delta one? Or is it possible to remove the Delta-related SQL rules entirely?
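Until there is a proper fix, one possible workaround (a sketch only, not verified on this runtime) is to express the `WHEN NOT MATCHED THEN INSERT *` branch without the MERGE keyword at all, since plain INSERT against the Iceberg table works. The `updates` view below is a hypothetical source standing in for the USING clause; substitute your actual source relation:

```sql
-- Sketch: emulate MERGE ... WHEN NOT MATCHED THEN INSERT * without MERGE,
-- so the Delta analyzer rule is never triggered.
-- `updates` is a hypothetical source view; replace with your real source.
INSERT INTO iceberg.db.table
SELECT u.id, u.data
FROM updates u
LEFT ANTI JOIN iceberg.db.table t
  ON u.id = t.id;
```

The LEFT ANTI JOIN keeps only source rows whose `id` does not already exist in the target, which matches the not-matched insert semantics; it does not cover WHEN MATCHED update clauses.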

Environment

Spark version: 3.0.1

Databricks Runtime version: 7.6

Iceberg configuration

spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
spark.sql.catalog.iceberg=org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.iceberg.type=hadoop
spark.sql.catalog.iceberg.warehouse=BLOB_STORAGE_CONTAINER

Stack trace:

com.databricks.backend.common.rpc.DatabricksExceptions$sqlExecutionException: org.apache.spark.sql.AnalysisException: MERGE destination only supports Delta sources.
Some(RelationV2[id#116L,data#117] iceberg.db.table
);
    at com.databricks.sql.transaction.tahoe.DeltaErrors$.notADeltaSourceException(DeltaErrors.scala:343)
    at com.databricks.sql.transaction.tahoe.PreprocesstableMerge.apply(PreprocesstableMerge.scala:201)
    at com.databricks.sql.transaction.tahoe.PreprocesstableMergeEdge$$anonfun$apply$1.applyOrElse(PreprocesstableMergeEdge.scala:39)
    at com.databricks.sql.transaction.tahoe.PreprocesstableMergeEdge$$anonfun$apply$1.applyOrElse(PreprocesstableMergeEdge.scala:36)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDown$2(AnalysisHelper.scala:112)
    at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDown$1(AnalysisHelper.scala:112)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:216)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDown(AnalysisHelper.scala:110)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDown$(AnalysisHelper.scala:108)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsDown(LogicalPlan.scala:29)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperators(AnalysisHelper.scala:73)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperators$(AnalysisHelper.scala:72)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:29)
    at com.databricks.sql.transaction.tahoe.PreprocesstableMergeEdge.apply(PreprocesstableMergeEdge.scala:36)
    at com.databricks.sql.transaction.tahoe.PreprocesstableMergeEdge.apply(PreprocesstableMergeEdge.scala:29)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:152)
