
Why does R give me "Error: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:" when I try to import .parquet files?

I need to manage data recorded by a sensor over 5 months: 30 columns sampled every 5 seconds, so the amount of data is considerable.

The data were delivered as a folder containing 4 ".parquet" files.

I now need to import them into RStudio to do some analysis.

So far I have only set up a local Spark cluster through the sparklyr interface. This is the code I ran:

library(sparklyr)

sc <- spark_connect(master = "local", version = "3.0")
parquet_dir <- "C:/Users/Utente/Desktop/Tesi/DatiP"        # folder with the 4 .parquet files
spark_read_parquet(sc, name = "dati", path = parquet_dir)

The problem is that when I try to load the data, something goes wrong and I get this error:

Errore: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
Exchange SinglePartition, true, [id=#3117]
+- *(1) HashAggregate(keys=[], functions=[partial_count(1)], output=[count#15332L])
   +- Scan In-memory table `dati`
         +- InMemoryRelation [Time#14013, Wind Sensor Direction#14014, Wind Sensor Speed#14015, Air temp#14016, Air humidity#14017, Air pressure#14018, Rain amount#14019, Rain duration#14020, Rain intensity#14021, Hail amount#14022, Hail duration#14023, Hail intensity#14024, Thermoelement 1#14025, Thermoelement 2#14026, Thermoelement 3#14027, Thermoelement 4#14028, Thermoelement 17#14029, Thermoelement 18#14030, Thermoelement 19#14031, Thermoelement 20#14032, Thermoelement 21#14033, Thermoelement 22#14034, Thermoelement 23#14035, Thermoelement 24#14036, ... 27 more fields], StorageLevel(disk, memory, deserialized, 1 replicas)
               +- *(1) ColumnarToRow
                  +- FileScan parquet [Time#14013, ... 27 more fields] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex[file:/C:/Users/Utente/Desktop/Tesi/DatiP], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<Time:timestamp,Wind Sensor Direction:double,Wind Sensor Speed:double,Air temp:double,Air h...

    at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56)
    at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.doExecute(ShuffleExchangeExec.scala:95)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:175)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:213)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:210)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:171)
    at org.apache.spark.sql.execution.InputAdapter.inputRDD(WholeStageCodegenExec.scala:525)
    at org.apache.spark.sql.execution.InputRDDCodegen.inputRDDs(WholeStageCodegenExec.scala:453)
    at org.apache.spark.sql.execution.InputRDDCodegen.inputRDDs$(WholeStageCodegenExec.scala:452)
    at org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:496)
    at org.apache.spark.sql.execution.aggregate.HashAggregateExec.inputRDDs(HashAggregateExec.scala:162)
    at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:720)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:175)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:213)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:210)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:171)
    at org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:316)
    at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:382)
    at org.apache.spark.sql.Dataset.$anonfun$count$1(Dataset.scala:2979)
    at org.apache.spark.sql.Dataset.$anonfun$count$1$adapted(Dataset.scala:2978)
    at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3616)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:100)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:160)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:87)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
    at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3614)
    at org.apache.spark.sql.Dataset.count(Dataset.scala:2978)
    at org.apache.spark.sql.execution.command.CacheTableCommand.run(cache.scala:62)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
    at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:229)
    at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3616)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:100)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:160)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:87)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
    at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3614)
    at org.apache.spark.sql.Dataset.<init>(Dataset.scala:229)
    at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:100)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763)
    at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:97)
    at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:606)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763)
    at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:601)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at sparklyr.Invoke.invoke(invoke.scala:147)
    at sparklyr.StreamHandler.handleMethodCall(stream.scala:136)
    at sparklyr.StreamHandler.read(stream.scala:61)
    at sparklyr.BackendHandler.$anonfun$channelRead0$1(handler.scala:58)
    at scala.util.control.Breaks.breakable(Breaks.scala:42)
    at sparklyr.BackendHandler.channelRead0(handler.scala:39)
    at sparklyr.BackendHandler.channelRead0(handler.scala:14)
    at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
    at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
    at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:321)
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:295)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
    at 

I have already read several answers about the

org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute,tree:

error; but I noticed that the people asking for clarification on this error were trying to perform joins. That is not my case: I am simply trying to import the data.
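
Looking at the bottom of the stack trace, the failing action appears to be the Dataset.count inside CacheTableCommand.run, i.e. the eager caching step that spark_read_parquet performs by default (memory = TRUE). As a diagnostic, here is a minimal sketch that defers the cache and only pulls a few rows (assuming sc and parquet_dir from above):

# Skip the eager cache that triggers the count() visible in the trace
dati <- spark_read_parquet(sc, name = "dati", path = parquet_dir,
                           memory = FALSE)
head(dati)  # forces only a small read; an error here points at the files themselves

If this succeeds, the read itself is fine and the failure is confined to the caching/count step.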

Could the problem be directly related to the structure of the data I am trying to load?
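
To check whether a single one of the four files is responsible, a sketch that reads them one at a time (it assumes the folder contains only the .parquet files in question):

# Read each .parquet file separately to isolate a possibly problematic file
files <- list.files(parquet_dir, pattern = "\\.parquet$", full.names = TRUE)
for (f in files) {
  nm  <- gsub("[^A-Za-z0-9]", "_", basename(f))  # sanitize into a valid table name
  tbl <- spark_read_parquet(sc, name = nm, path = f, memory = FALSE)
  print(head(tbl))  # an error on a specific iteration singles out that file
}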

I also tried looking at the data with a ".parquet file viewer" I found on GitHub, but nothing seems out of the ordinary.
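
Independently of Spark, the arrow package can open the same folder and report the schema exactly as written in the files; this would also confirm the column names with embedded spaces (e.g. "Wind Sensor Direction") visible in the plan's ReadSchema, which is one structural aspect worth ruling out. A sketch, assuming arrow and dplyr are installed:

# Inspect the parquet schema and a few rows without going through Spark
library(arrow)
ds <- open_dataset(parquet_dir)  # lazily opens all .parquet files in the folder
print(ds$schema)                 # column names and types as stored on disk
dplyr::collect(head(ds, 5))      # materialize only the first five rows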

Thanks in advance for your help.

