How do I fix the error "org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:" that R gives me when I try to import .parquet files?
I need to manage data recorded by a sensor over 5 months. I am working with 30 columns sampled every 5 seconds, so the volume of data is considerable.
The data were exported to a folder containing 4 ".parquet" files.
Now I need to import them into RStudio to run some analyses.
So far I have only set up a local Spark cluster through the sparklyr interface. This is the code I ran:
sc <- spark_connect(master = "local", version = "3.0")
parquet_dir <- "C:/Users/Utente/Desktop/Tesi/DatiP"
spark_read_parquet(sc, name = "dati", path = parquet_dir)
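As a sketch of a variant I have not verified: `spark_read_parquet()` caches the table eagerly by default (`memory = TRUE`), which is what triggers the `count()` visible in the stack trace further down; passing `memory = FALSE` keeps the read lazy and defers that step. The path and table name are the ones from the script above.

```r
library(sparklyr)

# Connect to the local Spark cluster (same setup as above).
sc <- spark_connect(master = "local", version = "3.0")

# Read the folder of .parquet files lazily: memory = FALSE skips the
# eager cache (and the count() it runs) shown in the stack trace.
dati <- spark_read_parquet(
  sc,
  name   = "dati",
  path   = "C:/Users/Utente/Desktop/Tesi/DatiP",
  memory = FALSE
)
```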
The problem is that when I try to load the data, something goes wrong and I get this error:
Errore: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute,tree:
Exchange SinglePartition,true,[id=#3117]
+- *(1) HashAggregate(keys=[],functions=[partial_count(1)],output=[count#15332L])
+- Scan In-memory table `dati`
+- InMemoryRelation [Time#14013,Wind Sensor Direction#14014,Wind Sensor Speed#14015,Air temp#14016,Air humidity#14017,Air pressure#14018,Rain amount#14019,Rain duration#14020,Rain intensity#14021,Hail amount#14022,Hail duration#14023,Hail intensity#14024,Thermoelement 1#14025,Thermoelement 2#14026,Thermoelement 3#14027,Thermoelement 4#14028,Thermoelement 17#14029,Thermoelement 18#14030,Thermoelement 19#14031,Thermoelement 20#14032,Thermoelement 21#14033,Thermoelement 22#14034,Thermoelement 23#14035,Thermoelement 24#14036,... 27 more fields],StorageLevel(disk,memory,deserialized,1 replicas)
+- *(1) ColumnarToRow
+- FileScan parquet [Time#14013,... 27 more fields] Batched: true,DataFilters: [],Format: Parquet,Location: InMemoryFileIndex[file:/C:/Users/Utente/Desktop/Tesi/DatiP],PartitionFilters: [],PushedFilters: [],ReadSchema: struct<Time:timestamp,Wind Sensor Direction:double,Wind Sensor Speed:double,Air temp:double,Air h...
at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56)
at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.doExecute(ShuffleExchangeExec.scala:95)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:175)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:213)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:210)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:171)
at org.apache.spark.sql.execution.InputAdapter.inputRDD(WholeStageCodegenExec.scala:525)
at org.apache.spark.sql.execution.InputRDDCodegen.inputRDDs(WholeStageCodegenExec.scala:453)
at org.apache.spark.sql.execution.InputRDDCodegen.inputRDDs$(WholeStageCodegenExec.scala:452)
at org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:496)
at org.apache.spark.sql.execution.aggregate.HashAggregateExec.inputRDDs(HashAggregateExec.scala:162)
at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:720)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:175)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:213)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:210)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:171)
at org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:316)
at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:382)
at org.apache.spark.sql.Dataset.$anonfun$count$1(Dataset.scala:2979)
at org.apache.spark.sql.Dataset.$anonfun$count$1$adapted(Dataset.scala:2978)
at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3616)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:100)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:160)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:87)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3614)
at org.apache.spark.sql.Dataset.count(Dataset.scala:2978)
at org.apache.spark.sql.execution.command.CacheTableCommand.run(cache.scala:62)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:229)
at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3616)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:100)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:160)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:87)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3614)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:229)
at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:100)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:97)
at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:606)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:601)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at sparklyr.Invoke.invoke(invoke.scala:147)
at sparklyr.StreamHandler.handleMethodCall(stream.scala:136)
at sparklyr.StreamHandler.read(stream.scala:61)
at sparklyr.BackendHandler.$anonfun$channelRead0$1(handler.scala:58)
at scala.util.control.Breaks.breakable(Breaks.scala:42)
at sparklyr.BackendHandler.channelRead0(handler.scala:39)
at sparklyr.BackendHandler.channelRead0(handler.scala:14)
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:321)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:295)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at
I have already read several answers about the
org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
error, but I noticed that the people asking about it were trying to perform joins. That is not my case: I am simply trying to import the data.
Could the problem be directly related to the structure of the data I am trying to load?
I also inspected the data with a ".parquet file viewer" I found on GitHub, but nothing looks out of the ordinary.
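For reference, one way to sanity-check the files outside of Spark altogether is the arrow package, whose `read_parquet()` reads a single file into a data frame; the file name below is only a placeholder, not my actual file.

```r
library(arrow)

# Read one of the four files directly, bypassing Spark entirely.
# "part-00000.parquet" is a placeholder name.
df <- read_parquet("C:/Users/Utente/Desktop/Tesi/DatiP/part-00000.parquet")

# Inspect column names and types, e.g. the Time timestamp column.
str(df)
```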
Thanks in advance for your help.