```scala
object Test extends App {
  val master = "spark://localhost.localdomain:8084"
  val jobName = "scratch"
  val sparkHome = "/home/shengc/Downloads/software/spark-0.6.1"
  val executorEnvVars = Map[String, String](
    "SPARK_MEM"       -> "1g",
    "SPARK_CLASSPATH" -> "",
    "HADOOP_HOME"     -> "/home/shengc/Downloads/software/hadoop-0.20.205.0",
    "JAVA_HOME"       -> "/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64",
    "HIVE_HOME"       -> "/home/shengc/Downloads/software/hive-0.9.0-bin"
  )

  val sc = new shark.SharkContext(master, jobName, sparkHome, Nil, executorEnvVars)
  sc.sql2console("create table src(key int, value string)")
  sc.sql2console("load data local inpath '/home/shengc/Downloads/software/hive-0.9.0-bin/examples/files/kv1.txt' into table src")
  sc.sql2console("select count(1) from src")
}
```
I can create the table src and load data into it, but the last query throws an NPE and fails. Here is the output…
```
13/01/06 17:33:20 INFO execution.SparkTask: Executing shark.execution.SparkTask
13/01/06 17:33:20 INFO shark.SharkEnv: Initializing SharkEnv
13/01/06 17:33:20 INFO execution.SparkTask: Adding jar file:///home/shengc/workspace/shark/hive/lib/hive-builtins-0.9.0.jar
java.lang.NullPointerException
	at shark.execution.SparkTask$$anonfun$execute$5.apply(SparkTask.scala:58)
	at shark.execution.SparkTask$$anonfun$execute$5.apply(SparkTask.scala:55)
	at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:34)
	at scala.collection.mutable.ArrayOps.foreach(ArrayOps.scala:38)
	at shark.execution.SparkTask.execute(SparkTask.scala:55)
	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:134)
	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
	at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1326)
	at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1118)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:951)
	at shark.SharkContext.sql(SharkContext.scala:58)
	at shark.SharkContext.sql2console(SharkContext.scala:84)
	at Test$delayedInit$body.apply(Test.scala:20)
	at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
	at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
	at scala.App$$anonfun$main$1.apply(App.scala:60)
	at scala.App$$anonfun$main$1.apply(App.scala:60)
	at scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:59)
	at scala.collection.immutable.List.foreach(List.scala:76)
	at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:30)
	at scala.App$class.main(App.scala:60)
	at Test$.main(Test.scala:4)
	at Test.main(Test.scala)
FAILED: Execution Error, return code -101 from shark.execution.SparkTask
13/01/06 17:33:20 ERROR ql.Driver: FAILED: Execution Error, return code -101 from shark.execution.SparkTask
13/01/06 17:33:20 INFO ql.Driver: </PERFLOG method=Driver.execute start=1357511600030 end=1357511600054 duration=24>
13/01/06 17:33:20 INFO ql.Driver: <PERFLOG method=releaseLocks>
13/01/06 17:33:20 INFO ql.Driver: </PERFLOG method=releaseLocks start=1357511600054 end=1357511600054 duration=0>
```
However, I can query the src table just fine by typing `select * from src` in the shell launched by bin/shark-withinfo.
You might ask how I tried that SQL in the shell launched by "bin/shark-shell". Well, I cannot get into that shell at all. Here is the error I ran into…
https://groups.google.com/forum/?fromgroups=#!topic/shark-users/glZzrUfabGc
[EDIT 1]: This NPE seems to result from SharkEnv.sc not having been set yet, so I added
```scala
shark.SharkEnv.sc = sc
```
right before any sql2console operations are executed. It then complained about a ClassNotFoundException for scala.tools.nsc, so I manually put the scala-compiler jar on the classpath. After that, the code complained about another ClassNotFoundException, which I cannot figure out how to fix, since I did put the shark jar on my classpath.
```
13/01/06 18:09:34 INFO cluster.TaskSetManager: Lost TID 1 (task 1.0:1)
13/01/06 18:09:34 INFO cluster.TaskSetManager: Loss was due to java.lang.ClassNotFoundException: shark.execution.TableScanOperator$$anonfun$preprocessRdd$3
	at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:264)
```
[EDIT 2]: OK, I figured out another piece of code that meets my needs: it sets things up exactly the way shark's own interactive repl source code does.
```scala
System.setProperty("MASTER", "spark://localhost.localdomain:8084")
System.setProperty("SPARK_MEM", "1g")
System.setProperty("SPARK_CLASSPATH", "")
System.setProperty("HADOOP_HOME", "/home/shengc/Downloads/software/hadoop-0.20.205.0")
System.setProperty("JAVA_HOME", "/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64")
System.setProperty("HIVE_HOME", "/home/shengc/Downloads/software/hive-0.9.0-bin")
System.setProperty("SCALA_HOME", "/home/shengc/Downloads/software/scala-2.9.2")

shark.SharkEnv.initWithSharkContext("scratch")
val sc = shark.SharkEnv.sc.asInstanceOf[shark.SharkContext]
sc.sql2console("select * from src")
```
This is ugly, but at least it works. Any comments on how to write a more robust version are welcome!
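One small step toward robustness is to keep the settings in a single Map and push them into the JVM system properties in one place, instead of a block of repeated `System.setProperty` calls. This is only a sketch in plain Scala; the property names and values are the ones from the snippet above and should be adjusted for your own installation:

```scala
// Minimal sketch: centralize the configuration from the snippet above.
object SharkSettings {
  // Values taken from the setup above; hypothetical paths, adjust as needed.
  val properties: Map[String, String] = Map(
    "MASTER"          -> "spark://localhost.localdomain:8084",
    "SPARK_MEM"       -> "1g",
    "SPARK_CLASSPATH" -> "",
    "HADOOP_HOME"     -> "/home/shengc/Downloads/software/hadoop-0.20.205.0",
    "HIVE_HOME"       -> "/home/shengc/Downloads/software/hive-0.9.0-bin"
  )

  /** Copies every entry into the JVM system properties. */
  def apply(): Unit =
    properties.foreach { case (k, v) => System.setProperty(k, v) }
}
```

Calling `SharkSettings()` once before `shark.SharkEnv.initWithSharkContext(...)` would then replace the `System.setProperty` block, and the Map is easy to load from a config file later.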
For anyone who wants to operate shark programmatically, note that all the hive and shark jars must be on your CLASSPATH, and the scala compiler must be on your classpath as well. Another important thing is that hadoop's conf directory should also be on the classpath.
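Since most of the failures above boil down to classpath problems, a sanity check at startup can save a debugging round trip. Below is a hedged sketch using only `Class.forName` from the standard JVM API; the class names in `required` are illustrative entry points for the jars mentioned above (shark, hive, scala-compiler) and should be replaced with whatever classes your job actually touches:

```scala
// Sketch: fail fast if required jars are missing from the classpath.
object ClasspathCheck {
  /** True if the named class is loadable from the current classpath. */
  def onClasspath(className: String): Boolean =
    try { Class.forName(className); true }
    catch { case _: ClassNotFoundException => false }

  /** Prints one status line per class and returns the names that are missing. */
  def report(required: Seq[String]): Seq[String] = {
    required.foreach(c => println(s"$c -> ${if (onClasspath(c)) "ok" else "MISSING"}"))
    required.filterNot(onClasspath)
  }

  def main(args: Array[String]): Unit = {
    // Example entry points (assumptions, not an exhaustive list):
    val required = Seq(
      "shark.SharkContext",              // shark jar
      "org.apache.hadoop.hive.ql.Driver", // hive jars
      "scala.tools.nsc.Main"             // scala-compiler jar
    )
    if (report(required).nonEmpty)
      sys.error("some required jars are not on the classpath")
  }
}
```

Running this before creating the SharkContext turns a late, cryptic ClassNotFoundException inside a Spark task into an immediate, readable error on the driver.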
Answer
I am using shark 0.9.0 (but I believe you have to initialize SharkEnv in 0.6.1 as well), and my SharkEnv is initialized the following way:
```scala
// SharkContext
val sc = new SharkContext(master,
  System.getenv("SPARK_HOME"),
  executorEnvVar)

// Initialize SharkEnv
SharkEnv.sc = sc

// create and populate table
sc.runSql("CREATE TABLE src(key INT, value STRING)")
sc.runSql("LOAD DATA LOCAL INPATH '${env:HIVE_HOME}/examples/files/kv1.txt' INTO TABLE src")

// print result to stdout
println(sc.runSql("select * from src"))
println(sc.runSql("select count(*) from src"))
```
Also, try querying data from the src table without aggregate functions (i.e. comment out the line with "select count(*) …"). I had a similar issue where plain data queries were fine but count(*) threw an exception; in my case it was fixed by adding mysql-connector-java.jar to yarn.application.classpath.