Following https://github.com/databricks/spark-avro
I used:

import com.databricks.spark.avro._
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val df = sqlContext.read.avro("gs://logs.xyz.com/raw/2016/04/20/div1/div2/2016-04-20-08-28-35.UTC.blah-blah.avro")
and got:
java.lang.UnsupportedOperationException: This mix of union types is not supported (see README): ArrayBuffer(STRING)
The README clearly states:
This library supports reading all Avro types, with the exception of
complex union types. It uses the following mapping from Avro types to
Spark SQL types:
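For illustration, here is a small sketch (plain Python, not part of spark-avro; the classification below is an assumption based on the README's description, not the library's actual code) of which Avro unions map cleanly: single-branch unions, unions of one type with `"null"`, and the numeric promotions `int`/`long` and `float`/`double`. Anything else is a "complex" union and triggers the exception above.

```python
import json

# Assumption based on the spark-avro README: classify whether a given
# Avro union (a JSON list of branch types) has a Spark SQL mapping.
def union_supported(union_members):
    branches = [m for m in union_members if m != "null"]
    if len(branches) <= 1:
        return True  # e.g. ["null", "string"] maps to a nullable string
    # int/long promote to long, float/double promote to double
    return sorted(branches) in (["int", "long"], ["double", "float"])

print(union_supported(json.loads('["null", "string"]')))  # True  (simple)
print(union_supported(["string", "int"]))                 # False (complex)
```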
val df = sc.textFile("gs://logs.xyz.com/raw/2016/04/20/div1/div2/2016-04-20-08-28-35.UTC.blah-blah.avro")
df.take(2).foreach(println)
{"name":"log_record","type":"record","fields":[{"name":"request","type":{"type":"record","name":"request_data","fields":[{"name":"datetime","type":"string"},{"name":"ip",{"name":"host",{"name":"uri",{"name":"request_uri",{"name":"referer",{"name":"useragent","type":"string"}]}}
<----- an excerpt of the full reply ----->
Since I have almost no control over the format in which I receive these files,
my question is: is there a tested workaround you can recommend?
I'm using Google Cloud Dataproc:
MASTER=yarn-cluster spark-shell --num-executors 4 --executor-memory 4G --executor-cores 4 --packages com.databricks:spark-avro_2.10:2.0.1,com.databricks:spark-csv_2.11:1.3.0
Any help would be greatly appreciated...
Solution
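When the upstream producer can't be changed, one direction sometimes suggested is to pre-process the data: rewrite the writer schema so that complex unions are collapsed to a single type (for example `string`), re-encode the records against the rewritten schema, and only then hand the files to spark-avro. Below is a minimal, hypothetical sketch of the schema-rewriting half only, in plain Python over the schema JSON; the `simplify_unions` helper, the choice of `string` as the fallback, and the separate re-encoding step are all assumptions, not a tested recipe.

```python
import json

def simplify_unions(schema):
    """Hypothetical helper: recursively collapse Avro unions that
    spark-avro rejects into plain "string". Fields degraded this way
    must be re-encoded (e.g. as their JSON text) before Spark reads them."""
    if isinstance(schema, list):                      # an Avro union
        branches = [b for b in schema if b != "null"]
        if len(branches) <= 1:                        # simple / nullable union
            return [simplify_unions(b) for b in schema]
        return "string"                               # complex union: degrade
    if isinstance(schema, dict):
        out = dict(schema)
        if "fields" in out:                           # record type
            out["fields"] = [dict(f, type=simplify_unions(f["type"]))
                             for f in out["fields"]]
        for key in ("items", "values"):               # array / map types
            if key in out:
                out[key] = simplify_unions(out[key])
        return out
    return schema                                     # primitive type name

record = json.loads("""{"type": "record", "name": "r", "fields": [
    {"name": "ok",  "type": ["null", "string"]},
    {"name": "bad", "type": ["string", "int"]}]}""")
print(json.dumps(simplify_unions(record)))
```

The nullable union on `ok` is left intact, while the two-branch union on `bad` becomes a plain `string` that spark-avro can map.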