I looked at the documentation, and it says the following join types are supported:
Type of join to perform. Default inner. Must be one of: inner, cross,
outer, full, full_outer, left, left_outer, right, right_outer,
left_semi, left_anti.
I looked at the StackOverflow answer on SQL joins, and the top few answers do not mention some of the joins listed above, such as left_semi and left_anti. What do they mean in Spark?
Solution
Here is a simple illustrative experiment:
import org.apache.spark._
import org.apache.spark.sql._
import org.apache.spark.sql.expressions._
import org.apache.spark.sql.functions._

object SparkSandBox extends App {

  // Note: inside this object, this case class shadows org.apache.spark.sql.Row.
  case class Row(id: Int, value: String)

  private[this] implicit val spark = SparkSession.builder().master("local[*]").getOrCreate()

  import spark.implicits._

  // Keep the console output limited to the join results.
  spark.sparkContext.setLogLevel("ERROR")

  // ids 1 and 2 exist only in r1, ids 5 and 6 only in r2; id 4 appears twice in r2.
  val r1 = Seq(Row(1, "A1"), Row(2, "A2"), Row(3, "A3"), Row(4, "A4")).toDS()
  val r2 = Seq(Row(3, "A3"), Row(4, "A4"), Row(4, "A4_1"), Row(5, "A5"), Row(6, "A6")).toDS()

  val joinTypes = Seq("inner", "outer", "full", "full_outer", "left", "left_outer", "right", "right_outer", "left_semi", "left_anti")

  // Run the same join on "id" once per join type and print the result.
  joinTypes foreach { joinType =>
    println(s"${joinType.toUpperCase()} JOIN")
    r1.join(right = r2, usingColumns = Seq("id"), joinType = joinType).orderBy("id").show()
  }
}
Output
INNER JOIN
+---+-----+-----+
| id|value|value|
+---+-----+-----+
|  3|   A3|   A3|
|  4|   A4| A4_1|
|  4|   A4|   A4|
+---+-----+-----+

OUTER JOIN
+---+-----+-----+
| id|value|value|
+---+-----+-----+
|  1|   A1| null|
|  2|   A2| null|
|  3|   A3|   A3|
|  4|   A4|   A4|
|  4|   A4| A4_1|
|  5| null|   A5|
|  6| null|   A6|
+---+-----+-----+

FULL JOIN
+---+-----+-----+
| id|value|value|
+---+-----+-----+
|  1|   A1| null|
|  2|   A2| null|
|  3|   A3|   A3|
|  4|   A4| A4_1|
|  4|   A4|   A4|
|  5| null|   A5|
|  6| null|   A6|
+---+-----+-----+

FULL_OUTER JOIN
+---+-----+-----+
| id|value|value|
+---+-----+-----+
|  1|   A1| null|
|  2|   A2| null|
|  3|   A3|   A3|
|  4|   A4| A4_1|
|  4|   A4|   A4|
|  5| null|   A5|
|  6| null|   A6|
+---+-----+-----+

LEFT JOIN
+---+-----+-----+
| id|value|value|
+---+-----+-----+
|  1|   A1| null|
|  2|   A2| null|
|  3|   A3|   A3|
|  4|   A4| A4_1|
|  4|   A4|   A4|
+---+-----+-----+

LEFT_OUTER JOIN
+---+-----+-----+
| id|value|value|
+---+-----+-----+
|  1|   A1| null|
|  2|   A2| null|
|  3|   A3|   A3|
|  4|   A4| A4_1|
|  4|   A4|   A4|
+---+-----+-----+

RIGHT JOIN
+---+-----+-----+
| id|value|value|
+---+-----+-----+
|  3|   A3|   A3|
|  4|   A4|   A4|
|  4|   A4| A4_1|
|  5| null|   A5|
|  6| null|   A6|
+---+-----+-----+

RIGHT_OUTER JOIN
+---+-----+-----+
| id|value|value|
+---+-----+-----+
|  3|   A3|   A3|
|  4|   A4| A4_1|
|  4|   A4|   A4|
|  5| null|   A5|
|  6| null|   A6|
+---+-----+-----+

LEFT_SEMI JOIN
+---+-----+
| id|value|
+---+-----+
|  3|   A3|
|  4|   A4|
+---+-----+

LEFT_ANTI JOIN
+---+-----+
| id|value|
+---+-----+
|  1|   A1|
|  2|   A2|
+---+-----+
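The LEFT_SEMI and LEFT_ANTI results above can be read as filters on the left side: left_semi keeps the left rows whose id has at least one match on the right, and left_anti keeps the left rows whose id has no match. In both cases only the left columns come back, and a left row is never duplicated even when the right side matches it more than once (id 4 appears once in the LEFT_SEMI result). The following is a minimal sketch of that reading using plain Scala collections rather than Spark; the names Rec, SemiAntiSketch, leftSemi and leftAnti are hypothetical and introduced only for illustration, and the data is the data implied by the output above.

```scala
// Hypothetical stand-in for the Row case class used in the experiment.
final case class Rec(id: Int, value: String)

object SemiAntiSketch {
  // Same data as implied by the join output above.
  val left: Seq[Rec]  = Seq(Rec(1, "A1"), Rec(2, "A2"), Rec(3, "A3"), Rec(4, "A4"))
  val right: Seq[Rec] = Seq(Rec(3, "A3"), Rec(4, "A4"), Rec(4, "A4_1"), Rec(5, "A5"), Rec(6, "A6"))

  // left_semi as a filter: keep left rows whose id occurs on the right.
  // The right ids are collected into a Set, so repeated ids cannot duplicate a left row.
  def leftSemi(l: Seq[Rec], r: Seq[Rec]): Seq[Rec] = {
    val keys = r.map(_.id).toSet
    l.filter(rec => keys.contains(rec.id))
  }

  // left_anti as a filter: keep left rows whose id does NOT occur on the right.
  def leftAnti(l: Seq[Rec], r: Seq[Rec]): Seq[Rec] = {
    val keys = r.map(_.id).toSet
    l.filterNot(rec => keys.contains(rec.id))
  }

  def main(args: Array[String]): Unit = {
    println(leftSemi(left, right)) // List(Rec(3,A3), Rec(4,A4))
    println(leftAnti(left, right)) // List(Rec(1,A1), Rec(2,A2))
  }
}
```

In SQL terms this roughly corresponds to filtering the left table with WHERE EXISTS (semi) or WHERE NOT EXISTS (anti) against the right table, which is likely why answers about the classic join types do not list them.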