微信公众号搜"智元新知"关注
微信扫一扫可直接关注哦!

scala – Spark中的各种连接类型是什么?

我查看了文档,并说它支持以下连接类型:

Type of join to perform. Default inner. Must be one of: inner,cross,
outer,full,full_outer,left,left_outer,right,right_outer,
left_semi,left_anti.

我查看了sql连接上的StackOverflow answer和前几个答案没有提到上面的一些连接,例如left_semi和left_anti.他们在Spark中意味着什么?

解决方法

这是一个简单的说明性实验:

import org.apache.spark._
import org.apache.spark.sql._
import org.apache.spark.sql.expressions._
import org.apache.spark.sql.functions._

object SparkSandBox extends App {

  case class Row(id: Int,value: String)

  private[this] implicit val spark = SparkSession.builder().master("local[*]").getorCreate()
  import spark.implicits._
  spark.sparkContext.setLogLevel("ERROR")

  val r1 = Seq(Row(1,"A1"),Row(2,"A2"),Row(3,"A3"),Row(4,"A4")).toDS()
  val r2 = Seq(Row(3,"A4"),"A4_1"),Row(5,"A5"),Row(6,"A6")).toDS()

  val joinTypes = Seq("inner","outer","full","full_outer","left","left_outer","right","right_outer","left_semi","left_anti")

  joinTypes foreach {joinType =>
    println(s"${joinType.toupperCase()} JOIN")
    r1.join(right = r2,usingColumns = Seq("id"),joinType = joinType).orderBy("id").show()
  }
}

产量

INNER JOIN
+---+-----+-----+
| id|value|value|
+---+-----+-----+
|  3|   A3|   A3|
|  4|   A4| A4_1|
|  4|   A4|   A4|
+---+-----+-----+

OUTER JOIN
+---+-----+-----+
| id|value|value|
+---+-----+-----+
|  1|   A1| null|
|  2|   A2| null|
|  3|   A3|   A3|
|  4|   A4|   A4|
|  4|   A4| A4_1|
|  5| null|   A5|
|  6| null|   A6|
+---+-----+-----+

FULL JOIN
+---+-----+-----+
| id|value|value|
+---+-----+-----+
|  1|   A1| null|
|  2|   A2| null|
|  3|   A3|   A3|
|  4|   A4| A4_1|
|  4|   A4|   A4|
|  5| null|   A5|
|  6| null|   A6|
+---+-----+-----+

FULL_OUTER JOIN
+---+-----+-----+
| id|value|value|
+---+-----+-----+
|  1|   A1| null|
|  2|   A2| null|
|  3|   A3|   A3|
|  4|   A4| A4_1|
|  4|   A4|   A4|
|  5| null|   A5|
|  6| null|   A6|
+---+-----+-----+

LEFT JOIN
+---+-----+-----+
| id|value|value|
+---+-----+-----+
|  1|   A1| null|
|  2|   A2| null|
|  3|   A3|   A3|
|  4|   A4| A4_1|
|  4|   A4|   A4|
+---+-----+-----+

LEFT_OUTER JOIN
+---+-----+-----+
| id|value|value|
+---+-----+-----+
|  1|   A1| null|
|  2|   A2| null|
|  3|   A3|   A3|
|  4|   A4| A4_1|
|  4|   A4|   A4|
+---+-----+-----+

RIGHT JOIN
+---+-----+-----+
| id|value|value|
+---+-----+-----+
|  3|   A3|   A3|
|  4|   A4|   A4|
|  4|   A4| A4_1|
|  5| null|   A5|
|  6| null|   A6|
+---+-----+-----+

RIGHT_OUTER JOIN
+---+-----+-----+
| id|value|value|
+---+-----+-----+
|  3|   A3|   A3|
|  4|   A4| A4_1|
|  4|   A4|   A4|
|  5| null|   A5|
|  6| null|   A6|
+---+-----+-----+

LEFT_SEMI JOIN
+---+-----+
| id|value|
+---+-----+
|  3|   A3|
|  4|   A4|
+---+-----+

LEFT_ANTI JOIN
+---+-----+
| id|value|
+---+-----+
|  1|   A1|
|  2|   A2|
+---+-----+

版权声明:本文内容由互联网用户自发贡献,该文观点与技术仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容, 请发送邮件至 dio@foxmail.com 举报,一经查实,本站将立刻删除。

相关推荐