Scala DataFrame - How to print only the rows with the maximum values

I have the following DataFrame df:

+----------+--------+---------+
|        ID| text   |    count|
+----------+--------+---------+
|         3|    word|      316|
|         3|    work|      385|
|         3|    want|      205|
|         3|     cat|      251|
|         1|  office|      343|
|         1|     sky|      643|
|         1|   going|      126|
|         2|    home|      124|
|         2|  school|       23|
|         2|   sleep|      103|
//and so on

Now, for each ID, I want to show only the rows with the two largest counts and drop/hide the rest:

+----------+--------+---------+
|        ID| text   |    count|
+----------+--------+---------+
|         3|    word|      316|
|         3|    work|      385|
|         1|  office|      343|
|         1|     sky|      643|
|         2|    home|      124|
|         2|   sleep|      103|
//and so on

How can we achieve this most efficiently?

Solution

Use a window function in Spark: partition by ID, order by count in descending order, and keep the first two rows of each partition.

Check the following code.

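// toDF needs spark.implicits._ (already imported in spark-shell; in an application add: import spark.implicits._)
val df=Seq((3,"word",316),(3,"work",385),(3,"want",205),(3,"cat",251),(1,"office",343),(1,"sky",643),(1,"going",126),(2,"home",124),(2,"school",23),(2,"sleep",103)).toDF("ID","text","count")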

import org.apache.spark.sql.functions._
import org.apache.spark.sql.expressions._
val w=Window.partitionBy(col("ID")).orderBy(desc("count"))
df.withColumn("rn",row_number().over(w)).filter(col("rn") <=2).drop("rn").show()
//+---+------+-----+
//| ID|  text|count|
//+---+------+-----+
//|  1|   sky|  643|
//|  1|office|  343|
//|  3|  work|  385|
//|  3|  word|  316|
//|  2|  home|  124|
//|  2| sleep|  103|
//+---+------+-----+
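
To make clear what the window computes, here is a minimal sketch of the same top-2-per-ID logic on plain Scala collections, with no Spark involved. The names Entry, TopNPerGroup and topNPerId are hypothetical and exist only for this illustration; they are not part of any Spark API. The grouping plays the role of partitionBy, the descending sort plays the role of orderBy, and take(n) plays the role of the rn <= n filter.

```scala
// Illustrative only: a plain-Scala analogue of the window approach.
// Entry, TopNPerGroup and topNPerId are made-up names for this sketch.
final case class Entry(id: Int, text: String, count: Int)

object TopNPerGroup {
  // Same sample data as the DataFrame in the question.
  val rows: Seq[Entry] = Seq(
    Entry(3, "word", 316), Entry(3, "work", 385), Entry(3, "want", 205), Entry(3, "cat", 251),
    Entry(1, "office", 343), Entry(1, "sky", 643), Entry(1, "going", 126),
    Entry(2, "home", 124), Entry(2, "school", 23), Entry(2, "sleep", 103)
  )

  // Group by id ("partitionBy"), sort each group by count descending
  // ("orderBy"), then keep the first n entries ("rn <= n").
  def topNPerId(data: Seq[Entry], n: Int): Seq[Entry] =
    data
      .groupBy(_.id)
      .toSeq
      .sortBy { case (id, _) => id } // fixed group order so the result is deterministic
      .flatMap { case (_, group) => group.sortBy(e => -e.count).take(n) }

  def main(args: Array[String]): Unit =
    topNPerId(rows, 2).foreach(e => println(s"${e.id} ${e.text} ${e.count}"))
}
```

Unlike this in-memory version, the Spark window runs distributed across partitions, so the order in which the ID groups appear in show() is not guaranteed.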
scala> df.show(false)
+---+------+-----+
|ID |text  |count|
+---+------+-----+
|3  |word  |316  |
|3  |work  |385  |
|3  |want  |205  |
|3  |cat   |251  |
|1  |office|343  |
|1  |sky   |643  |
|1  |going |126  |
|2  |home  |124  |
|2  |school|23   |
|2  |sleep |103  |
+---+------+-----+
scala> import org.apache.spark.sql.expressions._

scala> val windowSpec = Window
.partitionBy($"ID")
.orderBy($"ID".asc,$"count".desc)
