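How to retrieve all records added to a Postgres table since the last timestamp using Spark SQL
I need to retrieve newly arrived records based on their timestamp. Using `max` returns only one record, and the same happens with `desc` combined with `limit`.
I need to fetch the records dynamically as data is loaded into the table.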
Task  Task-B  Timestamp
aaa   bbb     2020-09-02 16:45:12
aa2   bb2     2020-09-02 17:16:10
aa3   bb3     2020-09-03 10:09:15
aa4   bb4     2002-09-01 09:14:34
Tasks aaa through aa3 are new, and those are the only ones I need to retrieve:
Task  Task-B  Timestamp
aaa   bbb     2020-09-02 16:45:12
aa2   bb2     2020-09-02 17:16:10
aa3   bb3     2020-09-03 10:09:15
Solution
Task: show the data from the last 24 hours.
// Assumes an existing SparkSession named `spark`, as in spark-shell.
import org.apache.spark.sql.functions._
import spark.implicits._

case class Task(task: String, taskR: String, timestamp: String)

// Sample data mirroring the table in the question.
val sourceDF = Seq(
  Task("aaa", "bbb", "2020-09-02 16:45:12"),
  Task("aa2", "bb2", "2020-09-02 17:16:10"),
  Task("aa3", "bb3", "2020-09-03 10:09:15"),
  Task("aa4", "bb4", "2002-09-01 09:14:34")
).toDF()

sourceDF.show(false)
// +----+-----+-------------------+
// |task|taskR|timestamp          |
// +----+-----+-------------------+
// |aaa |bbb  |2020-09-02 16:45:12|
// |aa2 |bb2  |2020-09-02 17:16:10|
// |aa3 |bb3  |2020-09-03 10:09:15|
// |aa4 |bb4  |2002-09-01 09:14:34|
// +----+-----+-------------------+
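
// Keep rows whose timestamp is at most 24 hours older than the current time.
// The result depends on when the code runs; the output below appears to come
// from a run on about 2020-09-03.
val resDF = sourceDF
  .where(((unix_timestamp(current_timestamp()) - unix_timestamp(col("timestamp"))) / 3600) <= 24)

resDF.show(false)
resDF.printSchema()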
// +----+-----+-------------------+
// |task|taskR|timestamp          |
// +----+-----+-------------------+
// |aaa |bbb  |2020-09-02 16:45:12|
// |aa2 |bb2  |2020-09-02 17:16:10|
// |aa3 |bb3  |2020-09-03 10:09:15|
// +----+-----+-------------------+
//
// root
//  |-- task: string (nullable = true)
//  |-- taskR: string (nullable = true)
//  |-- timestamp: string (nullable = true)
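
The 24-hour window above depends on when the job runs. Since the question asks for everything "since the last timestamp", a possible alternative is to remember the newest timestamp already processed and filter against it. The following is only a minimal sketch under stated assumptions: the object `IncrementalLoadSketch`, the helpers `newerThan` and `nextWatermark`, and the starting value of `lastSeen` are hypothetical names and values chosen for illustration, and the data is the in-memory sample rather than a real Postgres table.

```scala
import java.sql.Timestamp
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._

object IncrementalLoadSketch {

  // Rows strictly newer than the last timestamp already processed.
  // The string column is cast to a timestamp before comparing.
  def newerThan(df: DataFrame, lastSeen: Timestamp): DataFrame =
    df.where(col("timestamp").cast("timestamp") > lit(lastSeen))

  // Newest timestamp in the batch, or the previous value when the batch is empty.
  def nextWatermark(batch: DataFrame, previous: Timestamp): Timestamp = {
    val maxTs = batch
      .agg(max(col("timestamp").cast("timestamp")))
      .head()
      .getTimestamp(0) // null when the batch has no rows
    Option(maxTs).getOrElse(previous)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("incremental-load-sketch")
      .getOrCreate()
    import spark.implicits._

    // Same sample rows as in the answer above.
    val sourceDF = Seq(
      ("aaa", "bbb", "2020-09-02 16:45:12"),
      ("aa2", "bb2", "2020-09-02 17:16:10"),
      ("aa3", "bb3", "2020-09-03 10:09:15"),
      ("aa4", "bb4", "2002-09-01 09:14:34")
    ).toDF("task", "taskR", "timestamp")

    // Assumption: the previous run stopped at this point in time.
    // In practice this value would be persisted between runs.
    val lastSeen = Timestamp.valueOf("2020-09-01 00:00:00")

    val fresh = newerThan(sourceDF, lastSeen)
    fresh.show(false) // rows aaa, aa2 and aa3; aa4 is older than lastSeen

    // Value to store for the next run.
    val updated = nextWatermark(fresh, lastSeen)
    println(s"next watermark: $updated")

    spark.stop()
  }
}
```

The same filter can be applied to a DataFrame loaded from Postgres with `spark.read.format("jdbc")`. If the Postgres column is already of a timestamp type, the cast is unnecessary.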