How to write IF/ELSE logic in a Spark SQL expression
I want to run a SQL expression that checks whether the next event is "DELIVERED" or "ORDER-CANCELED" and returns a different result for each case.
df = spark.createDataFrame([
    ["ORDER", "2009-11-23", "1"],
    ["ORDER-CANCELED", "2009-11-25", "1"],
    ["ORDER", "2009-12-03", "1"],
    ["DELIVERED", "2009-12-17", "1"],
]).toDF("EVENT", "DATE", "ID")
+--------------+----------+---+
| EVENT| DATE| ID|
+--------------+----------+---+
| ORDER|2009-11-23| 1|
|ORDER-CANCELED|2009-11-25| 1|
| ORDER|2009-12-03| 1|
| DELIVERED|2009-12-17| 1|
+--------------+----------+---+
I wrote a statement that works, but only for the DELIVERED event:
import pyspark.sql.functions as f

df = df.withColumn("NEXT", f.expr("""
    case when EVENT = 'ORDER' then
        first(if(EVENT in ('DELIVERED'), 'SUCCESS', null), True)
        over (Partition By ID ORDER BY ID, DATE ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING)
    else null end
"""))
This works, but I don't know how to add a second condition for the "ORDER-CANCELED" event. This is my non-working attempt (the **elseif** part is what I can't get right):
df = df.withColumn("NEXT", f.expr("""
    case when EVENT = 'ORDER' then
        first(if(EVENT in ('DELIVERED'), 'SUCCESS', null), True)
        **elseif(EVENT in ('ORDER-CANCELED'), 'CANCELED'), True)**
        over (Partition By ID ORDER BY ID, DATE ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING)
    else null end
"""))
Solution
Something like this, perhaps? Instead of `if(...)`, use a nested `case when` inside `first()` to map each event to its outcome:
df = df.withColumn(
    "NEXT", f.expr("""
        case when EVENT = 'ORDER' then
            first(
                case when EVENT in ('DELIVERED') then
                    'SUCCESS'
                when EVENT in ('ORDER-CANCELED') then
                    'CANCELED'
                else
                    NULL
                end
            , True) over (Partition By ID ORDER BY ID, DATE ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING)
        else NULL
        end
    """))
Note the `True` passed as the second argument to `first()`: it makes the aggregate skip NULLs, so an intervening ORDER row (which the inner `case` maps to NULL) does not mask the real outcome further down the partition.
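To see the logic without a Spark cluster, here is a minimal pure-Python sketch of the same computation: for each ORDER row, scan forward within the same ID (ordered by date) and take the first DELIVERED / ORDER-CANCELED event, mapped to SUCCESS / CANCELED. The function names are hypothetical, chosen only for this illustration.

```python
rows = [
    ("ORDER", "2009-11-23", "1"),
    ("ORDER-CANCELED", "2009-11-25", "1"),
    ("ORDER", "2009-12-03", "1"),
    ("DELIVERED", "2009-12-17", "1"),
]

def outcome(event):
    # Mirrors the inner CASE expression of the SQL answer.
    if event == "DELIVERED":
        return "SUCCESS"
    if event == "ORDER-CANCELED":
        return "CANCELED"
    return None

def next_outcome(rows):
    # ORDER BY ID, DATE
    rows = sorted(rows, key=lambda r: (r[2], r[1]))
    result = []
    for i, (event, date, id_) in enumerate(rows):
        nxt = None
        if event == "ORDER":
            # ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING,
            # restricted to the same ID partition
            for e2, d2, i2 in rows[i + 1:]:
                if i2 != id_:
                    continue
                mapped = outcome(e2)
                if mapped is not None:  # first(..., ignoreNulls=True)
                    nxt = mapped
                    break
        result.append((event, date, id_, nxt))
    return result
```

Running `next_outcome(rows)` on the sample data assigns CANCELED to the first ORDER (its next terminal event is ORDER-CANCELED) and SUCCESS to the second (its next terminal event is DELIVERED), matching what the window expression produces.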