How to derive fields with computed values from a lookup file using Spark DataFrames
I have a product file like the one below, and a lookup file that holds the formulas for cost price and discount. We need to derive the cost_price and discount fields using the formulas from the lookup file.
product_id,product_name,marked_price,selling_price,profit
101,AAAA,5500,5400,500
102,ABCS,7000,6500,1000
103,GHMA,6500,5600,700
104,PSNLA,8450,8000,800
105,GNBC,1250,1200,600
Lookup file:
key,value
cost_price,(selling_price+profit)
discount,(marked_price-selling_price)
Final output:
product_id,product_name,marked_price,selling_price,profit,cost_price,discount
101,AAAA,5500,5400,500,5900,100
102,ABCS,7000,6500,1000,7500,500
103,GHMA,6500,5600,700,6300,900
104,PSNLA,8450,8000,800,8800,450
105,GNBC,1250,1200,600,1800,50
Solution
First, build a Map from the lookup file, then add the columns with expr:
import org.apache.spark.sql.functions.expr
import spark.implicits._

val lookup = Map(
  "cost_price" -> "(selling_price+profit)",
  "discount"   -> "(marked_price-selling_price)"
)

val df = Seq(
  (101, "AAAA", 5500, 5400, 500),
  (102, "ABCS", 7000, 6500, 1000)
).toDF("product_id", "product_name", "marked_price", "selling_price", "profit")

df
  .withColumn("cost_price", expr(lookup("cost_price")))
  .withColumn("discount", expr(lookup("discount")))
  .show()
giving:
+----------+------------+------------+-------------+------+----------+--------+
|product_id|product_name|marked_price|selling_price|profit|cost_price|discount|
+----------+------------+------------+-------------+------+----------+--------+
| 101| AAAA| 5500| 5400| 500| 5900| 100|
| 102| ABCS| 7000| 6500| 1000| 7500| 500|
+----------+------------+------------+-------------+------+----------+--------+
You can also fold over the lookup entries instead of writing one withColumn per key:
val finalDF = lookup.foldLeft(df) { case (acc, (k, v)) => acc.withColumn(k, expr(v)) }
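To see why the fold works, the same pattern can be sketched in plain Scala without a Spark session: each (name, function) entry of the lookup adds one derived field to every row, just as withColumn does. The rows and row-functions below are inlined assumptions standing in for the DataFrame and expr, not the Spark API itself:

```scala
// Plain-Scala sketch of the foldLeft pattern above (no Spark needed).
// Each row is a Map from column name to value; each lookup entry is a
// (newColumnName, rowFunction) pair standing in for expr(formula).
val rows = Seq(
  Map("marked_price" -> 5500, "selling_price" -> 5400, "profit" -> 500),
  Map("marked_price" -> 7000, "selling_price" -> 6500, "profit" -> 1000))

val lookup = Seq(
  ("cost_price", (r: Map[String, Int]) => r("selling_price") + r("profit")),
  ("discount",   (r: Map[String, Int]) => r("marked_price") - r("selling_price")))

// Each fold step maps over all rows, adding one more derived field.
val enriched = lookup.foldLeft(rows) { case (acc, (name, f)) =>
  acc.map(row => row + (name -> f(row)))
}
// enriched.head("cost_price") == 5900, enriched.head("discount") == 100
```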
Another answer builds the lookup Map from a DataFrame, the way you would after reading the lookup file:
import org.apache.spark.sql.functions.expr
import spark.implicits._

val dataheader = Seq("product_id", "product_name", "marked_price", "selling_price", "profit")
val data = Seq(
  (101, "AAAA", 5500, 5400, 500),
  (103, "GHMA", 6500, 5600, 700),
  (104, "PSNLA", 8450, 8000, 800),
  (105, "GNBC", 1250, 1200, 600))
val dataDF = data.toDF(dataheader: _*)

// You can read your lookup file into a DataFrame instead of this hard-coded Seq
val lookupMap = Seq(
  ("cost_price", "(selling_price+profit)"),
  ("discount", "(marked_price-selling_price)"))
  .toDF("key", "value")
  .as[(String, String)]
  .collect
  .toMap

lookupMap.foldLeft(dataDF) { (df, look) =>
  df.withColumn(look._1, expr(look._2))
}.show()
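Both answers hard-code the lookup formulas, but in practice they come from the lookup file's raw text. A minimal sketch of turning its `key,value` lines into the Map, with the lines inlined here instead of read from disk (in a real job you would read them with spark.read.textFile or scala.io.Source):

```scala
// Sketch: build the lookup Map from the file's raw "key,value" lines.
// The lines are inlined for illustration; swap in your file reader.
val lookupLines = Seq(
  "key,value",                            // header row
  "cost_price,(selling_price+profit)",
  "discount,(marked_price-selling_price)")

val lookup = lookupLines
  .drop(1)                                // skip the header
  .map { line =>
    val Array(k, v) = line.split(",", 2)  // split on the first comma only,
    k -> v                                // in case a formula contains commas
  }
  .toMap
// lookup("discount") == "(marked_price-selling_price)"
```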