如何解决根据日期按组比较值并创建新的数据框,其中值在两列中下降
我有一个 df
+-----+------+--------+--------------------+------+----------+---------+--------+-------+
| ID1 | ID2 | DOC_NO | DATE | COST | COST_NEW | CLIENT | STATUS | LEVEL |
+-----+------+--------+--------------------+------+----------+---------+--------+-------+
| ABC | A123 | 1 | 2021-01-01 0:10:00 | 11 | 21 | ABC123 | A | 1 |
| DEF | B456 | 2 | 2021-01-01 0:10:00 | 12 | 22 | DEF256 | B | 2 |
| GHI | C789 | 3 | 2021-01-01 0:10:00 | 13 | 23 | GHI389 | C | 3 |
| JKL | D890 | 4 | 2021-01-01 0:10:00 | 14 | 24 | JKL490 | D | 4 |
| MNO | E012 | 5 | 2021-01-01 0:10:00 | 15 | 25 | MNO512 | E | 5 |
| ABC | A123 | 6 | 2021-01-01 0:15:00 | 11 | 21 | ABC623 | A | 1 |
| DEF | B456 | 7 | 2021-01-01 0:15:00 | 12 | 22 | DEF756 | B | 3 |
| GHI | C789 | 8 | 2021-01-01 0:15:00 | 13 | 23 | GHI889 | C | 2 |
| JKL | D890 | 9 | 2021-01-02 0:15:00 | 14 | 24 | JKL990 | D | 4 |
| MNO | E012 | 10 | 2021-01-03 0:15:00 | 15 | 25 | MNO1012 | E | 5 |
| ABC | A123 | 11 | 2021-01-03 0:20:00 | 10 | 20 | GHI890 | A | 6 |
| DEF | B456 | 12 | 2021-01-03 0:20:00 | 11 | 21 | JKL991 | B | 2 |
| GHI | C789 | 13 | 2021-01-03 0:20:00 | 12 | 22 | MNO1013 | C | 6 |
| JKL | D890 | 14 | 2021-01-03 0:20:00 | 13 | 23 | GHI891 | D | 7 |
| MNO | E012 | 15 | 2021-01-03 0:20:00 | 14 | 24 | JKL992 | E | 4 |
| ABC | A123 | 16 | 2021-01-03 0:20:00 | 12 | 22 | MNO1014 | A | 2 |
| DEF | B456 | 17 | 2021-01-03 0:20:00 | 13 | 23 | GHI892 | B | 6 |
| GHI | C789 | 18 | 2021-01-03 0:20:00 | 14 | 24 | JKL993 | C | 3 |
| JKL | D890 | 19 | 2021-01-03 0:20:00 | 15 | 25 | MNO1015 | D | 7 |
| MNO | E012 | 20 | 2021-01-03 0:20:00 | 16 | 26 | GHI893 | E | 2 |
| ABC | A123 | 21 | 2021-01-03 0:25:00 | 11 | 21 | ABC124 | A | 1 |
| DEF | B456 | 22 | 2021-01-03 0:25:00 | 12 | 22 | DEF257 | B | 6 |
| GHI | C789 | 23 | 2021-01-03 0:25:00 | 13 | 23 | GHI390 | C | 5 |
| JKL | D890 | 24 | 2021-01-03 0:25:00 | 14 | 24 | JKL491 | D | 7 |
| MNO | E012 | 25 | 2021-01-03 0:25:00 | 15 | 25 | MNO513 | E | 1 |
+-----+------+--------+--------------------+------+----------+---------+--------+-------+
我想将 ID1 和 ID2 分组并按 DOC_NO 和 DATE 排列 df Post that 希望脚本顺序查找 COST 和 COST_NEW 都被删除的实例,并将 COST_REF 和 COST_NEW_REF 设置为前一行中 COST 和 COST_NEW 的最大值(前一行中的 COST 和 COST_NEW 的最大值,但都在同一行中) )。 然后我希望脚本从我们找到 COST_REF 和 COST_NEW_REF 的地方提取 DATE、CLIENT、STATUS 和 LEVEL 的相应值作为 DATE_LAST_REF、CLIENT、STATUS、LEVEL
结果表应如下所示:
+-----+------+--------+--------------------+------+----------+----------+--------------+--------------------+---------+--------+-------+
| ID1 | ID2 | DOC_NO | DATE | COST | COST_REF | COST_NEW | COST_NEW_REF | DATE_LAST_REF | CLIENT | STATUS | LEVEL |
+-----+------+--------+--------------------+------+----------+----------+--------------+--------------------+---------+--------+-------+
| ABC | A123 | 11 | 2021-01-03 0:20:00 | 10 | 11 | 20 | 21 | 2021-01-01 0:15:00 | GHI890 | A | 6 |
| DEF | B456 | 12 | 2021-01-03 0:20:00 | 11 | 12 | 21 | 22 | 2021-01-01 0:15:00 | JKL991 | B | 2 |
| GHI | C789 | 13 | 2021-01-03 0:20:00 | 12 | 13 | 22 | 23 | 2021-01-01 0:15:00 | MNO1013 | C | 6 |
| JKL | D890 | 14 | 2021-01-03 0:20:00 | 13 | 14 | 23 | 24 | 2021-01-02 0:15:00 | GHI891 | D | 7 |
| MNO | E012 | 15 | 2021-01-03 0:20:00 | 14 | 15 | 24 | 25 | 2021-01-03 0:15:00 | JKL992 | E | 4 |
+-----+------+--------+--------------------+------+----------+----------+--------------+--------------------+---------+--------+-------+
这是我一直从事的工作的扩展:
Comparing a value by group based on date and creating new dataframe where value dropped
版权声明:本文内容由互联网用户自发贡献,该文观点与技术仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容, 请发送邮件至 dio@foxmail.com 举报,一经查实,本站将立刻删除。