Find the average of unequal monthly payments and distribute it based on some conditions


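I am currently struggling to transform my data into a useful dataset. I need to distribute the payments evenly from the first month to the last month. The problem is that the payments are irregular and unequal. In addition, some agreements have been paid in full; those should be distributed from the first payment over the term given in the agreement data frame.

My tables are as follows:

Table 1: payments

cust_id agreement_id date payment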
1 A 12/1/20 200
1 A 2/2/21 200
1 A 2/3/21 100
1 A 5/1/21 200
1 B 1/2/21 50
1 B 1/9/21 20
1 B 3/1/21 80
1 B 4/23/21 90
2 C 1/21/21 600
3 D 3/4/21 150
3 D 5/3/21 150

Here is the code for the payments data frame:

import pandas as pd

payments = pd.DataFrame.from_dict({'cust_id': {0: 1,1: 1,2: 1,3: 1,4: 1,5: 1,6: 1,7: 1,8: 2,9: 3,10: 3},'agreement_id': {0: 'A',1: 'A',2: 'A',3: 'A',4: 'B',5: 'B',6: 'B',7: 'B',8: 'C',9: 'D',10: 'D'},'date': {0: '12/1/20',1: '2/2/21',2: '2/3/21',3: '5/1/21',4: '1/2/21',5: '1/9/21',6: '3/1/21',7: '4/23/21',8: '1/21/21',9: '3/4/21',10: '5/3/21'},'payment': {0: 200,1: 200,2: 100,3: 200,4: 50,5: 20,6: 80,7: 90,8: 600,9: 150,10: 150}})

Table 2: agreement

agreement_id activation term_months total_fee
A 12/1/20 24 4800
B 1/21/21 6 600
C 1/21/21 6 600
D 3/4/21 6 300

Here is the code for the agreement data frame:

agreement = pd.DataFrame.from_dict({'agreement_id': {0: 'A',1: 'B',2: 'C',3: 'D'},'activation': {0: '12/1/20',1: '1/2/21',2: '1/21/21',3: '3/4/21'},'term_months': {0: 24,1: 6,2: 6,3: 6},'total_fee': {0: 4800,1: 300,2: 600,3: 300}})

The result I want is as follows:

cust_id agreement_id date payment
1 A 12/1/20 116.67
1 A 1/1/21 116.67
1 A 2/1/21 116.67
1 A 3/1/21 116.67
1 A 4/1/21 116.67
1 A 5/1/21 116.67
1 B 1/1/21 60
1 B 2/1/21 60
1 B 3/1/21 60
1 B 4/1/21 60
2 C 1/1/21 100
2 C 2/1/21 100
2 C 3/1/21 100
2 C 4/1/21 100
2 C 5/1/21 100
2 C 6/1/21 100
3 D 3/1/21 50
3 D 4/1/21 50
3 D 5/1/21 50
3 D 6/1/21 50
3 D 7/1/21 50
3 D 8/1/21 50

Or, as printed data frame output:

    cust_id agreement_id     date  payment
0         1            A  12/1/20   116.67
1         1            A   1/1/21   116.67
2         1            A   2/1/21   116.67
3         1            A   3/1/21   116.67
4         1            A   4/1/21   116.67
5         1            A   5/1/21   116.67
6         1            B   1/1/21    60.00
7         1            B   2/1/21    60.00
8         1            B   3/1/21    60.00
9         1            B   4/1/21    60.00
10        2            C   1/1/21   100.00
11        2            C   2/1/21   100.00
12        2            C   3/1/21   100.00
13        2            C   4/1/21   100.00
14        2            C   5/1/21   100.00
15        2            C   6/1/21   100.00
16        3            D   3/1/21    50.00
17        3            D   4/1/21    50.00
18        3            D   5/1/21    50.00
19        3            D   6/1/21    50.00
20        3            D   7/1/21    50.00
21        3            D   8/1/21    50.00

The activation date is the same as the date of the first payment.
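The allocation rule described above can be sketched in plain Python. This is only an illustration of the rule as I read it from the desired output, not code from the thread; the helper names `months_spanned` and `monthly_share` are made up for this example.

```python
from datetime import date


def months_spanned(first, last):
    """Number of calendar months from first to last, counting both ends."""
    return (last.year - first.year) * 12 + (last.month - first.month) + 1


def monthly_share(payment_dates, amounts, total_fee, term_months):
    """Return (number of months, amount per month) for one agreement.

    Assumed rule: a fully paid agreement is spread over its whole term;
    otherwise the total paid is spread from the first to the last payment month.
    """
    total_paid = sum(amounts)
    if total_paid == total_fee:
        n_months = term_months
    else:
        n_months = months_spanned(min(payment_dates), max(payment_dates))
    return n_months, round(total_paid / n_months, 2)


# Agreement A: 700 paid out of 4800, payments from Dec 2020 to May 2021
dates_a = [date(2020, 12, 1), date(2021, 2, 2), date(2021, 2, 3), date(2021, 5, 1)]
print(monthly_share(dates_a, [200, 200, 100, 200], 4800, 24))  # (6, 116.67)

# Agreement C: paid in full with one payment, so spread over the 6-month term
print(monthly_share([date(2021, 1, 21)], [600], 600, 6))  # (6, 100.0)
```

Both results agree with the rows for agreements A and C in the desired output.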

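I tried creating another column with the following code (suggested by AlexK), but it only works when the total paid is less than the total fee. When the total paid equals the total fee, I instead need to distribute the payments from the first payment through the end of the term (the start month plus the number of months in the term).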

payments['date'] = pd.to_datetime(payments['date'])
resampled_payments = (payments
   .set_index('date')
   .groupby(['cust_id','agreement_id'])
   .resample('MS')
   .agg({'payment': 'sum'})
   .reset_index()
)

resampled_payments['avg_monthly_payment'] = (resampled_payments
   .groupby(['cust_id','agreement_id'])['payment']
   .transform('mean')
)
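To make the limitation concrete, here is a small sketch (my own reduction of the attempt above, using only agreements C and D from the sample data). Resampling only creates rows between the first and the last payment month, so a fully paid agreement never gets extended to its full term.

```python
import pandas as pd

# Only the two fully paid agreements from the sample data
payments_demo = pd.DataFrame({
    "cust_id": [2, 3, 3],
    "agreement_id": ["C", "D", "D"],
    "date": pd.to_datetime(["2021-01-21", "2021-03-04", "2021-05-03"]),
    "payment": [600, 150, 150],
})

resampled = (
    payments_demo.set_index("date")
    .groupby(["cust_id", "agreement_id"])["payment"]
    .resample("MS")   # one bin per calendar month, labelled by month start
    .sum()
    .reset_index()
)

# C gets 1 row and D gets 3 rows, although both have a 6-month term
rows_per_agreement = resampled.groupby("agreement_id").size()
print(rows_per_agreement)
```

Both agreements should end up with six monthly rows, which is why the month sequence has to be generated separately for the fully paid case.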

Solution

Here is an R solution (since you also tagged the question with R)

#load libraries
library(tidyverse)
library(lubridate)

pymts <- read.table(text = "cust_id agreement_id    date    payment
1   A   12/1/20 200
1   A   2/2/21  200
1   A   2/3/21  100
1   A   5/1/21  200
1   B   1/2/21  50
1   B   1/9/21  20
1   B   3/1/21  80
1   B   4/23/21 90
2   C   1/21/21 600
3   D   3/4/21  150
3   D   5/3/21  150",header = T)

agmt <- read.table(text = "agreement_id activation  term_months total_fee
A   12/1/20 24  4800
B   1/21/21 6   600
C   1/21/21 6   600
D   3/4/21  6   300",header = T)

#final code

final<- pymts %>% mutate(date = as.Date(date,"%m/%d/%y")) %>%
  left_join(agmt %>% mutate(activation = as.Date(activation,"%m/%d/%y")),by = "agreement_id") %>%
  group_by(cust_id,agreement_id) %>%
  mutate(d = n(),date = floor_date(date,"month")) %>%
  complete(date = seq.Date(
    from = min(date), by = "month",
    length.out = ifelse(
      sum(payment) == first(total_fee),
      first(term_months),
      (year(max(date)) - year(min(date))) * 12 + month(max(date)) - month(min(date)) + 1
    )
  )) %>%
  mutate(payment = sum(payment,na.rm = T)) %>%
  filter(!duplicated(date)) %>%
  mutate(payment = payment/n()) %>%
  select(1:4) %>% ungroup()


final
# A tibble: 22 x 4
   cust_id agreement_id date       payment
     <int> <chr>        <date>       <dbl>
 1       1 A            2020-12-01    117.
 2       1 A            2021-01-01    117.
 3       1 A            2021-02-01    117.
 4       1 A            2021-03-01    117.
 5       1 A            2021-04-01    117.
 6       1 A            2021-05-01    117.
 7       1 B            2021-01-01     60 
 8       1 B            2021-02-01     60 
 9       1 B            2021-03-01     60 
10       1 B            2021-04-01     60 
# ... with 12 more rows
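For readers who do not use R: the `seq.Date(from = ..., by = "month", length.out = n)` call above produces n consecutive month starts. A rough plain-Python equivalent, with the hypothetical helper name `month_starts`, could look like this.

```python
from datetime import date


def month_starts(first, n_months):
    """Return n_months consecutive first-of-month dates, starting at first's month."""
    # Count months from year 0 so that year rollover is plain integer arithmetic
    start_index = first.year * 12 + (first.month - 1)
    return [
        date((start_index + k) // 12, (start_index + k) % 12 + 1, 1)
        for k in range(n_months)
    ]


# Agreement A starts in December 2020, so the sequence crosses a year boundary
print(month_starts(date(2020, 12, 1), 3))
# [datetime.date(2020, 12, 1), datetime.date(2021, 1, 1), datetime.date(2021, 2, 1)]
```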

Given your data frames, this should work

import pandas as pd

# Convert the date columns to datetime
payments['date'] = pd.to_datetime(payments['date'])
agreement['activation'] = pd.to_datetime(agreement['activation'])

final = pd.merge(payments, agreement, on='agreement_id', how='left')

# Set each date to the beginning of its month
final['date'] = final['date'].dt.to_period('M').dt.to_timestamp()


def set_date_range(df):
    # Fully paid: one month start per month of the agreement term
    if df['payment'].sum() == df['total_fee'].iloc[0]:
        return pd.date_range(df['date'].min(), periods=df['term_months'].iloc[0], freq='MS')
    # Otherwise: every month start from the first to the last payment month
    return pd.date_range(df['date'].min(), df['date'].max(), freq='MS')


# Create a dataframe with one row per month for each agreement
seq_df = pd.concat(
    [pd.DataFrame({'cust_id': cust, 'agreement_id': agmt, 'date': set_date_range(g)})
     for (cust, agmt), g in final.groupby(['cust_id', 'agreement_id'])]
)

final = (pd.concat([final, seq_df], sort=True)
              .sort_values(['cust_id', 'agreement_id', 'date'])
              .reset_index(drop=True)
              .reindex(columns=final.columns))

# Total paid per agreement (the NaN payments of the generated rows are skipped)
final['payment'] = final.groupby(['cust_id', 'agreement_id'])['payment'].transform('sum')

# Keep one row per agreement and month; agreement_id must be part of the key,
# otherwise customer 1's agreement B rows are dropped as duplicates of agreement A
final = final.drop_duplicates(['cust_id', 'agreement_id', 'date']).copy()

final['n'] = final.groupby(['cust_id', 'agreement_id'])['cust_id'].transform('count')
final['payment_due'] = final['payment'] / final['n']
final[['cust_id', 'agreement_id', 'date', 'payment_due']]

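I could not fully replicate the piped tidyverse style, but the output should match. The hardest part is building seq_df, but it should be fine (double-check it against more general use cases).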
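One way to do that double-checking is to confirm that distributing the payments does not change the total per agreement. The sketch below is my own addition, not part of the answer. The function name `totals_match` and the tolerance are arbitrary choices, and the column name `payment_due` is assumed to match the result frame above.

```python
import pandas as pd


def totals_match(payments, distributed, tol=0.01):
    """True if every agreement has the same total before and after distribution."""
    keys = ["cust_id", "agreement_id"]
    paid = payments.groupby(keys)["payment"].sum()
    spread = distributed.groupby(keys)["payment_due"].sum()
    # fill_value=0 makes an agreement missing on either side show up as a mismatch
    diff = paid.sub(spread, fill_value=0).abs()
    return bool((diff <= tol).all())


# Customer 1 from the sample data: agreement A (700 paid) and B (240 paid)
payments_demo = pd.DataFrame({
    "cust_id": [1] * 8,
    "agreement_id": ["A"] * 4 + ["B"] * 4,
    "payment": [200, 200, 100, 200, 50, 20, 80, 90],
})
distributed_demo = pd.DataFrame({
    "cust_id": [1] * 10,
    "agreement_id": ["A"] * 6 + ["B"] * 4,
    "payment_due": [700 / 6] * 6 + [60.0] * 4,
})

print(totals_match(payments_demo, distributed_demo))  # True
```

A check like this also flags a result in which an entire agreement has gone missing, which is easy to overlook when one customer holds several agreements with overlapping months.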
