Conditional cumulative sum with a dependency on previous rows in a pandas DataFrame

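I am trying to compute two running totals from a series of financial transactions. There are 4 transaction types, each with a transaction amount:

D - Deposit
W - Withdrawal
G - Gain
L - Loss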

The DataFrame is created like this:

import pandas as pd
import numpy as np

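# Sample transactions; the values match the result tables shown below.
data = {
    'Type':   ['D', 'D', 'W', 'D', 'G', 'G', 'G', 'L', 'W', 'G', 'W', 'G', 'L'],
    'Amount': [10, 10, -5, 10, 5, 5, 5, -5, -10, 10, -25, 25, -30],
}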
df = pd.DataFrame(data,columns = ['Type','Amount'])

The running capital, which includes every transaction, is easy to compute with cumsum():

df['Capital'] = df['Amount'].cumsum()

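The other quantity I want to compute is Principal, the amount that has been paid into the account. It only takes transactions of type D and W into account, so a simple filter looks like it should work: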

df['Principal'] = df.apply(lambda row : row['Amount'] if (row['Type'] == 'W' or row['Type'] == 'D') else 0,axis=1).cumsum()
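
For reference, the same naive filter can be written without apply. This is only a sketch of the filtering step described above, so it reproduces the same incorrect Principal values; the names is_flow and naive_principal are made up for illustration.

```python
import pandas as pd

# Sample transactions copied from the tables in this question.
df = pd.DataFrame({
    'Type':   ['D', 'D', 'W', 'D', 'G', 'G', 'G', 'L', 'W', 'G', 'W', 'G', 'L'],
    'Amount': [10, 10, -5, 10, 5, 5, 5, -5, -10, 10, -25, 25, -30],
})

# Keep D and W amounts, replace every other amount with 0, then accumulate.
is_flow = df['Type'].isin(['D', 'W'])
naive_principal = df['Amount'].where(is_flow, 0).cumsum()

print(naive_principal.tolist())
```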

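However, this has a problem. When there are gains, a withdrawal has to be taken out of the gains first, and only then does it reduce the principal. The code above gives the output below, which is wrong (see rows 8 and 10):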

    Type    Amount  Capital Principal
0   D       10      10      10
1   D       10      20      20
2   W       -5      15      15
3   D       10      25      25
4   G       5       30      25
5   G       5       35      25
6   G       5       40      25
7   L       -5      35      25
8   W       -10     25      15   <- should stay at 25
9   G       10      35      15   <- now wrong because of the row above
10  W       -25     10      -10  <- the error escalates
11  G       25      35      -10
12  L       -30     5       -10

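I can get the desired result with the code below, but it seems a bit ugly. I am wondering whether there is a simpler or faster way. I would guess this is a common calculation in finance.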

df['Principal'] = np.nan
currentPrincipal = 0
for index, row in df.iterrows():
    if row['Type'] == 'D':
        # A deposit always increases the principal.
        df.loc[index, 'Principal'] = currentPrincipal + row['Amount']
    elif row['Type'] == 'W' and row['Capital'] <= currentPrincipal:
        # The withdrawal used up all gains, so the principal drops to the remaining capital.
        df.loc[index, 'Principal'] = row['Capital']
    else:
        # Gains, losses, and withdrawals covered by gains leave the principal unchanged.
        df.loc[index, 'Principal'] = currentPrincipal

    currentPrincipal = df.loc[index, 'Principal']

I tried to use apply without success, because each row depends on the previous value of Principal, which has to be carried forward. The correct result:

    Type    Amount  Capital Principal
0   D       10      10      10
1   D       10      20      20
2   W       -5      15      15
3   D       10      25      25
4   G       5       30      25
5   G       5       35      25
6   G       5       40      25
7   L       -5      35      25
8   W       -10     25      25
9   G       10      35      25
10  W       -25     10      10
11  G       25      35      10
12  L       -30     5       10
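
The rule the loop in the question implements can also be sketched as a small plain-Python function, which avoids writing to the DataFrame row by row. The function name running_principal is hypothetical, and the sketch assumes exactly the rule from the question: deposits add to the principal, and a withdrawal lowers the principal to the remaining capital only once the capital is no longer above it.

```python
import pandas as pd

def running_principal(types, amounts):
    """Return the principal after each transaction (explicit-loop sketch)."""
    capital = 0
    principal = 0
    out = []
    for kind, amount in zip(types, amounts):
        capital += amount
        if kind == 'D':
            # Deposits always add to the principal.
            principal += amount
        elif kind == 'W' and capital <= principal:
            # Gains are exhausted: the principal falls to what is left.
            principal = capital
        # Gains, losses, and covered withdrawals leave the principal as it is.
        out.append(principal)
    return out

# Sample transactions copied from the tables in this question.
df = pd.DataFrame({
    'Type':   ['D', 'D', 'W', 'D', 'G', 'G', 'G', 'L', 'W', 'G', 'W', 'G', 'L'],
    'Amount': [10, 10, -5, 10, 5, 5, 5, -5, -10, 10, -25, 25, -30],
})
df['Capital'] = df['Amount'].cumsum()
df['Principal'] = running_principal(df['Type'], df['Amount'])
print(df)
```

This is still a row-by-row loop, so it is not a vectorized solution, but it keeps the dependency on the previous Principal explicit.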

Solution

You can do it like this:

# calculate cumulative withdrawals
w = df['Amount'].where(df['Type'].eq('W')).cumsum()

# calculate cumulative deposits
d = df['Amount'].where(df['Type'].eq('D'),0).cumsum()

# calculate cumulative gain & loss
g = df['Amount'].where(df['Type'].isin(['G','L']),0).cumsum()

# principal = cumulative deposits + withdrawals not covered by gains (if any),
# evaluated on W rows and carried forward to the rows that follow
df['Principal'] = d + (g + w).where(lambda x: x < 0).ffill().fillna(0)

Result:

   Type  Amount  Capital  Principal
0     D      10       10       10.0
1     D      10       20       20.0
2     W      -5       15       15.0
3     D      10       25       25.0
4     G       5       30       25.0
5     G       5       35       25.0
6     G       5       40       25.0
7     L      -5       35       25.0
8     W     -10       25       25.0
9     G      10       35       25.0
10    W     -25       10       10.0
11    G      25       35       10.0
12    L     -30        5       10.0
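
To see how the pieces combine, the intermediate series can be laid out side by side. This sketch recomputes w, d and g as in the answer; the names net, shortfall and steps are introduced here only for illustration. I have only checked it against the sample data from the question, not against arbitrary transaction sequences.

```python
import pandas as pd

# Sample transactions copied from the tables in this question.
df = pd.DataFrame({
    'Type':   ['D', 'D', 'W', 'D', 'G', 'G', 'G', 'L', 'W', 'G', 'W', 'G', 'L'],
    'Amount': [10, 10, -5, 10, 5, 5, 5, -5, -10, 10, -25, 25, -30],
})

# Cumulative withdrawals: NaN on every row that is not a withdrawal.
w = df['Amount'].where(df['Type'].eq('W')).cumsum()
# Cumulative deposits and cumulative gain/loss, defined on every row.
d = df['Amount'].where(df['Type'].eq('D'), 0).cumsum()
g = df['Amount'].where(df['Type'].isin(['G', 'L']), 0).cumsum()

# Withdrawals netted against gains; only defined on W rows.
net = g + w
# Keep negative values only, carry the last one forward, use 0 before the first.
shortfall = net.where(net < 0).ffill().fillna(0)

steps = pd.DataFrame({
    'Type': df['Type'], 'd': d, 'g': g, 'w': w,
    'shortfall': shortfall, 'Principal': d + shortfall,
})
print(steps)
```

Because w is NaN outside of W rows, the shortfall only changes when a withdrawal happens, which is why gains and losses on their own never move the principal.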