How to iterate over a Pandas DataFrame to create a new DataFrame based on certain conditions

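I have imported a CSV file with sales pipeline data into a Pandas DataFrame. Each row represents an opportunity and includes the prospect's name, product information, sales stage, probability, expected deal size, expected close date, duration, and so on.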

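Now I want to convert this into a sales forecast. To do that, I want to calculate the average revenue per period by dividing the deal size by the duration and multiplying by the probability, and then create one row for every period the deal covers, based on the expected close date and the duration.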

I created a simplified example to support my question:

import pandas as pd

pipeline_data = [{'Client': 'A','Stage': 'SUSPECT','Probability': '0.25','Dealsize': '1200','Duration': 6,'Start_period': '2020-08'},{'Client': 'B','Stage': 'prospect','Probability': '0.60','Dealsize': '1000','Duration': 4,'Start_period': '2020-10'}]

df = pd.DataFrame(pipeline_data)
df

Output:

    Client  Stage    Probability Dealsize   Duration    Start_period
0   A       SUSPECT  0.25        1200       6           2020-08
1   B       prospect 0.60        1000       4           2020-10

So the average monthly revenue for client A is 1200/6 * 0.25 = 50. That revenue falls in the periods 2020-08 through 2021-01 (August 2020 to January 2021).

The preferred output is:

    Client  Stage    Probability Dealsize   Duration    Start_period Weighted_revenue Period
0   A       SUSPECT  0.25        1200       6           2020-08      50               2020-08
1   A       SUSPECT  0.25        1200       6           2020-08      50               2020-09
2   A       SUSPECT  0.25        1200       6           2020-08      50               2020-10 
3   A       SUSPECT  0.25        1200       6           2020-08      50               2020-11
4   A       SUSPECT  0.25        1200       6           2020-08      50               2020-12
5   A       SUSPECT  0.25        1200       6           2020-08      50               2021-01
6   B       prospect 0.60        1000       4           2020-10      150              2020-10
7   B       prospect 0.60        1000       4           2020-10      150              2020-11
8   B       prospect 0.60        1000       4           2020-10      150              2020-12
9   B       prospect 0.60        1000       4           2020-10      150              2021-01

I have already converted Start_period to the Period type so that it can be used for calculations and iteration.

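I am very new to coding. I have tried to find an answer on this site and on others, but so far without success. I can imagine solving this with some kind of nested loop and the append function, but I don't know how to do that with Pandas...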

Any help would be greatly appreciated!

Solution

You can try a list comprehension with pd.date_range, followed by explode:

df['Weighted_revenue']=(df['Dealsize'].astype(float)/df['Duration'].astype(float))*df['Probability'].astype(float)
df['Period']=[pd.date_range(x,periods=y,freq="M").strftime('%Y-%m') for x,y in zip(df["Start_period"],df["Duration"])]
df=df.explode('Period')

Output:

df
  Client     Stage Probability Dealsize  Duration Start_period  Weighted_revenue   Period
0      A   suspect        0.25     1200         6      2020-08              50.0  2020-08
0      A   suspect        0.25     1200         6      2020-08              50.0  2020-09
0      A   suspect        0.25     1200         6      2020-08              50.0  2020-10
0      A   suspect        0.25     1200         6      2020-08              50.0  2020-11
0      A   suspect        0.25     1200         6      2020-08              50.0  2020-12
0      A   suspect        0.25     1200         6      2020-08              50.0  2021-01
1      B  prospect        0.60     1000         4      2020-10             150.0  2020-10
1      B  prospect        0.60     1000         4      2020-10             150.0  2020-11
1      B  prospect        0.60     1000         4      2020-10             150.0  2020-12
1      B  prospect        0.60     1000         4      2020-10             150.0  2021-01

Details:

First, we create the 'Weighted_revenue' column using the formula you described:

df['Weighted_revenue']=(df['Dealsize'].astype(float)/df['Duration'].astype(float))*df['Probability'].astype(float)
df

  Client     Stage Probability Dealsize  Duration Start_period  Weighted_revenue
0      A   suspect        0.25     1200         6      2020-08              50.0
1      B  prospect        0.60     1000         4      2020-10             150.0
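
Because 'Probability' and 'Dealsize' arrive as strings in the sample data, the formula needs a numeric conversion somewhere. As a minimal sketch of one alternative to repeating astype(float) inside the expression, the string columns could be converted once with pd.to_numeric before the calculation. The variable name num_cols is only illustrative, and the data is the sample from the question.

```python
import pandas as pd

# Sample data from the question; Probability and Dealsize are strings.
pipeline_data = [
    {'Client': 'A', 'Stage': 'SUSPECT', 'Probability': '0.25', 'Dealsize': '1200',
     'Duration': 6, 'Start_period': '2020-08'},
    {'Client': 'B', 'Stage': 'prospect', 'Probability': '0.60', 'Dealsize': '1000',
     'Duration': 4, 'Start_period': '2020-10'},
]
df = pd.DataFrame(pipeline_data)

# Convert the string columns to numbers once, up front (num_cols is an illustrative name).
num_cols = ['Probability', 'Dealsize']
df[num_cols] = df[num_cols].apply(pd.to_numeric)

# Weighted revenue per period = deal size / duration * probability.
df['Weighted_revenue'] = df['Dealsize'] / df['Duration'] * df['Probability']
print(df[['Client', 'Weighted_revenue']])
```

After the conversion the numeric columns can be reused in later calculations without further casting.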

Then we use a list comprehension with zip to build a date range for each row from the 'Start_period' and 'Duration' columns:

df['Period']=[pd.date_range(x,periods=y,freq="M").strftime('%Y-%m') for x,y in zip(df["Start_period"],df["Duration"])]
df

  Client     Stage Probability Dealsize  Duration Start_period  Weighted_revenue                                             Period
0      A   suspect        0.25     1200         6      2020-08              50.0  [2020-08,2020-09,2020-10,2020-11,2020-12,2021-01]
1      B  prospect        0.60     1000         4      2020-10             150.0                  [2020-10,2020-11,2020-12,2021-01]
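
Since the question mentions the Period type, here is a small sketch of what a single row's range could look like with pd.period_range instead of pd.date_range. This is an alternative I am assuming may be convenient, not part of the steps above; it yields monthly Period objects rather than formatted strings.

```python
import pandas as pd

# Six consecutive monthly periods starting at client A's Start_period.
periods = pd.period_range(start='2020-08', periods=6, freq='M')

# Each element is a pandas Period; str() gives the 'YYYY-MM' label.
labels = [str(p) for p in periods]
print(labels)
```

Keeping real Period objects can make later date arithmetic easier, whereas the strftime approach gives plain strings that match the preferred output exactly.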

Finally, we use explode to expand each list into one row per period:

df=df.explode('Period')
df 

  Client     Stage Probability Dealsize  Duration Start_period  Weighted_revenue   Period
0      A   suspect        0.25     1200         6      2020-08              50.0  2020-08
0      A   suspect        0.25     1200         6      2020-08              50.0  2020-09
0      A   suspect        0.25     1200         6      2020-08              50.0  2020-10
0      A   suspect        0.25     1200         6      2020-08              50.0  2020-11
0      A   suspect        0.25     1200         6      2020-08              50.0  2020-12
0      A   suspect        0.25     1200         6      2020-08              50.0  2021-01
1      B  prospect        0.60     1000         4      2020-10             150.0  2020-10
1      B  prospect        0.60     1000         4      2020-10             150.0  2020-11
1      B  prospect        0.60     1000         4      2020-10             150.0  2020-12
1      B  prospect        0.60     1000         4      2020-10             150.0  2021-01
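
For reference, the three steps can be put together into one self-contained script. This is a sketch under a few assumptions of mine: it uses freq="MS" (month start) rather than "M", because recent pandas versions deprecate the "M" alias in date_range and both give the same year-month labels here; it adds reset_index(drop=True) so the index runs 0 to 9 as in the preferred output; and it finishes with a per-period total, which is one possible way to read the result as a forecast. The names forecast and totals are illustrative.

```python
import pandas as pd

# Sample data from the question.
pipeline_data = [
    {'Client': 'A', 'Stage': 'SUSPECT', 'Probability': '0.25', 'Dealsize': '1200',
     'Duration': 6, 'Start_period': '2020-08'},
    {'Client': 'B', 'Stage': 'prospect', 'Probability': '0.60', 'Dealsize': '1000',
     'Duration': 4, 'Start_period': '2020-10'},
]
df = pd.DataFrame(pipeline_data)

# Step 1: weighted revenue per period = deal size / duration * probability.
df['Weighted_revenue'] = (
    df['Dealsize'].astype(float) / df['Duration'].astype(float)
    * df['Probability'].astype(float)
)

# Step 2: one list of 'YYYY-MM' labels per row, starting at Start_period
# and covering Duration months.
df['Period'] = [
    list(pd.date_range(start, periods=n, freq='MS').strftime('%Y-%m'))
    for start, n in zip(df['Start_period'], df['Duration'])
]

# Step 3: one row per period, with a fresh 0..n-1 index.
forecast = df.explode('Period').reset_index(drop=True)

# Optional: total weighted revenue expected in each period.
totals = forecast.groupby('Period')['Weighted_revenue'].sum()

print(forecast)
print(totals)
```

Without reset_index, explode repeats the original row labels (0 and 1), which is why the output above shows a duplicated index.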