Pandas - Add mean, max, and min as columns in a DataFrame


I have a DataFrame df:

         statistics  s_values
year
1999  cigarette use       100
1999  cellphone use       310
1999   internet use       101
1999    alcohol use       100
1999       soda use       215
2000  cigarette use       315
2000  cellphone use       317
2000   internet use       325
2000    alcohol use       108
2000       soda use       200
2001  cigarette use       122
2001  cellphone use       311
2001   internet use       112
2001    alcohol use       144
2001       soda use       689

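I calculated the maximum, minimum, and mean based on the year index and the statistics column.

I would like to insert the mean, max, and min as columns in the DataFrame, so that the result looks like the output below.

Desired output: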

         statistics   s_values        mean  min     max
year                    
1999    alcohol use     100.0        104.0  100.0   108.0
1999    cellphone use   310.0        313.5  310.0   317.0
1999    cigarette use   100.0        207.5  100.0   315.0
1999    internet use    101.0        213.0  101.0   325.0
1999    soda use        215.0        207.5  200.0   215.0
2000    alcohol use     108.0        104.0  100.0   108.0
2000    cellphone use   317.0        313.5  310.0   317.0
2000    cigarette use   315.0        207.5  100.0   315.0
2000    internet use    325.0        213.0  101.0   325.0
2000    soda use        200.0        207.5  200.0   215.0
2001    alcohol use     144.0        104.0  100.0   108.0
2001    cellphone use   311.0        313.5  310.0   317.0
2001    cigarette use   122.0        207.5  100.0   315.0
2001    internet use    112.0        213.0  101.0   325.0
2001    soda use        689.0        207.5  200.0   215.0

I tried the following, but the values in the new columns are all NaN:

gen_mean = df.groupby('statistics').mean()
gen_min = df.groupby('statistics').min()
gen_max = df.groupby('statistics').max()

df.insert(2,'Gen Avg',gen_mean)
df.insert(3,'Gen Max',gen_max)
df.insert(4,'Gen Min',gen_min)

Thanks.

Solution

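groupby(...).mean() returns a DataFrame with one row per group, indexed by the statistics values rather than by year, so it does not line up with the rows of df. You need transform, which returns a result with the same shape and index as the original: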

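df['mean'] = df.groupby('statistics')['s_values'].transform('mean')
# min and max follow the same pattern
df['min'] = df.groupby('statistics')['s_values'].transform('min')
df['max'] = df.groupby('statistics')['s_values'].transform('max')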
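
To make the difference between a plain aggregation and transform concrete, here is a minimal sketch. The four-row frame is made up for illustration (it is not the frame from the question), and the names `grouped` and `per_group` are my own.

```python
import pandas as pd

# Small illustrative frame (hypothetical data, not the frame from the question)
df = pd.DataFrame({
    'year': [1999, 1999, 2000, 2000],
    'statistics': ['soda use', 'alcohol use', 'soda use', 'alcohol use'],
    's_values': [215, 100, 200, 108],
}).set_index('year')

grouped = df.groupby('statistics')['s_values']

# mean() collapses the data to one row per group, indexed by the group key
per_group = grouped.mean()
print(per_group.index.tolist())

# transform() broadcasts each group's result back onto the original rows,
# so the result can be assigned directly as a new column
for name in ['mean', 'min', 'max']:
    df[name] = grouped.transform(name)

print(df)
```

The aggregated result is indexed by the statistics labels, while df is indexed by year, which is consistent with the NaN columns seen in the question.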

Alternatively, try groupby aggregate + join:

df = df.join(
    df.groupby('statistics')['s_values'].aggregate(['mean', 'min', 'max']),
    on='statistics'
)

Resulting df:

         statistics  s_values        mean  min  max
year                                               
1999  cigarette use       100  179.000000  100  315
1999  cellphone use       310  312.666667  310  317
1999   internet use       101  179.333333  101  325
1999    alcohol use       100  117.333333  100  144
1999       soda use       215  368.000000  200  689
2000  cigarette use       315  179.000000  100  315
2000  cellphone use       317  312.666667  310  317
2000   internet use       325  179.333333  101  325
2000    alcohol use       108  117.333333  100  144
2000       soda use       200  368.000000  200  689
2001  cigarette use       122  179.000000  100  315
2001  cellphone use       311  312.666667  310  317
2001   internet use       112  179.333333  101  325
2001    alcohol use       144  117.333333  100  144
2001       soda use       689  368.000000  200  689

DataFrame used:

import pandas as pd

df = pd.DataFrame({
    'year': [1999] * 5 + [2000] * 5 + [2001] * 5,
    'statistics': ['cigarette use', 'cellphone use', 'internet use', 'alcohol use', 'soda use'] * 3,
    's_values': [100, 310, 101, 100, 215, 315, 317, 325, 108, 200, 122, 311, 112, 144, 689]
}).set_index('year')
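
As a quick sanity check, the transform approach and the aggregate + join approach can be run side by side on the question's data and compared. This is only a sketch: the frame is rebuilt from the values shown in the question, and the names `stats`, `via_transform`, and `via_join` are my own.

```python
import pandas as pd

stats = ['cigarette use', 'cellphone use', 'internet use', 'alcohol use', 'soda use']
df = pd.DataFrame({
    'year': [1999] * 5 + [2000] * 5 + [2001] * 5,
    'statistics': stats * 3,
    's_values': [100, 310, 101, 100, 215, 315, 317, 325, 108, 200, 122, 311, 112, 144, 689],
}).set_index('year')

grouped = df.groupby('statistics')['s_values']

# Approach 1: transform, one new column per statistic
via_transform = df.copy()
for name in ['mean', 'min', 'max']:
    via_transform[name] = grouped.transform(name)

# Approach 2: aggregate once per group, then join back on the group key
via_join = df.join(grouped.aggregate(['mean', 'min', 'max']), on='statistics')

# Prints whether the two results are identical
print(via_transform.equals(via_join))
```

transform keeps everything in one step per column, while aggregate + join computes all three statistics in a single pass and attaches them together; which reads better is largely a matter of taste.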
