1. Grouping with groupby
Usage: df.groupby([grouping keys])[columns to aggregate]
For example, to group student heights by school and gender:
df.groupby(['School', 'Gender'])['Height']
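The expression above only builds a grouped object; nothing is computed until an aggregation is applied. A minimal sketch on a made-up table (the rows and the school names 'A' and 'B' are hypothetical, only the column names come from the notes):

```python
import pandas as pd

# Hypothetical toy data; only the column names follow the notes.
df = pd.DataFrame({
    'School': ['A', 'A', 'B', 'B', 'B'],
    'Gender': ['Female', 'Male', 'Female', 'Female', 'Male'],
    'Height': [160.0, 175.0, 158.0, 162.0, 180.0],
})

# Group by two keys, select the Height column, then aggregate with mean().
mean_height = df.groupby(['School', 'Gender'])['Height'].mean()
print(mean_height)
```

The result is a Series indexed by (School, Gender) pairs, one row per group.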
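Exercise: split Weight at its lower and upper quartiles into three groups (high, normal, low) and compute the mean height of each group.
import pandas as pd
low = df['Weight'].quantile(0.25)   # lower quartile
high = df['Weight'].quantile(0.75)  # upper quartile
condition1 = df['Weight'] > high
condition2 = df['Weight'] < low
# A chained comparison (low < df['Weight'] < high) raises ValueError on a Series,
# so the middle band is built from two masks combined with &.
condition3 = (df['Weight'] >= low) & (df['Weight'] <= high)
# Build one label per row and group once; grouping by a single boolean mask
# would only split the rows into True/False.
weight_group = pd.Series(index=df.index, dtype='object')
weight_group[condition1] = 'high'
weight_group[condition3] = 'normal'
weight_group[condition2] = 'low'
# Rows with a missing Weight keep a NaN label and are dropped by groupby.
df.groupby(weight_group)['Height'].mean()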
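The exercise can be checked on a small made-up table. The sketch below is one possible approach, not the only one: it uses np.select to turn two boolean masks into a single label array and groups by that array. The numbers are hypothetical; only the column names come from the notes.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with the column names used in the notes.
df = pd.DataFrame({
    'Weight': [40, 45, 50, 55, 60, 65, 70, 75],
    'Height': [150, 155, 160, 165, 170, 175, 180, 185],
})

low = df['Weight'].quantile(0.25)   # lower quartile
high = df['Weight'].quantile(0.75)  # upper quartile

# `low < df['Weight'] < high` would raise ValueError, because Python expands it
# to `(low < s) and (s < high)` and a Series has no single truth value.
# Two separate masks plus a default label avoid the chained comparison.
labels = np.select(
    [df['Weight'] > high, df['Weight'] < low],
    ['high', 'low'],
    default='normal',
)

# Group by the label array (same length as df) and average the heights.
result = df.groupby(labels)['Height'].mean()
print(result)
```

With this approach a missing Weight compares False in both masks and therefore falls into 'normal'; if the real data contains missing weights, those rows may need to be dropped first.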
The ngroups attribute gives the number of groups:
a = df.groupby(['School', 'Gender'])
a.ngroups
Out[33]: 8
Going further, the groups attribute returns a dictionary that maps each group name to the list of index labels belonging to that group:
a.groups.keys()
Out[37]: dict_keys([('Fudan University', 'Female'), ('Fudan University', 'Male'), ('Peking University', 'Female'), ('Peking University', 'Male'), ('Shanghai Jiao Tong University', 'Female'), ('Shanghai Jiao Tong University', 'Male'), ('Tsinghua University', 'Female'), ('Tsinghua University', 'Male')])
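To see what the values of that dictionary look like, here is a small sketch on hypothetical data (the rows and the school names 'A' and 'B' are made up for illustration):

```python
import pandas as pd

# Hypothetical toy data; only the column names follow the notes.
df = pd.DataFrame({
    'School': ['A', 'A', 'B', 'B', 'B'],
    'Gender': ['Female', 'Male', 'Female', 'Female', 'Male'],
})

g = df.groupby(['School', 'Gender'])

n = g.ngroups      # number of distinct (School, Gender) combinations
groups = g.groups  # maps each group key to the index labels of its rows

# Rows 2 and 3 are the two ('B', 'Female') records.
rows_b_female = list(groups[('B', 'Female')])
print(n, rows_b_female)
```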
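The group categories can also be read off directly with drop_duplicates. It yields the same set of combinations as above, listed in order of first appearance rather than sorted: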
In [11]: df[['School', 'Gender']].drop_duplicates()
Out[11]:
                           School  Gender
0   Shanghai Jiao Tong University  Female
1               Peking University    Male
2   Shanghai Jiao Tong University    Male
3                Fudan University  Female
4                Fudan University    Male
5             Tsinghua University  Female
9               Peking University  Female
16            Tsinghua University    Male
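The gaps in the index of that output (0 to 5, then 9 and 16) come from drop_duplicates keeping the first occurrence of each combination together with its original row label. A short sketch on hypothetical data shows the same behaviour:

```python
import pandas as pd

# Hypothetical toy data; row 2 repeats the combination seen in row 0.
df = pd.DataFrame({
    'School': ['B', 'A', 'B', 'A', 'B'],
    'Gender': ['Male', 'Female', 'Male', 'Male', 'Female'],
})

# Keeps the first occurrence of each (School, Gender) pair, in original order,
# and preserves the original index labels.
unique_pairs = df[['School', 'Gender']].drop_duplicates()
kept_index = list(unique_pairs.index)
print(unique_pairs)
```

Here row 2 is dropped, so the remaining index labels are 0, 1, 3 and 4.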
Exercise: the previous subsection showed that drop_duplicates gives the group categories. Now use the groups attribute to achieve the same thing.
a = df.groupby(['School', 'Gender'])
list(a.groups.keys())
Out[43]:
[('Fudan University', 'Female'),
('Fudan University', 'Male'),
('Peking University', 'Female'),
('Peking University', 'Male'),
('Shanghai Jiao Tong University', 'Female'),
('Shanghai Jiao Tong University', 'Male'),
('Tsinghua University', 'Female'),
('Tsinghua University', 'Male')]
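If a DataFrame like the drop_duplicates output is wanted instead of a list of tuples, the keys can be wrapped in a DataFrame. The following is a sketch on hypothetical data; sorting both sides is an assumption made here so that the comparison does not depend on row order.

```python
import pandas as pd

# Hypothetical toy data; only the column names follow the notes.
df = pd.DataFrame({
    'School': ['B', 'A', 'B', 'A', 'B'],
    'Gender': ['Male', 'Female', 'Male', 'Male', 'Female'],
})
keys = ['School', 'Gender']

# Group categories taken from the groups attribute, one row per key tuple.
from_groups = pd.DataFrame(sorted(df.groupby(keys).groups.keys()), columns=keys)

# Group categories taken from drop_duplicates, sorted and re-indexed to match.
from_dedup = df[keys].drop_duplicates().sort_values(keys).reset_index(drop=True)

same = from_groups.values.tolist() == from_dedup.values.tolist()
print(same)
```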