I'm looking for a way to count the values in each column, and it's proving trickier than I originally thought.
percentile percentile1 percentile2 percentile3
0 mediocre contender contender mediocre
69 mediocre bad mediocre mediocre
117 mediocre mediocre mediocre mediocre
144 mediocre none mediocre contender
171 mediocre mediocre contender mediocre
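I'm trying to create something like the output below. There are four possible values, and they should be counted per column. It's essentially pd.value_counts for each column. Any help would definitely be appreciated.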
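Because the question describes this as "value_counts for each column", one direct way to try it is to apply pd.Series.value_counts to every column. The sketch below rebuilds the sample data from the table above (the index values 0, 69, 117, 144, 171 are copied from it). Categories that never occur in a column come back as NaN, so they are filled with 0. The reindex call and its label order are an assumption chosen to match the desired layout.

```python
import pandas as pd

# Sample data reconstructed from the table in the question.
df = pd.DataFrame(
    {
        "percentile": ["mediocre", "mediocre", "mediocre", "mediocre", "mediocre"],
        "percentile1": ["contender", "bad", "mediocre", "none", "mediocre"],
        "percentile2": ["contender", "mediocre", "mediocre", "mediocre", "contender"],
        "percentile3": ["mediocre", "mediocre", "mediocre", "contender", "mediocre"],
    },
    index=[0, 69, 117, 144, 171],
)

# value_counts runs once per column; the per-column results are aligned on
# their labels, and a label missing from a column becomes NaN.
counts = df.apply(pd.Series.value_counts).fillna(0).astype(int)

# Put the rows in the same order as the desired output.
counts = counts.reindex(["mediocre", "contender", "bad", "none"], fill_value=0)
print(counts)
```

This works well for a handful of columns. The tidy-data approach in the answer below generalizes more naturally when you want to keep working with the long format.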
percentile percentile1 percentile2 percentile3
mediocre: 5 2 3 4
contender: 0 1 2 1
bad: 0 1 0 0
none: 0 1 0 0
Answer:
It helps to make your data "tidy" (PDF) first. That means columns should represent variables and rows should represent observations.
In [98]: df
Out[98]:
percentile percentile1 percentile2 percentile3
0 mediocre contender contender mediocre
69 mediocre bad mediocre mediocre
117 mediocre mediocre mediocre mediocre
144 mediocre none mediocre contender
171 mediocre mediocre contender mediocre
[5 rows x 4 columns]
In [125]: melted = pd.melt(df); melted
Out[125]:
variable value
0 percentile mediocre
1 percentile mediocre
2 percentile mediocre
3 percentile mediocre
4 percentile mediocre
5 percentile1 contender
6 percentile1 bad
7 percentile1 mediocre
8 percentile1 none
9 percentile1 mediocre
10 percentile2 contender
11 percentile2 mediocre
12 percentile2 mediocre
13 percentile2 mediocre
14 percentile2 contender
15 percentile3 mediocre
16 percentile3 mediocre
17 percentile3 mediocre
18 percentile3 contender
19 percentile3 mediocre
[20 rows x 2 columns]
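Once the data is in this long form, crosstab is not the only way to count. The following is a minimal alternative sketch, assuming the same sample data. It swaps in groupby + size + unstack: each (value, variable) pair is counted, then the "variable" level is pivoted into columns. The variable names table and melted are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "percentile": ["mediocre", "mediocre", "mediocre", "mediocre", "mediocre"],
        "percentile1": ["contender", "bad", "mediocre", "none", "mediocre"],
        "percentile2": ["contender", "mediocre", "mediocre", "mediocre", "contender"],
        "percentile3": ["mediocre", "mediocre", "mediocre", "contender", "mediocre"],
    },
    index=[0, 69, 117, 144, 171],
)

# Wide -> long: default output columns are "variable" and "value".
melted = pd.melt(df)

# Count rows per (value, variable) pair, then move "variable" into the columns;
# pairs that never occur are filled with 0.
table = melted.groupby(["value", "variable"]).size().unstack(fill_value=0)
print(table)
```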
Then use crosstab to build a frequency table:
In [127]: pd.crosstab(index=[melted['value']], columns=[melted['variable']])
Out[127]:
variable percentile percentile1 percentile2 percentile3
value
bad 0 1 0 0
contender 0 1 2 1
mediocre 5 2 3 4
none 0 1 0 0
[4 rows x 4 columns]
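For reference, here are the two steps above as one self-contained script. crosstab sorts the row labels alphabetically, as the output above shows. The final reindex is an optional addition (not part of the original answer) that restores the row order from the question.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "percentile": ["mediocre", "mediocre", "mediocre", "mediocre", "mediocre"],
        "percentile1": ["contender", "bad", "mediocre", "none", "mediocre"],
        "percentile2": ["contender", "mediocre", "mediocre", "mediocre", "contender"],
        "percentile3": ["mediocre", "mediocre", "mediocre", "contender", "mediocre"],
    },
    index=[0, 69, 117, 144, 171],
)

# Step 1: make the data tidy (one observation per row).
melted = pd.melt(df)

# Step 2: frequency table of value (rows) by original column (columns).
freq = pd.crosstab(index=melted["value"], columns=melted["variable"])

# Optional: reorder rows to match the layout asked for in the question.
freq = freq.reindex(["mediocre", "contender", "bad", "none"])
print(freq)
```

If you also want totals, passing margins=True to pd.crosstab adds an "All" row and column.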