Computing cumulative occurrence counts over time with two-level grouping

I have a dataset that looks like this:

    country            date_added
0   United States       01/2013
1   United Kingdom      03/2014
2   Egypt               03/2014
3   United States       03/2014
4   United States       03/2014
5   United Kingdom      06/2015
6   United States       06/2015

I want to compute the running total for each country by date, i.e.:

    date_added         country         cumulative_count
0   01/2013             United States          1
1   03/2014             United Kingdom         1
2   03/2014             Egypt                  1
3   03/2014             United States          3
4   06/2015             United Kingdom         2
5   06/2015             United States          4

I tried grouping by two levels, but .count() does not work (the counts do not show up at all), whereas .size() does:

cumulative_by_date = new_df.groupby(['date_added','country']).size()

I don't know how to apply this question's solution together with .size() to get the cumulative sum.
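The difference between .count() and .size() described above can be reproduced with a small sketch. It assumes new_df contains only the two columns shown in the sample (the question does not say whether other columns exist). Under that assumption, .count() has no non-key columns left to count, so it returns a DataFrame with no columns, while .size() returns the number of rows in each group.

```python
import pandas as pd

# Sample data copied from the question; assumed to be the only two columns.
new_df = pd.DataFrame({
    "country": ["United States", "United Kingdom", "Egypt", "United States",
                "United States", "United Kingdom", "United States"],
    "date_added": ["01/2013", "03/2014", "03/2014", "03/2014",
                   "03/2014", "06/2015", "06/2015"],
})

grouped = new_df.groupby(["date_added", "country"])

# count() tallies non-null values in the remaining (non-key) columns.
# Both columns are grouping keys here, so nothing is left to count.
counts = grouped.count()

# size() counts rows per group, regardless of which columns exist.
sizes = grouped.size()

print(counts.shape)
print(sizes)
```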

Solution

Following the approach in the second linked question, this is a double groupby with cumsum, followed by reset_index:

# wrap the chain in parentheses so it can span multiple lines
(df.groupby(['date_added','country']).size()
   .groupby(['country']).cumsum().reset_index(name='cumulative_count'))

Output:

  date_added         country  cumulative_count
0    01/2013   United States                 1
1    03/2014           Egypt                 1
2    03/2014  United Kingdom                 1
3    03/2014   United States                 3
4    06/2015  United Kingdom                 2
5    06/2015   United States                 4
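For anyone who wants to run this end to end, here is a minimal self-contained sketch that rebuilds the sample data from the question and applies the same chain. The only deviation is an assumption on my part: it passes level="country" to the second groupby to name the index level explicitly, which should behave the same as groupby(['country']) on this Series.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "country": ["United States", "United Kingdom", "Egypt", "United States",
                "United States", "United Kingdom", "United States"],
    "date_added": ["01/2013", "03/2014", "03/2014", "03/2014",
                   "03/2014", "06/2015", "06/2015"],
})

result = (
    df.groupby(["date_added", "country"]).size()   # rows per (date, country)
      .groupby(level="country").cumsum()           # running total within each country
      .reset_index(name="cumulative_count")        # back to a flat DataFrame
)

print(result)
```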

Step by step:

# size by date and country
print(df.groupby(['date_added','country']).size())

# output
date_added  country       
01/2013     United States     1
03/2014     Egypt             1
            United Kingdom    1
            United States     2
06/2015     United Kingdom    1
            United States     1

# cumulative sum by country
print(df.groupby(['date_added','country']).size()
        .groupby(['country']).cumsum())

# output
date_added  country       
01/2013     United States     1
03/2014     Egypt             1
            United Kingdom    1
            United States     3
06/2015     United Kingdom    2
            United States     4

# reset index
print(df.groupby(['date_added','country']).size()
        .groupby(['country']).cumsum().reset_index(name='cumulative_count'))

# output
  date_added         country  cumulative_count
0    01/2013   United States                 1
1    03/2014           Egypt                 1
2    03/2014  United Kingdom                 1
3    03/2014   United States                 3
4    06/2015  United Kingdom                 2
5    06/2015   United States                 4
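One caveat: groupby sorts its keys, and date_added holds strings in MM/YYYY form, so the running total follows string order rather than calendar order. That happens to coincide for the sample data, but a value such as 12/2013 would sort after 03/2014. A possible workaround, sketched below with hypothetical rows that are not from the question, is to parse the column with pd.to_datetime before grouping.

```python
import pandas as pd

# Hypothetical rows chosen so that string order and calendar order differ.
df = pd.DataFrame({
    "country": ["United States", "United States", "United States"],
    "date_added": ["12/2013", "03/2014", "01/2013"],
})

# Parse MM/YYYY strings into timestamps (first day of each month)
# so that groupby sorts them chronologically.
df["date_added"] = pd.to_datetime(df["date_added"], format="%m/%Y")

result = (
    df.groupby(["date_added", "country"]).size()
      .groupby(level="country").cumsum()
      .reset_index(name="cumulative_count")
)

# Optionally convert back to the original MM/YYYY text for display.
result["date_added"] = result["date_added"].dt.strftime("%m/%Y")
print(result)
```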
