I have a pandas DataFrame with a MultiIndex that looks like this:
# -*- coding: utf-8 -*-
import numpy as np
import pandas as pd

# multi-indexed dataframe
df = pd.DataFrame(np.random.randn(8760 * 3, 3))
df['concept'] = "some_value"
df['datetime'] = pd.date_range(start='2016', periods=len(df), freq='60Min')
df.set_index(['concept', 'datetime'], inplace=True)
df.sort_index(inplace=True)
Console output:
df.head()
Out[23]:
                 0         1         2
datetime
2016      0.458802  0.413004  0.091056
2016     -0.051840 -1.780310 -0.304122
2016     -1.119973  0.954591  0.279049
2016     -0.691850 -0.489335  0.554272
2016     -1.278834 -1.292012 -0.637931

df.tail()
Out[24]:
                 0         1         2
datetime
2018     -1.872155  0.434520 -0.526520
2018      0.345213  0.989475 -0.892028
2018     -0.162491  0.908121 -0.993499
2018     -1.094727  0.307312  0.515041
2018     -0.880608 -1.065203 -1.438645
Now I want to compute yearly sums over the 'datetime' level.
My first attempt was the following, but it does not work:
# sum along years
years = df.index.get_level_values('datetime').year.tolist()
df.index.set_levels([years], level=['datetime'], inplace=True)
df = df.groupby(level=['datetime']).sum()
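This also feels heavy-handed to me, since the task should be easy to implement.
So here is my question: how can I get yearly sums over the 'datetime' level? Is there a simple way to do this by applying a function to the values of the DateTime level?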
Solution
You can groupby on the year of the second level of the MultiIndex:
# -*- coding: utf-8 -*-
import numpy as np
import pandas as pd

# multi-indexed dataframe
df = pd.DataFrame(np.random.randn(8760 * 3, 3))
df['concept'] = "some_value"
df['datetime'] = pd.date_range(start='2016', periods=len(df), freq='60Min')
df.set_index(['concept', 'datetime'], inplace=True)
df.sort_index(inplace=True)

print(df.head())
#                                        0         1         2
# concept    datetime
# some_value 2016-01-01 00:00:00  1.973437  0.101535 -0.693360
#            2016-01-01 01:00:00  1.221657 -1.983806 -0.075609
#            2016-01-01 02:00:00 -0.208122 -2.203801  1.254084
#            2016-01-01 03:00:00  0.694332 -0.235864  0.538468
#            2016-01-01 04:00:00 -0.928815 -1.417445  1.534218

# sum along years
#years = df.index.get_level_values('datetime').year.tolist()
#df.index.set_levels([years], level=['datetime'], inplace=True)

print(df.index.levels[1].year)
# [2016 2016 2016 ...,2018 2018 2018]

# levels[1] holds the unique datetimes; here there is exactly one per row
df = df.groupby(df.index.levels[1].year).sum()

print(df.head())
#                0           1          2
# 2016  -93.901914  -32.205514 -22.460965
# 2017  205.681817   67.701669 -33.960801
# 2018   67.438355  150.954614 -21.381809
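One caveat with grouping by `df.index.levels[1]`: `levels` stores the unique values of a level, not one value per row. It lines up with the rows in the example above only because there is a single concept and every timestamp is distinct. A minimal sketch of the difference, using hypothetical toy data (the names `toy` and `dates` and the two concepts "a" and "b" are assumptions, not part of the question):

```python
import pandas as pd

# Hypothetical toy frame: two concepts that share the same four timestamps.
dates = pd.date_range(start="2016-12-30", periods=4, freq="D")
index = pd.MultiIndex.from_product([["a", "b"], dates],
                                   names=["concept", "datetime"])
toy = pd.DataFrame({"value": list(range(8))}, index=index)

n_rows = len(toy)                                          # one entry per row
n_unique = len(toy.index.levels[1])                        # unique datetimes only
n_per_row = len(toy.index.get_level_values("datetime"))    # one entry per row

print(n_rows, n_unique, n_per_row)
```

With more than one concept, the unique values no longer match the row count, so they cannot serve as a per-row grouping key.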
Alternatively, you can use get_level_values together with year:
df = df.groupby(df.index.get_level_values('datetime').year).sum()

print(df.head())
#                0           1          2
# 2016  -93.901914  -32.205514 -22.460965
# 2017  205.681817   67.701669 -33.960801
# 2018   67.438355  150.954614 -21.381809
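If the frame holds several concepts and the yearly sums should be kept separate for each one, the same get_level_values idea can probably be extended by passing two keys to groupby. The following is only a sketch on hypothetical toy data (the names `toy`, `years`, `yearly` and `per_concept` and the concepts "a" and "b" are assumptions):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: two concepts, four daily timestamps across a year boundary.
dates = pd.date_range(start="2016-12-30", periods=4, freq="D")
index = pd.MultiIndex.from_product([["a", "b"], dates],
                                   names=["concept", "datetime"])
toy = pd.DataFrame({"value": np.arange(8, dtype=float)}, index=index)

# One year label per row, taken from the 'datetime' level.
years = toy.index.get_level_values("datetime").year

# Yearly sums across all concepts.
yearly = toy.groupby(years).sum()

# Yearly sums per concept: group by the concept label and the year together.
per_concept = toy.groupby([toy.index.get_level_values("concept"), years]).sum()

print(yearly)
print(per_concept)
```

Grouping by per-row labels leaves the original index untouched, which avoids rewriting the index levels as in the first attempt.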