Resampling data at fixed intervals with forward fill in Python

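I want to resample the data column at 10-second intervals, taking each value from the previous observation (that is, a forward fill, ffill).

The DataFrame df looks like this: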

        Timestamp               data
850812  2011-01-26 17:53:39.250 28.5
394354  2011-01-26 17:53:42.250 NaN
554123  2011-01-26 17:54:09.400 NaN
1187196 2011-01-26 17:54:19.400 NaN
1067598 2011-01-26 17:54:21.400 NaN
463998  2011-01-26 17:55:34.030 29.5
231116  2011-01-26 17:56:26.030 30.5
567915  2011-01-26 17:56:35.030 30.5
839526  2011-01-26 17:56:37.030 30.5
174655  2011-01-26 17:56:41.590 29.0

Reproducible example:

import pandas as pd
from pandas import Timestamp
from numpy import nan

df = pd.DataFrame({'Timestamp': {850812: Timestamp('2011-01-26 17:53:39.250000'),394354: Timestamp('2011-01-26 17:53:42.250000'),554123: Timestamp('2011-01-26 17:54:09.400000'),1187196: Timestamp('2011-01-26 17:54:19.400000'),1067598: Timestamp('2011-01-26 17:54:21.400000'),463998: Timestamp('2011-01-26 17:55:34.030000'),231116: Timestamp('2011-01-26 17:56:26.030000'),567915: Timestamp('2011-01-26 17:56:35.030000'),839526: Timestamp('2011-01-26 17:56:37.030000'),174655: Timestamp('2011-01-26 17:56:41.590000')},'data': {850812: 28.5,394354: nan,554123: nan,1187196: nan,1067598: nan,463998: 29.5,231116: 30.5,567915: 30.5,839526: 30.5,174655: 29.0}}
)

Here is what I tried:

df1 = (df.set_index('Timestamp')['data']
                .resample('10S')
                .last()
                .ffill()
                .reset_index())
df1

This returns:

    Timestamp           data
0   2011-01-26 17:53:30 28.5
1   2011-01-26 17:53:40 28.5
2   2011-01-26 17:53:50 28.5
3   2011-01-26 17:54:00 28.5
4   2011-01-26 17:54:10 28.5
5   2011-01-26 17:54:20 28.5
6   2011-01-26 17:54:30 28.5
7   2011-01-26 17:54:40 28.5
8   2011-01-26 17:54:50 28.5
9   2011-01-26 17:55:00 28.5
10  2011-01-26 17:55:10 28.5
11  2011-01-26 17:55:20 28.5
12  2011-01-26 17:55:30 29.5  # Should be 28.5
13  2011-01-26 17:55:40 29.5
14  2011-01-26 17:55:50 29.5
15  2011-01-26 17:56:00 29.5
16  2011-01-26 17:56:10 29.5
17  2011-01-26 17:56:20 30.5  # Should be 29.5
18  2011-01-26 17:56:30 30.5
19  2011-01-26 17:56:40 29.0  # Should be 30.5

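In the comments to the right of the table, I have marked the boundary values that should be different. When resampling, I want each row to carry forward the last observed value, not the next value that follows it. Why does it pick up the next value?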
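
A likely explanation, offered as a sketch rather than a definitive answer: for a 10-second frequency, resample by default builds left-closed bins and labels each one with its left edge. The row labelled 17:55:30 therefore summarises the interval [17:55:30, 17:55:40), and last() picks up the 29.5 observed at 17:55:34.030, which is later than the label. One way to get "last value known at time t" is to close and label the bins on the right, so the row labelled t covers (t - 10s, t]. The example below rebuilds the same observations (the original integer index is dropped for brevity), and the names s, df2, grid and check are only illustrative.

```python
import numpy as np
import pandas as pd

# Same observations as in the question (original integer index omitted).
df = pd.DataFrame({
    "Timestamp": pd.to_datetime([
        "2011-01-26 17:53:39.250", "2011-01-26 17:53:42.250",
        "2011-01-26 17:54:09.400", "2011-01-26 17:54:19.400",
        "2011-01-26 17:54:21.400", "2011-01-26 17:55:34.030",
        "2011-01-26 17:56:26.030", "2011-01-26 17:56:35.030",
        "2011-01-26 17:56:37.030", "2011-01-26 17:56:41.590",
    ]),
    "data": [28.5, np.nan, np.nan, np.nan, np.nan, 29.5, 30.5, 30.5, 30.5, 29.0],
})

s = df.set_index("Timestamp")["data"].sort_index()

# Right-closed, right-labelled bins: the row labelled t summarises (t - 10s, t],
# so it can only contain observations made at or before t.
df2 = (s.resample("10s", closed="right", label="right")
        .last()    # last non-NaN observation inside each bin
        .ffill()   # carry that value forward through empty bins
        .reset_index())

# Cross-check with a different route: forward-fill the raw series, then look up
# the latest known value at each point of an explicit 10-second grid.
grid = pd.date_range(s.index.min().ceil("10s"), s.index.max().ceil("10s"), freq="10s")
check = s.ffill().reindex(grid, method="ffill")

result = df2.set_index("Timestamp")["data"]
print(result[pd.Timestamp("2011-01-26 17:55:30")])  # 28.5
print(result[pd.Timestamp("2011-01-26 17:56:20")])  # 29.5
print(result[pd.Timestamp("2011-01-26 17:56:40")])  # 30.5
```

With right-labelled bins the output starts at 17:53:40 instead of 17:53:30 and ends at 17:56:50, because each label now marks the end of its interval. If the original left-edge labels are required, the labels would have to be shifted afterwards, and that is a separate design choice.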
