Why is pandas ewm with times so slow?

Suppose I have the following data:

import pandas as pd
import numpy as np
import datetime as dt

idx = pd.date_range("2010/01/01", "2020/01/01", freq='1min')
n = len(idx)

data = pd.DataFrame({'A': np.random.random(n),'B': np.random.random(n),'C': np.random.random(n)},index=idx)

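I can compute an exponential moving average with a half-life of 1 hour very quickly with the following (the index has one row per minute, so 60 rows correspond to 1 hour):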

data.ewm(halflife=60).mean()
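Because the index is spaced at exactly one minute, halflife=60 (counted in rows) amounts to a one-hour half-life, so on a regular grid the count-based call and the time-based call should agree. A minimal check, assuming a pandas version whose `ewm` accepts `times`; the 500-row size and the random seed are arbitrary choices made only for this illustration:

```python
import datetime as dt

import numpy as np
import pandas as pd

# Shortened version of the question's data: one row per minute
idx = pd.date_range("2020-01-01", periods=500, freq="min")
rng = np.random.default_rng(0)
small = pd.DataFrame(rng.random((len(idx), 3)), columns=list("ABC"), index=idx)

# Half-life counted in rows: 60 rows == 60 minutes on this grid
by_count = small.ewm(halflife=60).mean()

# Half-life given as a duration, with weights derived from the timestamps
by_time = small.ewm(halflife=dt.timedelta(hours=1), times=small.index).mean()

# On an evenly spaced index both weightings are 0.5 ** (k / 60) for a row k steps back
print(np.allclose(by_count.to_numpy(), by_time.to_numpy()))
```

On evenly spaced data the two calls therefore compute the same numbers; only the cost differs.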

However, if I try:

data.ewm(halflife=dt.timedelta(hours=1),times=data.index).mean()

it is extremely slow (so slow that I end up killing the process). Why is that?
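A likely cause, stated here as an assumption and not verified in this thread, is that the pandas version in use evaluated the time-based weights 0.5 ** ((t_i - t_j) / halflife) against every earlier row j for each row i. That cost grows quadratically with the number of rows (about 5.3 million here). For the default adjust=True mean, the same weights can be maintained recursively: multiply a running numerator and denominator by 0.5 ** ((t_i - t_{i-1}) / halflife), then add the new row. The helper `ewm_time_mean` below is hypothetical, assumes there are no missing values, and sketches that recursion; it is not pandas' own implementation.

```python
import datetime as dt

import numpy as np
import pandas as pd


def ewm_time_mean(df: pd.DataFrame, halflife: dt.timedelta) -> pd.DataFrame:
    """Hypothetical helper: time-weighted EWM mean (adjust=True, no NaNs) in one pass."""
    values = df.to_numpy(dtype=float)
    if len(values) == 0:
        return df.astype(float)

    # Elapsed time between consecutive rows, measured in half-lives
    elapsed = np.diff(df.index.to_numpy())
    steps = elapsed / pd.Timedelta(halflife).to_timedelta64()
    # Factor by which the weight of every earlier row shrinks at each step
    decay = 0.5 ** steps

    out = np.empty_like(values)
    num = values[0].copy()          # running sum of weight * value, per column
    den = np.ones(values.shape[1])  # running sum of weights, per column
    out[0] = num / den
    for i in range(1, len(values)):
        num = decay[i - 1] * num + values[i]
        den = decay[i - 1] * den + 1.0
        out[i] = num / den
    return pd.DataFrame(out, index=df.index, columns=df.columns)
```

It would be called as ewm_time_mean(data, dt.timedelta(hours=1)). Because the decay factors telescope, row j ends up with weight 0.5 ** ((t_i - t_j) / halflife) at row i, the same weighting as the quadratic form, but each row is visited once. As a plain Python loop it is linear yet still not fast on millions of rows; compiling the loop (for example with Numba) would be the usual next step, which is not shown here.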

Answer

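I noticed the same thing and don't know why. The timedelta version is about 4000 times slower on my laptop. Note also that if the time steps are not uniform, the two approaches do not produce the same results; see below: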

import pandas as pd
import numpy as np
import datetime as dt
import time

idx = pd.date_range("2019/12/24", "2020/01/01", freq='1min')
n = len(idx)
rand_hours = pd.to_timedelta(np.random.random(n) / 3.0,unit='h')
idx += rand_hours

data = pd.DataFrame({'A': np.random.random(n),'B': np.random.random(n),'C': np.random.random(n)},index=idx)

start = time.process_time()
v1 = data.ewm(halflife=60).mean()
t1 = time.process_time() - start
print('T1',t1)

start = time.process_time()
v2 = data.ewm(halflife=dt.timedelta(hours=1),times=data.index).mean()
t2 = time.process_time() - start
print('T2',t2,'scale',t2 / t1)

print(0.5 * (v1 - v2) / (v1 + v2))

Output:

T1 0.001174116999999919
T2 4.715054932 scale 4015.830562031148
                                      A         B         C
2019-12-24 00:19:07.834616400  0.000000  0.000000  0.000000
2019-12-24 00:07:25.226215200 -0.005825  0.017740  0.000962
2019-12-24 00:16:53.113740000 -0.003800  0.008667  0.000355
2019-12-24 00:15:56.813227200 -0.002508  0.006256  0.000556
2019-12-24 00:04:50.909022000 -0.006851  0.007318 -0.000670
...                                 ...       ...       ...
2020-01-01 00:04:22.018974000 -0.000450 -0.000508  0.001395
2020-01-01 00:10:42.774960000 -0.000348 -0.000437  0.001404
2020-01-01 00:13:37.267552799 -0.000231 -0.000319  0.001293
2020-01-01 00:09:07.053290400 -0.000228 -0.000314  0.001287
2020-01-01 00:07:54.683781599 -0.000180 -0.000353  0.001279

[11521 rows x 3 columns]
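The discrepancy comes from what the half-life is measured in: halflife=60 halves a row's weight every 60 rows, while halflife=dt.timedelta(hours=1) with times halves it every 60 minutes of elapsed time, and the two only coincide when rows are exactly one minute apart. (Adding offsets of up to 20 minutes to a one-minute grid also leaves the index unsorted, as the printed timestamps show.) A three-row sketch with made-up timestamps and values makes the weights explicit:

```python
import datetime as dt

import numpy as np
import pandas as pd

# Three rows spaced far from one minute apart: at 0, 30 and 90 minutes
times = pd.to_datetime(["2020-01-01 00:00", "2020-01-01 00:30", "2020-01-01 01:30"])
s = pd.Series([0.0, 0.0, 1.0], index=times)

# Half-life of 60 rows: the two older rows keep almost all of their weight
by_count = s.ewm(halflife=60).mean()

# Half-life of 1 hour: for the last row the weights are 0.5**1.5, 0.5**1, 1
by_time = s.ewm(halflife=dt.timedelta(hours=1), times=s.index).mean()

# The same weights for the last row, written out by hand
w_count = 0.5 ** (np.array([2, 1, 0]) / 60)   # rows back / 60
w_time = 0.5 ** (np.array([90, 60, 0]) / 60)  # minutes back / 60
print(by_count.iloc[-1], (w_count @ s.to_numpy()) / w_count.sum())
print(by_time.iloc[-1], (w_time @ s.to_numpy()) / w_time.sum())
```

For the last row the count-based mean is about 0.34 and the time-based mean about 0.54, because the 60-minute gap before the final observation counts as a single row in one scheme and as a full half-life in the other.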