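How to work around `groupby.quantile` not accepting an array (list) as its argument?
I'm trying to compute several percentiles on one column of my DataFrame, but when I pass the list of percentiles as the argument, my program crashes. Using a `for` loop works around the problem, but I suspect it is much slower than passing the list directly to the `quantile()` method.
How can I make these calculations faster?
Here is a reproducible example. Note that I had to define a `Quantile` callable class; otherwise aggregating with `quantile` directly would not work: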
```python
import time

import numpy as np
import pandas as pd

Timer_S = time.time()

# Callable wrapper so a specific quantile can be passed to .agg()
class Quantile:
    def __init__(self, q):
        self.q = q

    def __call__(self, x):
        return x.quantile(self.q, interpolation='lower')

# Month names must match strftime('%B') output exactly ('March', not 'march')
new_order = ['January', 'February', 'March', 'April', 'May', 'June',
             'July', 'August', 'September', 'October', 'November', 'December']
percentiles = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.99]

df = pd.DataFrame({"Start": pd.date_range("1-jan-2021", periods=10**5, freq="1h")})
df['Rand'] = np.random.randint(0, 10, df.shape[0])

list_P = []
Quantiles_df = df.copy()
Quantiles_df['Month'] = Quantiles_df['Start'].dt.strftime('%B')

# One groupby/agg pass per percentile
for element in percentiles:
    k = Quantiles_df.groupby(['Month']).agg({'Rand': Quantile(element)})
    k = k.reindex(new_order, axis=0)
    list_P.append(k)

Final_df = pd.concat(list_P, axis=1)
# round() avoids float artifacts such as int(0.57 * 100) == 56
Final_df.columns = [f'P_{round(element * 100)}' for element in percentiles]

Timer_E = time.time()
print(Final_df)  # display(Final_df) in a notebook
print(f'Quantile timer : {Timer_E - Timer_S} secs')
```
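To see why passing the list crashes, here is a minimal sketch on a small hypothetical frame (`demo` and its values are made up for illustration; the `Quantile` class is copied from the question so the block runs on its own). `agg` expects each group to reduce to a single scalar, but `Series.quantile` called with a list returns a Series, so pandas rejects it (in recent versions, typically with a `ValueError`). Calling `quantile` on the grouped column instead accepts the whole list.

```python
import numpy as np
import pandas as pd


class Quantile:
    """Same callable wrapper as in the question."""

    def __init__(self, q):
        self.q = q

    def __call__(self, x):
        return x.quantile(self.q, interpolation="lower")


# Hypothetical demo data: two months with five integer values each
demo = pd.DataFrame({
    "Month": ["January"] * 5 + ["February"] * 5,
    "Rand": np.arange(10),
})

# With a list, each group yields a Series instead of a scalar, so agg fails
try:
    demo.groupby("Month").agg({"Rand": Quantile([0.1, 0.5])})
    agg_failed = False
except (ValueError, TypeError) as exc:
    agg_failed = True
    print("agg failed:", exc)

# Calling quantile on the grouped column takes the whole list in one call;
# the result is a Series indexed by (Month, quantile)
direct = demo.groupby("Month")["Rand"].quantile([0.1, 0.5], interpolation="lower")
print(direct)
```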
Solution
Could you try this instead of the loop? First `groupby` and pass all the quantiles to `quantile` in a single call, then use `pivot_table` to unstack the result.
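```python
# Selecting ["Rand"] keeps the datetime "Start" column out of the quantile/pivot step
pd.pivot_table(Quantiles_df.groupby("Month")["Rand"].quantile([0.1,0.2,0.3,0.4,0.5,0.6]).reset_index(),index='Month',columns='level_1').reset_index().droplevel(level=0,axis=1)
```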
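I got this (the data is random, and the default `interpolation='linear'` is used rather than `'lower'`, so non-integer values such as 0.9 and 4.5 can appear):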
```
level_1             0.1  0.2  0.3  0.4  0.5  0.6
0            April  0.0  1.0  2.0  4.0  5.0  5.0
1           August  0.0  2.0  2.0  4.0  5.0  6.0
2         December  0.0  1.0  3.0  4.0  4.5  6.0
3         February  1.0  2.0  3.0  3.0  4.0  5.0
4          January  1.0  2.0  3.0  4.0  5.0  6.0
5             July  0.0  2.0  3.0  4.0  4.0  5.0
6             June  1.0  1.0  3.0  3.0  4.0  5.0
7            March  0.0  1.0  2.0  3.0  4.0  5.0
8              May  1.0  2.0  3.0  4.0  5.0  6.0
9         November  0.9  2.0  3.0  4.0  5.0  5.0
10         October  0.0  1.0  2.0  3.0  4.0  6.0
11       September  0.0  1.0  2.0  4.0  5.0  6.0
```
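Building on the answer, here is a sketch that reproduces the question's `Final_df` layout with a single grouped call instead of one pass per percentile. It keeps the question's `interpolation='lower'`, uses `unstack()` in place of the `pivot_table`/`reset_index`/`droplevel` chain (both move the quantile level into the columns), and reindexes the months into calendar order. The variable names `long_result` and `elapsed` are illustrative; how much faster this is than the loop will depend on your data size and pandas version.

```python
import time

import numpy as np
import pandas as pd

new_order = ["January", "February", "March", "April", "May", "June",
             "July", "August", "September", "October", "November", "December"]
percentiles = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.99]  # ascending

df = pd.DataFrame({"Start": pd.date_range("1-jan-2021", periods=10**5, freq="1h")})
df["Rand"] = np.random.randint(0, 10, df.shape[0])
df["Month"] = df["Start"].dt.strftime("%B")

start = time.perf_counter()

# One grouped call computes every percentile; the result has a (Month, quantile) index
long_result = df.groupby("Month")["Rand"].quantile(percentiles, interpolation="lower")

# unstack() moves the quantile level into columns; reindex restores calendar order
Final_df = long_result.unstack().reindex(new_order)
Final_df.columns = [f"P_{round(q * 100)}" for q in percentiles]

elapsed = time.perf_counter() - start
print(Final_df)
print(f"Single-pass quantile timer : {elapsed:.4f} secs")
```

Because the column labels are assigned positionally, this assumes `percentiles` is in ascending order, so the column order after `unstack()` matches the list.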