How to group a DataFrame by column and create pivot tables in a loop

I have the following table, df.

ID  CATEG   LEVEL   COLS    VALUE   COMMENT
1    A       3      Apple    388    comment1
1    A       3      Orange   204    comment1
1    A       2      Orange   322    comment1
1    A       1      Orange   716    comment1
1    A       1      Apple    282    comment1
1    A       2      Apple    555    comment1
1    A              Berry    289    comment1
2    A              Car      316    comment1
1    B              Berry    297    comment1
1    B       3      Apple    756    comment1
1    B       2      Apple    460    comment1
1    B       3      Orange   497    comment1
1    B       2      Orange   831    comment1
1    B       1      Orange   225    comment1
1    B       1      Apple    395    comment1
2    B              Car      486    comment1
1    C       2      Orange   320    comment1
1    C       1      Orange   208    comment1
1    C       1      Apple    464    comment1
1    C       2      Apple    613    comment1
1    C       3      Apple    369    comment1
1    C              Berry    474    comment1
2    C              Car      888    comment1
1    C       3      Orange   345    comment1
2    B              Car      664    comment2

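I want to create this view from the DataFrame and write one Excel sheet per ID group. Below is the example for ID 1. In my example there is only one comment, so the sheet name follows the pattern ID_COMMENT, for example 1_comment1: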

    Berry  Apple          Orange
               3    2    1    3    2    1
A     289    388  555  282  204  322  716
B     297    756  460  395  497  831  225
C     474    369  613  464  345  320  208

If LEVEL is None/NaN, I should be able to split df separately based on COLS and COMMENT, and name the sheet "ID_NULL_COMMENT". For example, the 2_NULL_comment1 sheet:

   CATEG    Car
     A      316
     B      486
     C      888

The 2_NULL_comment2 sheet:

CATEG   Car
 B      664

What I tried:

from pandas import ExcelWriter
writer = ExcelWriter('Values.xlsx')
distinct_id_df= np.unique(df[['ID']],axis=0)   
for ID in  distinct_id_df.iloc[:,0] :
    sample_df = pd.DataFrame()
    for df in sample_df:
        for i in(distinct_id_df):
            distinct_id_df = df.groupby['ID'].pivot_table('VALUE',['LEVEL','CATEEG'],'COLS')
        sample_df = sample_df.append(df)
        print(sample_df.shape,'===>',datetime.Now())
    sample_df.to_excel(writer,'{}''{}'.format(id).format(comments),index= False)

writer.save()

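This is obviously incorrect: I cannot get the pivot right, and I am also stuck on how to loop correctly so that each result lands on a different sheet.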

Solution

Use:

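import numpy as np
import pandas as pd

# sample data, row for row as in the table from the question
df = pd.DataFrame({
    'ID': [1,1,1,1,1,1,1,2,1,1,1,1,1,1,1,2,1,1,1,1,1,1,2,1,2],
    'CATEG': ['A','A','A','A','A','A','A','A','B','B','B','B','B','B','B','B',
              'C','C','C','C','C','C','C','C','B'],
    'LEVEL': [3.0,3.0,2.0,1.0,1.0,2.0,np.nan,np.nan,np.nan,3.0,2.0,3.0,2.0,1.0,1.0,np.nan,
              2.0,1.0,1.0,2.0,3.0,np.nan,np.nan,3.0,np.nan],
    'COLS': ['Apple','Orange','Orange','Orange','Apple','Apple','Berry','Car',
             'Berry','Apple','Apple','Orange','Orange','Orange','Apple','Car',
             'Orange','Orange','Apple','Apple','Apple','Berry','Car','Orange','Car'],
    'VALUE': [388,204,322,716,282,555,289,316,297,756,460,497,831,225,395,486,
              320,208,464,613,369,474,888,345,664],
    'COMMENT': ['comment1'] * 24 + ['comment2'],
})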

# check missing values
mask = df['LEVEL'].isna()

# split DataFrames for different processing
df1 = df[~mask]
df2 = df[mask]

# pivot with different columns parameters
df1 = df1.pivot_table(index=['ID','COMMENT','CATEG'],columns=['COLS','LEVEL'],values='VALUE')
# print (df1)

df2 = df2.pivot_table(index=['ID','COMMENT','CATEG'],columns='COLS',values='VALUE')
# print (df2)

with pd.ExcelWriter('Values.xlsx') as writer:

    # group by the first 2 index levels: ID, COMMENT
    for (ids, comments), sample_df in df1.groupby(['ID','COMMENT']):
        # remove the first 2 levels, and also columns containing only NaNs
        out = sample_df.reset_index(level=[0, 1], drop=True).dropna(how='all', axis=1)
        # new sheet names by f-strings
        name = f'{ids}_{comments}'
        # write to file
        out.to_excel(writer, sheet_name=name)

    # same steps for the rows with missing LEVEL, with NULL in the sheet name
    for (ids, comments), sample_df in df2.groupby(['ID','COMMENT']):
        out = sample_df.reset_index(level=[0, 1], drop=True).dropna(how='all', axis=1)
        name = f'{ids}_NULL_{comments}'
        out.to_excel(writer, sheet_name=name)
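
As a side note on why the rows with missing LEVEL are split off before pivoting: as far as I know, pivot_table with its default dropna=True leaves out rows whose grouping keys are NaN, so a single pivot over ['COLS','LEVEL'] would silently lose them. The sketch below shows this on a tiny made-up sample (the `sample` frame is an assumption for illustration, not the data from the question).

```python
import numpy as np
import pandas as pd

# Tiny made-up sample: the Berry row has no LEVEL.
sample = pd.DataFrame({
    'CATEG': ['A', 'A', 'A'],
    'LEVEL': [1, 2, np.nan],
    'COLS': ['Apple', 'Apple', 'Berry'],
    'VALUE': [282, 555, 289],
})

# Pivot everything at once, without splitting on the missing LEVEL first.
pivoted = sample.pivot_table(index='CATEG', columns=['COLS', 'LEVEL'], values='VALUE')

# Collect the fruit names that survived in the column header.
top_level = list(pivoted.columns.get_level_values('COLS').unique())
print(top_level)
```

If Berry is missing from the printed list, the row with NaN in LEVEL was dropped, which is the reason for pivoting df[mask] separately on COLS alone.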

Another solution without the repeated code:

mask = df['LEVEL'].isna()

dfs = {'no_null': df[~mask],'null': df[mask]}

with pd.ExcelWriter('Values.xlsx') as writer:

    for k, v in dfs.items():
        # choose the sheet name infix and the pivot columns per part
        if k == 'no_null':
            add = ''
            cols = ['COLS','LEVEL']
        else:
            add = 'NULL_'
            cols = 'COLS'

        pivoted = v.pivot_table(index=['ID','COMMENT','CATEG'],columns=cols,values='VALUE')

        for (ids, comments), sample_df in pivoted.groupby(['ID','COMMENT']):
            # remove the first 2 levels, and also columns containing only NaNs
            out = sample_df.reset_index(level=[0, 1], drop=True).dropna(how='all', axis=1)
            name = f'{ids}_{add}{comments}'
            out.to_excel(writer, sheet_name=name)
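
To inspect the result before writing any file, the same logic can be wrapped in a function that returns the sheets in memory. This is only a sketch: the helper name `build_sheets` and the small `sample` frame are hypothetical and not part of the answer above, and it uses droplevel instead of reset_index to drop the ID and COMMENT levels.

```python
import numpy as np
import pandas as pd


def build_sheets(df):
    """Return {sheet_name: pivoted frame}. Hypothetical helper for illustration."""
    mask = df['LEVEL'].isna()
    sheets = {}
    # (sheet name infix, rows to pivot, pivot columns) for each part
    parts = [('', df[~mask], ['COLS', 'LEVEL']),
             ('NULL_', df[mask], 'COLS')]
    for prefix, part, cols in parts:
        if part.empty:
            continue  # nothing to pivot for this part
        pivoted = part.pivot_table(index=['ID', 'COMMENT', 'CATEG'],
                                   columns=cols, values='VALUE')
        for (ids, comments), sample_df in pivoted.groupby(level=['ID', 'COMMENT']):
            # keep only CATEG as the index and drop all-NaN columns
            out = sample_df.droplevel(['ID', 'COMMENT']).dropna(how='all', axis=1)
            sheets[f'{ids}_{prefix}{comments}'] = out
    return sheets


# Made-up sample in the same layout as the question's table.
sample = pd.DataFrame({
    'ID': [1, 1, 1, 2, 2],
    'CATEG': ['A', 'A', 'A', 'A', 'B'],
    'LEVEL': [1, 2, np.nan, np.nan, np.nan],
    'COLS': ['Apple', 'Apple', 'Berry', 'Car', 'Car'],
    'VALUE': [282, 555, 289, 316, 664],
    'COMMENT': ['comment1', 'comment1', 'comment1', 'comment1', 'comment2'],
})
sheets = build_sheets(sample)
print(sorted(sheets))
```

Each entry of the returned dict could then be written inside the ExcelWriter block with to_excel(writer, sheet_name=name), exactly as in the loops above.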