
Pivoting a data frame: from long to wide


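I have a data frame of stacked records in which the same subject has a separate measurement roughly every 3 months. For example, Subj BAR02002 has 4 different records: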

    Subj  months   X    Y    Z
BAR02002   0       14  53   52
BAR02002   3       24  61   96
BAR02002   6       5   53   3
BAR02002   9       3   64   33
BAR02003   0       22  63   55
BAR02003   6       44  22   53 
BAR02003   9       42  12   72
BAR02003   12      15  1    12
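The answers below assume this table is already loaded into a DataFrame named df. Here is a minimal sketch for loading it. It assumes the table is pasted as whitespace-separated text, and the variable name raw is only illustrative:

```python
import io

import pandas as pd

# The sample table from the question, as whitespace-separated text
raw = """\
Subj  months   X    Y    Z
BAR02002   0       14  53   52
BAR02002   3       24  61   96
BAR02002   6       5   53   3
BAR02002   9       3   64   33
BAR02003   0       22  63   55
BAR02003   6       44  22   53
BAR02003   9       42  12   72
BAR02003   12      15  1    12
"""

# sep=r"\s+" treats any run of spaces as a single delimiter,
# so the ragged alignment of the pasted table does not matter
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")
print(df)
```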

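I'm trying to get BAR02002 into a single row instead of 4. I believe this is called reshaping data from long to wide (I may be wrong). To illustrate the end result: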

Subj       X    Y    Z    X1   Y1   Z1   X2   Y2   Z2  ...
BAR02002   14   53   52   24   61   96   5    53   3   ...
BAR02003   22   63   55   NA   NA   NA   44   22   53  ...

The following code doesn't give me what I want. Is there a way to transform the data with pandas/Python (or even R)?

df.pivot(index='Subj_FU',columns='Subj',values= ['Months','PM_N',...])

Solution

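Use map to create a new column of consecutive visit numbers, pass that column to the columns parameter, and finally flatten the resulting MultiIndex: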

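# map each month to a consecutive visit index (0, 3, 6, 9, 12 -> 0..4)
df['g'] = df['months'].map({0: 0, 3: 1, 6: 2, 9: 3, 12: 4})
# one row per Subj; columns become a MultiIndex of (variable, visit index)
df1 = df.pivot_table(index='Subj', columns='g', values=['X', 'Y', 'Z'], aggfunc='sum')
# flatten the MultiIndex, e.g. ('X', 1) -> 'X1'
df1.columns = df1.columns.map(lambda x: f'{x[0]}{x[1]}')
print(df1)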
            X0    X1    X2    X3    X4    Y0    Y1    Y2    Y3   Y4    Z0  \
Subj                                                                        
BAR02002  14.0  24.0   5.0   3.0   NaN  53.0  61.0  53.0  64.0  NaN  52.0   
BAR02003  22.0   NaN  44.0  42.0  15.0  63.0   NaN  22.0  12.0  1.0  55.0   

            Z1    Z2    Z3    Z4  
Subj                              
BAR02002  96.0   3.0  33.0   NaN  
BAR02003   NaN  53.0  72.0  12.0  
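The hard-coded dictionary only covers the months 0, 3, 6, 9 and 12. The sketch below builds the same month-to-index mapping from whichever months actually occur in the data. It assumes every distinct month value should get its own column slot, shared across subjects. The name month_index is illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "Subj": ["BAR02002"] * 4 + ["BAR02003"] * 4,
    "months": [0, 3, 6, 9, 0, 6, 9, 12],
    "X": [14, 24, 5, 3, 22, 44, 42, 15],
    "Y": [53, 61, 53, 64, 63, 22, 12, 1],
    "Z": [52, 96, 3, 33, 55, 53, 72, 12],
})

# Number the distinct months in ascending order: 0->0, 3->1, 6->2, 9->3, 12->4
month_index = {m: i for i, m in enumerate(sorted(df["months"].unique()))}
df["g"] = df["months"].map(month_index)

# Same pivot and flattening as in the answer above
df1 = df.pivot_table(index="Subj", columns="g", values=["X", "Y", "Z"], aggfunc="sum")
df1.columns = df1.columns.map(lambda x: f"{x[0]}{x[1]}")
print(df1)
```

If a new follow-up month such as 15 appears later, it gets the next index automatically instead of silently mapping to NaN.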

If you pass the months column itself as columns (the month values become the column suffixes):

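# values= restricts the pivot to X, Y and Z; without it every other
# numeric column (e.g. a leftover helper column g) would be pivoted too
df1 = df.pivot_table(index='Subj', columns='months', values=['X', 'Y', 'Z'], aggfunc='sum')
df1.columns = df1.columns.map(lambda x: f'{x[0]}{x[1]}')
print(df1)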
            X0    X3    X6    X9   X12    Y0    Y3    Y6    Y9  Y12    Z0  \
Subj                                                                        
BAR02002  14.0  24.0   5.0   3.0   NaN  53.0  61.0  53.0  64.0  NaN  52.0   
BAR02003  22.0   NaN  44.0  42.0  15.0  63.0   NaN  22.0  12.0  1.0  55.0   

            Z3    Z6    Z9   Z12  
Subj                              
BAR02002  96.0   3.0  33.0   NaN  
BAR02003   NaN  53.0  72.0  12.0  
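Both outputs above group the columns by variable (all X, then all Y, then all Z). The question sketched them visit by visit instead (X, Y, Z for the first visit, then for the next). Below is a sketch of that ordering. It uses DataFrame.pivot, which works here because each (Subj, months) pair occurs only once. The swaplevel/sort_index reordering is an addition and not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({
    "Subj": ["BAR02002"] * 4 + ["BAR02003"] * 4,
    "months": [0, 3, 6, 9, 0, 6, 9, 12],
    "X": [14, 24, 5, 3, 22, 44, 42, 15],
    "Y": [53, 61, 53, 64, 63, 22, 12, 1],
    "Z": [52, 96, 3, 33, 55, 53, 72, 12],
})

# Subj becomes the row index; columns are a MultiIndex of (variable, months)
wide = df.pivot(index="Subj", columns="months", values=["X", "Y", "Z"])

# Put months first and sort, so columns run X0 Y0 Z0 X3 Y3 Z3 ...
wide = wide.swaplevel(axis=1).sort_index(axis=1)

# After swaplevel the column tuples are (months, variable): (3, 'X') -> 'X3'
wide.columns = [f"{var}{month}" for month, var in wide.columns]
print(wide)
```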

Or use groupby together with DataFrame.unstack:

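# the same month -> visit-index mapping as in the first snippet
g = df['months'].map({0: 0, 3: 1, 6: 2, 9: 3, 12: 4})
df1 = df.groupby(['Subj', g])[['X', 'Y', 'Z']].sum().unstack()
df1.columns = df1.columns.map(lambda x: f'{x[0]}{x[1]}')
print(df1)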
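Since every (Subj, months) pair is unique in this data, the groupby(...).sum() step only moves months into the index. Below is a sketch of the same reshape using set_index and unstack with no aggregation at all. If a pair were ever duplicated, unstack would raise a ValueError instead of silently summing the rows:

```python
import pandas as pd

df = pd.DataFrame({
    "Subj": ["BAR02002"] * 4 + ["BAR02003"] * 4,
    "months": [0, 3, 6, 9, 0, 6, 9, 12],
    "X": [14, 24, 5, 3, 22, 44, 42, 15],
    "Y": [53, 61, 53, 64, 63, 22, 12, 1],
    "Z": [52, 96, 3, 33, 55, 53, 72, 12],
})

# Move Subj and months into a two-level row index, then pivot months into columns
df1 = df.set_index(["Subj", "months"])[["X", "Y", "Z"]].unstack()
# Flatten ('X', 12) -> 'X12'
df1.columns = df1.columns.map(lambda x: f"{x[0]}{x[1]}")
print(df1)
```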

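If you only need one row per subject and can discard the later visits, you can simply drop the duplicates; drop_duplicates keeps the first occurrence by default: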

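import pandas as pd

data = [
    {"Subj": "BAR02002", "months": 0, "X": 14, "Y": 53, "Z": 52},
    {"Subj": "BAR02002", "months": 3, "X": 24, "Y": 61, "Z": 96},
    {"Subj": "BAR02002", "months": 6, "X": 5, "Y": 53, "Z": 3},
    {"Subj": "BAR02002", "months": 9, "X": 3, "Y": 64, "Z": 33},
    {"Subj": "BAR02003", "months": 0, "X": 22, "Y": 63, "Z": 55},
    {"Subj": "BAR02003", "months": 6, "X": 44, "Y": 22, "Z": 53},
    {"Subj": "BAR02003", "months": 9, "X": 42, "Y": 12, "Z": 72},
    {"Subj": "BAR02003", "months": 12, "X": 15, "Y": 1, "Z": 12},
]
df = pd.DataFrame(data)
# keep only the first record of each subject (keep='first' is the default)
df = df.drop_duplicates(subset="Subj")
print(df)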

Result:

       Subj  months   X   Y   Z
0  BAR02002       0  14  53  52
4  BAR02003       0  22  63  55
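drop_duplicates keeps the first row in the frame's current order. That row is the earliest visit only if the rows are already sorted by months. The sketch below deliberately reverses the rows to show the difference. It then sorts by Subj and months before dropping, which assumes that "first" should mean the earliest month. The names shuffled, naive and first_visit are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "Subj": ["BAR02002"] * 4 + ["BAR02003"] * 4,
    "months": [0, 3, 6, 9, 0, 6, 9, 12],
    "X": [14, 24, 5, 3, 22, 44, 42, 15],
    "Y": [53, 61, 53, 64, 63, 22, 12, 1],
    "Z": [52, 96, 3, 33, 55, 53, 72, 12],
})

# Reverse the rows so each subject's latest visit comes first
shuffled = df.iloc[::-1]

# Without sorting, "first" is simply whichever row appears first:
# month 12 for BAR02003 and month 9 for BAR02002
naive = shuffled.drop_duplicates(subset="Subj")

# Sorting by subject, then month, makes the earliest visit the first row
first_visit = shuffled.sort_values(["Subj", "months"]).drop_duplicates(subset="Subj", keep="first")
print(first_visit)
```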
