微信公众号搜"智元新知"关注
微信扫一扫可直接关注哦!

数据透视表将行复制到新列中

我有一个像这样的数据框,我正在尝试使用Pandas的Pivot重塑数据框,使我可以保留原始行的一些值,同时将重复行变成列并重命名.有时我有5个重复的行

我一直在尝试,但是我不明白.

import pandas as pd
df = pd.read_csv("C:dummy")

df = df.pivot(index=["ID"], columns=["Zone","PTC"], values=["Zone","PTC"])

# Rename columns and reset the index.
df.columns = [["PTC{}","Zone{}"],.format(c) for c in df.columns]
df.reset_index(inplace=True)
# Drop duplicates
df.drop(["PTC","Zone"], axis=1, inplace=True)

输入项

ID  Agent   OV  Zone Value  PTC
1   10      26   M1   10    100
2   26.5    8    M2   50    95
2   26.5    8    M1   6     5
3   4.5     6    M3   4     40
3   4.5     6    M4   6     60
4   1.2    0.8   M1   8     100
5   2      0.4   M1   6     10
5   2      0.4   M2   41    86
5   2      0.4   M4   2     4

输出

ID  Agent   OV  Zone1   Value1  PTC1    Zone2   Value2  PTC2    Zone3   Value3  PTC3
1   10      26  M_1     10       100    0          0      0      0         0      0
2   26.5    8   M_2     50        95    M_1        6      5      0         0      0
3   4.5     6   M_3     4         40    M_4        6     60      0         0      0
4   1.2    0.8  M_1     8        100    0          0      0      0         0      0
5   2      0.4  M_1     6         10    M_2        41    86     M_4        2      4

解决方法:

cumcount用于计数组,使用unstack创建带有unstack的MultiIndex,并最后平整列的值:

g = df.groupby(["ID","Agent", "OV"]).cumcount().add(1)
df = df.set_index(["ID","Agent","OV", g]).unstack(fill_value=0).sort_index(axis=1, level=1)
df.columns = ["{}{}".format(a, b) for a, b in df.columns]

df = df.reset_index()
print (df)
   ID  Agent    OV Zone1  Value1  PTC1 Zone2  Value2  PTC2 Zone3  Value3  PTC3
0   1   10.0  26.0    M1      10   100     0       0     0     0       0     0
1   2   26.5   8.0    M2      50    95    M1       6     5     0       0     0
2   3    4.5   6.0    M3       4    40    M4       6    60     0       0     0
3   4    1.2   0.8    M1       8   100     0       0     0     0       0     0
4   5    2.0   0.4    M1       6    10    M2      41    86    M4       2     4

如果要将仅数字列替换为0:

g = df.groupby(["ID","Agent"]).cumcount().add(1)
df = df.set_index(["ID","Agent","OV", g]).unstack().sort_index(axis=1, level=1)

idx = pd.IndexSlice
df.loc[:, idx[['Value','PTC']]] = df.loc[:, idx[['Value','PTC']]].fillna(0).astype(int)
df.columns = ["{}{}".format(a, b) for a, b in df.columns]

df = df.fillna('').reset_index()
print (df)
   ID  Agent    OV Zone1  Value1  PTC1 Zone2  Value2  PTC2 Zone3  Value3  PTC3
0   1   10.0  26.0    M1      10   100             0     0             0     0
1   2   26.5   8.0    M2      50    95    M1       6     5             0     0
2   3    4.5   6.0    M3       4    40    M4       6    60             0     0
3   4    1.2   0.8    M1       8   100             0     0             0     0
4   5    2.0   0.4    M1       6    10    M2      41    86    M4       2     4

版权声明:本文内容由互联网用户自发贡献,该文观点与技术仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容, 请发送邮件至 dio@foxmail.com 举报,一经查实,本站将立刻删除。

相关推荐