Pandas: unstack a MultiIndex into a single row

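I'm fine with simple pandas, but I struggle with reshaping data and MultiIndexes. I have a multi-indexed DataFrame that looks like this (it doesn't have to be a MultiIndex, but that seemed like the right approach):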

name	index	f1	f2	f3	calc1	calc2	calc3
fox	1	red	white	fur	0.21	1.67	-0.34
	2				0.76	2.20	-1.02
	3				0.01	1.12	-0.22
chicken	1	white	yellow	feathers	0.04	1.18	-2.01
	2				0.18	0.73	-1.21
grain	1	yellow	bag	corn	0.89	1.65	-1.03
	2				0.34	2.45	-0.45
	3				0.87	1.11	-0.97

What I want is one row per name:

name	f1	f2	f3	calc1_1	calc2_1	calc3_1	calc1_2	calc2_2	calc3_2	calc1_3	calc2_3	calc3_3
fox	red	white	fur	0.21	1.67	-0.34	0.76	2.20	-1.02	0.01	1.12	-0.22
chicken	white	yellow	feathers	0.04	1.18	-2.01	0.18	0.73	-1.21	NaN	NaN	NaN
grain	yellow	bag	corn	0.89	1.65	-1.03	0.34	2.45	-0.45	0.87	1.11	-0.97

I imagine this is an easy one for a pandas guru. Thanks for the help, everyone!

Drew

Solution

Try set_index + unstack to reshape to wide format:

new_df = df.set_index(['name','index','f1','f2','f3']).unstack('index')
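To make the unstack step concrete before any column cleanup, here is a small sketch that rebuilds the question's sample data (values copied from the tables above) and inspects the intermediate result. The variable names `rows` and `wide` are only illustrative.

```python
import pandas as pd

# Sample data from the question: one row per (name, index) pair
rows = [
    ('fox', 1, 'red', 'white', 'fur', 0.21, 1.67, -0.34),
    ('fox', 2, 'red', 'white', 'fur', 0.76, 2.20, -1.02),
    ('fox', 3, 'red', 'white', 'fur', 0.01, 1.12, -0.22),
    ('chicken', 1, 'white', 'yellow', 'feathers', 0.04, 1.18, -2.01),
    ('chicken', 2, 'white', 'yellow', 'feathers', 0.18, 0.73, -1.21),
    ('grain', 1, 'yellow', 'bag', 'corn', 0.89, 1.65, -1.03),
    ('grain', 2, 'yellow', 'bag', 'corn', 0.34, 2.45, -0.45),
    ('grain', 3, 'yellow', 'bag', 'corn', 0.87, 1.11, -0.97),
]
df = pd.DataFrame(rows, columns=['name', 'index', 'f1', 'f2', 'f3',
                                 'calc1', 'calc2', 'calc3'])

# Put every identifying column in the row index, then move the
# 'index' level out of the rows and into the columns
wide = df.set_index(['name', 'index', 'f1', 'f2', 'f3']).unstack('index')

print(wide.shape)  # (3, 9): 3 keys, 3 calc columns x 3 positions
# chicken has no position 3, so its cells for that position are NaN
print(wide.loc[('chicken', 'white', 'yellow', 'feathers')])
```

The remaining row index is (name, f1, f2, f3) and each column label is a (calcN, index) tuple, which is what the later steps sort and flatten.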

Or via pivot (passing a list to index requires pandas 1.1 or newer):

new_df = df.pivot(index=['name','f1','f2','f3'],columns='index')
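A quick way to convince yourself the two routes are interchangeable is to build both and compare them. This sketch assumes pandas 1.1 or newer and reuses the question's data; `pd.testing.assert_frame_equal` raises if values, labels, or level names differ.

```python
import pandas as pd

rows = [
    ('fox', 1, 'red', 'white', 'fur', 0.21, 1.67, -0.34),
    ('fox', 2, 'red', 'white', 'fur', 0.76, 2.20, -1.02),
    ('fox', 3, 'red', 'white', 'fur', 0.01, 1.12, -0.22),
    ('chicken', 1, 'white', 'yellow', 'feathers', 0.04, 1.18, -2.01),
    ('chicken', 2, 'white', 'yellow', 'feathers', 0.18, 0.73, -1.21),
    ('grain', 1, 'yellow', 'bag', 'corn', 0.89, 1.65, -1.03),
    ('grain', 2, 'yellow', 'bag', 'corn', 0.34, 2.45, -0.45),
    ('grain', 3, 'yellow', 'bag', 'corn', 0.87, 1.11, -0.97),
]
df = pd.DataFrame(rows, columns=['name', 'index', 'f1', 'f2', 'f3',
                                 'calc1', 'calc2', 'calc3'])

# Route 1: explicit set_index + unstack
via_unstack = df.set_index(['name', 'index', 'f1', 'f2', 'f3']).unstack('index')

# Route 2: pivot; with no values= argument, every remaining column
# (calc1, calc2, calc3) is spread across the 'index' values
via_pivot = df.pivot(index=['name', 'f1', 'f2', 'f3'], columns='index')

# Raises AssertionError if the two frames differ in any way
pd.testing.assert_frame_equal(via_unstack, via_pivot)
print('set_index + unstack and pivot agree')
```

pivot is convenient when the identifying columns are already plain columns; set_index + unstack is the more direct fit when the data already carries a MultiIndex, as in the question.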

Sort the column MultiIndex by its second level with sort_index, so the columns are grouped by index value instead of by calc name:

new_df = new_df.sort_index(axis=1,level=1)
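The effect of `level=1` is easiest to see by comparing the column order before and after sorting. In this sketch (question data again, illustrative names `wide` and `ordered`), unstack leaves the columns grouped by calc name; sorting on level 1 regroups them by position, and the default `sort_remaining=True` keeps calc1, calc2, calc3 in order within each position.

```python
import pandas as pd

rows = [
    ('fox', 1, 'red', 'white', 'fur', 0.21, 1.67, -0.34),
    ('fox', 2, 'red', 'white', 'fur', 0.76, 2.20, -1.02),
    ('fox', 3, 'red', 'white', 'fur', 0.01, 1.12, -0.22),
    ('chicken', 1, 'white', 'yellow', 'feathers', 0.04, 1.18, -2.01),
    ('chicken', 2, 'white', 'yellow', 'feathers', 0.18, 0.73, -1.21),
    ('grain', 1, 'yellow', 'bag', 'corn', 0.89, 1.65, -1.03),
    ('grain', 2, 'yellow', 'bag', 'corn', 0.34, 2.45, -0.45),
    ('grain', 3, 'yellow', 'bag', 'corn', 0.87, 1.11, -0.97),
]
df = pd.DataFrame(rows, columns=['name', 'index', 'f1', 'f2', 'f3',
                                 'calc1', 'calc2', 'calc3'])

wide = df.set_index(['name', 'index', 'f1', 'f2', 'f3']).unstack('index')
ordered = wide.sort_index(axis=1, level=1)

# Before: calc1 at positions 1, 2, 3, then calc2 at 1, 2, 3, ...
print(wide.columns.tolist()[:3])
# After: position 1 for calc1, calc2, calc3, then position 2, ...
print(ordered.columns.tolist()[:3])
```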

Then flatten the column MultiIndex with map, and move the row labels back into ordinary columns with reset_index:

new_df.columns = new_df.columns.map(lambda s: '_'.join(map(str,s)))

new_df = new_df.reset_index()
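As an alternative to `map` with `'_'.join(map(str, s))`, a list comprehension with an f-string builds the same flat names, since each column label is a two-element (calc name, position) tuple. This is a stylistic variant of the same flattening step, sketched on the question's data; `wide` and `flat` are illustrative names.

```python
import pandas as pd

rows = [
    ('fox', 1, 'red', 'white', 'fur', 0.21, 1.67, -0.34),
    ('fox', 2, 'red', 'white', 'fur', 0.76, 2.20, -1.02),
    ('fox', 3, 'red', 'white', 'fur', 0.01, 1.12, -0.22),
    ('chicken', 1, 'white', 'yellow', 'feathers', 0.04, 1.18, -2.01),
    ('chicken', 2, 'white', 'yellow', 'feathers', 0.18, 0.73, -1.21),
    ('grain', 1, 'yellow', 'bag', 'corn', 0.89, 1.65, -1.03),
    ('grain', 2, 'yellow', 'bag', 'corn', 0.34, 2.45, -0.45),
    ('grain', 3, 'yellow', 'bag', 'corn', 0.87, 1.11, -0.97),
]
df = pd.DataFrame(rows, columns=['name', 'index', 'f1', 'f2', 'f3',
                                 'calc1', 'calc2', 'calc3'])

wide = (
    df.set_index(['name', 'index', 'f1', 'f2', 'f3'])
      .unstack('index')
      .sort_index(axis=1, level=1)
)

# Unpack each (calc, position) tuple and join it with an underscore;
# the f-string converts the integer position to text
wide.columns = [f'{calc}_{pos}' for calc, pos in wide.columns]

# name, f1, f2, f3 move from the row index back into columns
flat = wide.reset_index()
print(flat)
```

Either form works; the f-string version relies on the labels being exactly two levels deep, while `'_'.join(map(str, s))` handles any number of levels.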

new_df:

      name      f1      f2        f3  calc1_1  calc2_1  calc3_1  calc1_2  calc2_2  calc3_2  calc1_3  calc2_3  calc3_3
0  chicken   white  yellow  feathers     0.04     1.18    -2.01     0.18     0.73    -1.21      NaN      NaN      NaN
1      fox     red   white       fur     0.21     1.67    -0.34     0.76     2.20    -1.02     0.01     1.12    -0.22
2    grain  yellow     bag      corn     0.89     1.65    -1.03     0.34     2.45    -0.45     0.87     1.11    -0.97

Complete code:

import pandas as pd

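# One row per (name, index) pair, matching the question's data
df = pd.DataFrame({
    'name': ['fox','fox','fox','chicken','chicken','grain','grain','grain'],
    'index': [1,2,3,1,2,1,2,3],
    'f1': ['red','red','red','white','white','yellow','yellow','yellow'],
    'f2': ['white','white','white','yellow','yellow','bag','bag','bag'],
    'f3': ['fur','fur','fur','feathers','feathers','corn','corn','corn'],
    'calc1': [0.21,0.76,0.01,0.04,0.18,0.89,0.34,0.87],
    'calc2': [1.67,2.2,1.12,1.18,0.73,1.65,2.45,1.11],
    'calc3': [-0.34,-1.02,-0.22,-2.01,-1.21,-1.03,-0.45,-0.97]
})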

# Move 'index' into the columns, then group the columns by position
new_df = (
    df.set_index(['name','index','f1','f2','f3'])
        .unstack('index')
        .sort_index(axis=1,level=1)
)

# Flatten each ('calc1', 1) label into 'calc1_1'
new_df.columns = new_df.columns.map(lambda s: '_'.join(map(str,s)))

# Turn name, f1, f2, f3 back into regular columns
new_df = new_df.reset_index()
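If this reshape comes up repeatedly, the steps can be wrapped in a small function. The name `spread_positions` and its parameters are hypothetical, not a pandas API. The optional fallback that numbers rows with `groupby(...).cumcount()` is an addition for data that lacks an explicit position column like `index`; it assumes the existing row order within each group is meaningful and that the key columns contain no missing values.

```python
import pandas as pd


def spread_positions(df, key_cols, pos_col=None):
    """Hypothetical helper: one output row per key, columns named <value>_<pos>.

    key_cols: list of columns identifying one output row.
    pos_col:  column holding the position; if None, positions 1, 2, ...
              are assigned in row order within each key (assumption).
    """
    data = df.copy()
    if pos_col is None:
        pos_col = '_pos'  # hypothetical temporary column name
        data[pos_col] = data.groupby(key_cols).cumcount() + 1
    wide = (
        data.set_index(key_cols + [pos_col])
            .unstack(pos_col)              # positions move into the columns
            .sort_index(axis=1, level=1)   # group columns by position
    )
    wide.columns = [f'{value}_{pos}' for value, pos in wide.columns]
    return wide.reset_index()


rows = [
    ('fox', 1, 'red', 'white', 'fur', 0.21, 1.67, -0.34),
    ('fox', 2, 'red', 'white', 'fur', 0.76, 2.20, -1.02),
    ('fox', 3, 'red', 'white', 'fur', 0.01, 1.12, -0.22),
    ('chicken', 1, 'white', 'yellow', 'feathers', 0.04, 1.18, -2.01),
    ('chicken', 2, 'white', 'yellow', 'feathers', 0.18, 0.73, -1.21),
    ('grain', 1, 'yellow', 'bag', 'corn', 0.89, 1.65, -1.03),
    ('grain', 2, 'yellow', 'bag', 'corn', 0.34, 2.45, -0.45),
    ('grain', 3, 'yellow', 'bag', 'corn', 0.87, 1.11, -0.97),
]
df = pd.DataFrame(rows, columns=['name', 'index', 'f1', 'f2', 'f3',
                                 'calc1', 'calc2', 'calc3'])

keys = ['name', 'f1', 'f2', 'f3']
result = spread_positions(df, keys, 'index')
# Same shape of output when positions are derived from row order instead
same = spread_positions(df.drop(columns='index'), keys)
print(result)
```

Using the explicit `index` column is safer when it exists, because it does not depend on the rows arriving in order.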