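How to reshape a Pandas DataFrame so that row values become new columns (matrix-style format)
I am new to Pandas and am looking for advice on how to reshape this DataFrame: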
panelist_id | type | type_count | refer_sm_count | refer_se_count | refer_non_n_count |
---|---|---|---|---|---|
1 | HP | 2 | 2 | 1 | 1 |
1 | PB | 1 | 0 | 1 | 0 |
1 | TN | 3 | 0 | 3 | 0 |
2 | HP | 1 | 1 | 0 | 0 |
2 | PB | 2 | 1 | 1 | 0 |
Ideally, I would like my DataFrame to look like this:
panelist_id | type_HP_count | type_PB_count | type_TN_count | refer_sm_count_HP | refer_se_count_HP | refer_non_n_count_HP | refer_sm_count_PB | refer_se_count_PB | refer_non_n_count_PB | refer_sm_count_TN | refer_se_count_TN | refer_non_n_count_TN |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 2 | 1 | 3 | 2 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
2 | 1 | 2 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 |
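Essentially, I need to turn the distinct row values in the "type" column into new columns showing the count for each type. The next three columns of the original df, whose names start with "refer", need to be broken out per distinct "type", for example refer_sm_count_[type X, e.g. HP]. Any help would be greatly appreciated. Thanks.
Solution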
Try pivot_table() together with rename_axis():
out=(df.pivot_table(index='panelist_id',columns='type',fill_value=0)
.rename_axis(columns=[None,None],index=None))
Finally, flatten the two-level column index by calling map() on the .columns attribute:
out.columns=out.columns.map('_'.join)
If you now print out, you get the wide layout you asked for.
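The steps above can be put together as a minimal, self-contained sketch. The DataFrame below is rebuilt by hand from the table in the question (an assumption about how the data is loaded). Note that pivot_table() aggregates with the mean by default, which leaves the values unchanged here only because each (panelist_id, type) pair occurs once.

```python
import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame({
    "panelist_id": [1, 1, 1, 2, 2],
    "type": ["HP", "PB", "TN", "HP", "PB"],
    "type_count": [2, 1, 3, 1, 2],
    "refer_sm_count": [2, 0, 0, 1, 1],
    "refer_se_count": [1, 1, 3, 0, 1],
    "refer_non_n_count": [1, 0, 0, 0, 0],
})

# One row per panelist, one column per (value column, type) pair.
# fill_value=0 replaces the missing (panelist 2, TN) combination.
out = df.pivot_table(index="panelist_id", columns="type", fill_value=0)

# Drop the axis names so they do not show up when printing.
out = out.rename_axis(columns=[None, None], index=None)

# Join each (value column, type) tuple into a single flat name.
out.columns = out.columns.map("_".join)

print(out)
```

The resulting names follow the pattern `<value column>_<type>`, for example `type_count_HP`, which is slightly different from the `type_HP_count` spelling in the question.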
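Another option is pivot_wider from pyjanitor (the names_from_position and fill_value arguments are the ones used in this answer; they may differ in other pyjanitor releases, so check the signature of your installed version):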
new_df = df.pivot_wider(index='panelist_id',names_from='type',names_from_position='last',fill_value=0)
new_df:
panelist_id type_count_HP type_count_PB type_count_TN refer_sm_count_HP refer_sm_count_PB refer_sm_count_TN refer_se_count_HP refer_se_count_PB refer_se_count_TN refer_non_n_count_HP refer_non_n_count_PB refer_non_n_count_TN
1 2 1 3 2 0 0 1 1 3 1 0 0
2 1 2 0 1 1 0 0 1 0 0 0 0
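Complete working example:
import janitor  # pyjanitor; registers pivot_wider on DataFrame
import pandas as pd

# Sample data from the question's table
df = pd.DataFrame({
    'panelist_id': [1, 1, 1, 2, 2],
    'type': ['HP', 'PB', 'TN', 'HP', 'PB'],
    'type_count': [2, 1, 3, 1, 2],
    'refer_sm_count': [2, 0, 0, 1, 1],
    'refer_se_count': [1, 1, 3, 0, 1],
    'refer_non_n_count': [1, 0, 0, 0, 0],
})
# Same call as above; these keyword arguments may differ in other pyjanitor releases
new_df = df.pivot_wider(index='panelist_id', names_from='type', names_from_position='last', fill_value=0)
print(new_df.to_string(index=False))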
One more option:
df = df.set_index(['panelist_id','type']).unstack(-1,fill_value=0)
df.columns = df.columns.map('_'.join)
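As a hedged illustration of the set_index()/unstack() route, the sketch below rebuilds the question's data by hand (an assumption) and also shows one caveat: unstack() does not aggregate, so it raises a ValueError if a (panelist_id, type) pair appears more than once, whereas pivot_table() would combine the duplicates. The variable names `wide`, `dup`, and `unstack_failed` are illustrative only.

```python
import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame({
    "panelist_id": [1, 1, 1, 2, 2],
    "type": ["HP", "PB", "TN", "HP", "PB"],
    "type_count": [2, 1, 3, 1, 2],
    "refer_sm_count": [2, 0, 0, 1, 1],
    "refer_se_count": [1, 1, 3, 0, 1],
    "refer_non_n_count": [1, 0, 0, 0, 0],
})

# Move 'type' from the row index into the columns; missing pairs become 0.
wide = df.set_index(["panelist_id", "type"]).unstack(-1, fill_value=0)
wide.columns = wide.columns.map("_".join)
print(wide)

# Caveat: with a duplicated (panelist_id, type) pair, unstack cannot reshape.
dup = pd.concat([df, df.iloc[[0]]], ignore_index=True)
try:
    dup.set_index(["panelist_id", "type"]).unstack(-1, fill_value=0)
    unstack_failed = False
except ValueError as exc:
    unstack_failed = True
    print("unstack failed:", exc)
```

If duplicates are possible in your data, a pivot_table() call with an explicit aggfunc is the safer choice.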
Use pivot_table to create a MultiIndex on the columns:
df_p = df.pivot_table(index='panelist_id', columns='type', aggfunc='sum')
            refer_non_n_count           refer_se_count            \
type                       HP   PB   TN             HP   PB   TN
panelist_id
1                         1.0  0.0  0.0            1.0  1.0  3.0
2                         0.0  0.0  NaN            0.0  1.0  NaN

            refer_sm_count           type_count
type                    HP   PB   TN         HP   PB   TN
panelist_id
1                      2.0  0.0  0.0        2.0  1.0  3.0
2                      1.0  1.0  NaN        1.0  2.0  NaN
If you do want to flatten the columns, then:
df_p.columns = ['_'.join(col) for col in df_p.columns.values]
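The NaN entries in the printed result come from the combination that has no row in the input (panelist 2 has no TN record). A small sketch, assuming the question's data rebuilt by hand, shows how fill_value=0 removes them before the columns are flattened. The names `has_nan` and `df_filled` are illustrative only.

```python
import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame({
    "panelist_id": [1, 1, 1, 2, 2],
    "type": ["HP", "PB", "TN", "HP", "PB"],
    "type_count": [2, 1, 3, 1, 2],
    "refer_sm_count": [2, 0, 0, 1, 1],
    "refer_se_count": [1, 1, 3, 0, 1],
    "refer_non_n_count": [1, 0, 0, 0, 0],
})

# Without fill_value, the missing (panelist 2, TN) cell is NaN.
df_p = df.pivot_table(index="panelist_id", columns="type", aggfunc="sum")
has_nan = bool(df_p.isna().any().any())

# With fill_value=0 the missing cell becomes 0 instead.
df_filled = df.pivot_table(
    index="panelist_id", columns="type", aggfunc="sum", fill_value=0
)

# Flatten the two-level columns into '<value column>_<type>' names.
df_filled.columns = ["_".join(col) for col in df_filled.columns]
print(df_filled)
```

Using aggfunc="sum" also means that repeated (panelist_id, type) rows would be totalled rather than averaged, which is usually what is wanted for count columns.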
First, import the libraries:
import numpy as np
import pandas as pd
Then read your data:
data = pd.read_excel('base.xlsx')
Reshape your data with pivot_table:
data_reshaped = pd.pivot_table(data,values=['type_count','refer_sm_count','refer_se_count','refer_non_n_count'],index=['panelist_id'],columns=['type'],aggfunc=np.sum)
However, the result has two-level column labels that are awkward to work with, so flatten them and reset the index:
columns = [data_reshaped.columns[i][0] + '_' + data_reshaped.columns[i][1]
for i in range(len(data_reshaped.columns))] # to create new columns names
data_reshaped.columns = columns # to assign new columns names to dataframe
data_reshaped.reset_index(inplace=True) # to reset index
data_reshaped.fillna(0,inplace=True) # to substitute nan to 0
After that, your data is in the desired wide format.