How can I spread a specific column of a DataFrame into columns without any aggregation?
Here is my toy df:
{'id': {0: 1089577,1: 1089577,2: 1089577,3: 1089577,4: 1089577},'title': {0: 'Hungarian Goulash Stew',1: 'Hungarian Goulash Stew',2: 'Hungarian Goulash Stew',3: 'Hungarian Goulash Stew',4: 'Hungarian Goulash Stew'},'readyInMinutes': {0: 120,1: 120,2: 120,3: 120,4: 120},'nutrients.amount': {0: 323.18,1: 15.14,2: 4.43,3: 38.95,4: 34.64},'nutrients.name': {0: 'Calories',1: 'Fat',2: 'Saturated Fat',3: 'Carbohydrates',4: 'Net Carbohydrates'},'nutrients.percentOfDailyNeeds': {0: 16.16,1: 23.3,2: 27.69,3: 12.98,4: 12.6},'nutrients.title': {0: 'Calories',1: 'Fat',2: 'Saturated Fat',3: 'Carbohydrates',4: 'Net Carbohydrates'},'nutrients.unit': {0: 'kcal',1: 'g',2: 'g',3: 'g',4: 'g'}}
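I want to spread nutrients.title into columns, so that I get Fat, Saturated Fat, ... columns with their corresponding values, without any agg.
Which function does this without any aggregation? Just a "reshape".
How can I "spread" it like this?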
Solution
Try pivot_table:
# Rename Columns
df.columns = df.columns.map(lambda x: f".{x.split('.')[-1]}" if '.' in x else x)
# Create Pivot Table
df = (
    df.pivot_table(
        index=['id', 'title', 'readyInMinutes'],
        columns=['.title'],
        values=['.amount', '.percentOfDailyNeeds', '.unit'],
        aggfunc='first',
    )
    .reset_index()
    .swaplevel(0, 1, axis=1)
)
# Re-Order Columns So that nutrients.title are grouped
df = df.reindex(sorted(df.columns),axis=1)
# Reduce Levels by join
df.columns = df.columns.map(''.join)
print(df.to_string(index=False))
Output:
     id readyInMinutes                  title Calories.amount Calories.percentOfDailyNeeds Calories.unit Carbohydrates.amount Carbohydrates.percentOfDailyNeeds Carbohydrates.unit Fat.amount Fat.percentOfDailyNeeds Fat.unit Net Carbohydrates.amount Net Carbohydrates.percentOfDailyNeeds Net Carbohydrates.unit Saturated Fat.amount Saturated Fat.percentOfDailyNeeds Saturated Fat.unit
1089577            120 Hungarian Goulash Stew          323.18                        16.16          kcal                38.95                             12.98                  g      15.14                    23.3        g                    34.64                                  12.6                      g                 4.43                             27.69                  g
Steps, with abridged output
- Rename the columns:
print(df.columns.values)
# ['id' 'title' 'readyInMinutes' 'nutrients.amount' 'nutrients.name'
# 'nutrients.percentOfDailyNeeds' 'nutrients.title' 'nutrients.unit']
print(df.columns.map(lambda x: f".{x.split('.')[-1]}" if '.' in x else x).values)
# ['id' 'title' 'readyInMinutes' '.amount' '.name' '.percentOfDailyNeeds'
# '.title' '.unit']
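- Pivot several value columns on the single title column to create a multi-level column index:
print(df.pivot_table(
index=['id','title','readyInMinutes'],columns=['.title'],values=['.amount','.percentOfDailyNeeds','.unit'],aggfunc='first'
).to_string())
                                                .amount
.title                                         Calories  Carbohydrates    Fat  Net Carbohydrates  Saturated Fat
id      title                  readyInMinutes
1089577 Hungarian Goulash Stew 120               323.18          38.95  15.14              34.64           4.43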
- Reset the index and swap the column levels so that the nutrient labels (Calories, Carbohydrates, etc.) are on top:
.reset_index().swaplevel(0,1,axis=1)
.title                                                   Calories  Carbohydrates      Fat  Net Carbohydrates  Saturated Fat
             id                   title  readyInMinutes   .amount        .amount  .amount            .amount        .amount
0       1089577  Hungarian Goulash Stew             120    323.18          38.95    15.14              34.64           4.43
- Sort the columns so that each label's columns sit together:
df = df.reindex(sorted(df.columns),axis=1)
.title                                                   Calories                               Carbohydrates
             id  readyInMinutes                   title   .amount  .percentOfDailyNeeds  .unit        .amount  .percentOfDailyNeeds  .unit
0       1089577             120  Hungarian Goulash Stew    323.18                 16.16   kcal          38.95                 12.98      g
- Reduce the levels with join (creating Calories.amount, Calories.unit, etc.):
df.columns = df.columns.map(''.join)
        id  readyInMinutes                   title  Calories.amount  Calories.percentOfDailyNeeds  Calories.unit
0  1089577             120  Hungarian Goulash Stew           323.18                         16.16           kcal
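The same wide layout can also be reached without passing any aggfunc. The following is a minimal sketch of a different technique from the pivot_table answer above: it uses set_index followed by unstack, which only reshapes and raises a ValueError when a key/label combination is duplicated. The reduced two-nutrient frame and the variable names (keys, wide) are assumptions made for illustration.

```python
import pandas as pd

# Toy data modelled on the question; only two nutrients are kept for brevity.
df = pd.DataFrame({
    'id': [1089577, 1089577],
    'title': ['Hungarian Goulash Stew'] * 2,
    'readyInMinutes': [120, 120],
    'nutrients.title': ['Calories', 'Fat'],
    'nutrients.amount': [323.18, 15.14],
    'nutrients.unit': ['kcal', 'g'],
})

keys = ['id', 'title', 'readyInMinutes']

# Move the key columns and the nutrient label into the index, then unstack the
# label level into columns. No aggfunc is involved; unstack raises ValueError
# if a (keys, label) combination occurs more than once.
wide = df.set_index(keys + ['nutrients.title']).unstack('nutrients.title')

# Columns are now (value column, nutrient) pairs. Swap and sort them so each
# nutrient's columns sit together, then flatten to names like 'Calories.amount'.
wide = wide.swaplevel(0, 1, axis=1).sort_index(axis=1)
wide.columns = [f"{nutrient}.{col.split('.')[-1]}" for nutrient, col in wide.columns]
wide = wide.reset_index()

print(wide.to_string(index=False))
```

Unlike aggfunc='first', this does not silently keep one of several duplicate rows, which may be preferable when duplicates would indicate a data problem.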
You can use df.pivot() as follows:
(df.pivot(index=['id','title','readyInMinutes'],columns='nutrients.title',values='nutrients.amount')
.rename_axis(None,axis=1)
).reset_index()
Result:
        id                   title  readyInMinutes  Calories  Carbohydrates    Fat  Net Carbohydrates  Saturated Fat
0  1089577  Hungarian Goulash Stew             120    323.18          38.95  15.14              34.64           4.43
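For reference, here is a self-contained sketch of the df.pivot() approach above. It uses a reduced three-nutrient copy of the toy data, an assumption made to keep the example short, and it also shows what "no aggregation" means in practice: pivot raises a ValueError on duplicate entries instead of combining them. The names keys, dup and duplicate_error are illustrative only.

```python
import pandas as pd

# Reduced copy of the question's toy data (three nutrients only).
df = pd.DataFrame({
    'id': [1089577] * 3,
    'title': ['Hungarian Goulash Stew'] * 3,
    'readyInMinutes': [120] * 3,
    'nutrients.title': ['Calories', 'Fat', 'Saturated Fat'],
    'nutrients.amount': [323.18, 15.14, 4.43],
})

keys = ['id', 'title', 'readyInMinutes']

# Passing a list as `index` to pivot needs pandas 1.1 or newer.
wide = (
    df.pivot(index=keys, columns='nutrients.title', values='nutrients.amount')
      .rename_axis(None, axis=1)   # drop the 'nutrients.title' columns label
      .reset_index()
)
print(wide.to_string(index=False))

# A repeated (id, title, readyInMinutes, nutrients.title) combination cannot be
# reshaped without choosing a value, so pivot refuses instead of aggregating.
dup = pd.concat([df, df.iloc[[0]]], ignore_index=True)
try:
    dup.pivot(index=keys, columns='nutrients.title', values='nutrients.amount')
    duplicate_error = None
except ValueError as exc:
    duplicate_error = str(exc)
print(duplicate_error)
```

If the real data can contain such duplicates, pivot_table with an explicit aggfunc (as in the first answer) is the variant that tolerates them.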