How to merge the values of a single column into a few groups to reduce the number of dummy variables for that column
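Suppose a column has 14 distinct values (as listed by value_counts()) and some of them have something in common. In my case, when I group by 'Loan.Purpose' and compute the mean 'Interest.Rate' for each of its values, several values end up with the same mean. For example, 'car', 'educational' and 'major_purchase' all have a mean of 11.0. I want to merge ('car', 'major_purchase') into a single category under the column name "LP_cem", since they mean the same thing here, and I want to do the same for the other values,
so that I can reduce the number of dummy variables from 14 to 4.
Basically, I want to merge the 14 distinct values into 3 or 4 groups based on their mean(), and then create dummy variables from those 3 or 4 groups,
like this: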
   LP_cem  LP_chos  LP_dm  LP_hmvw  LP_renewable_energy
0       0        0      1        0                    0
1       0        0      1        0                    0
2       0        0      1        0                    0
3       0        0      1        0                    0
4       0        1      0        0                    0
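Once Loan.Purpose holds the merged group labels, pandas' get_dummies with prefix='LP' produces one 0/1 column per group, named LP_<group> and sorted alphabetically, which is the layout shown above. A minimal sketch, assuming a small hypothetical sample whose values have already been relabeled (the labels dm, chos, cem, hmvw are taken from the desired column names, not from real data):

```python
import pandas as pd

# Hypothetical sample: Loan.Purpose already relabeled to the merged groups
df = pd.DataFrame({'Loan.Purpose': ['dm', 'dm', 'dm', 'dm', 'chos', 'cem', 'hmvw', 'renewable_energy']})

# One 0/1 column per merged group, named "LP_<group>"
dummies = pd.get_dummies(df['Loan.Purpose'], prefix='LP', dtype=int)
print(dummies)
```

If the dummies feed a linear regression, passing drop_first=True drops the first column (here LP_cem) so the remaining columns are not perfectly collinear.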
raw_data['Loan.Purpose'].value_counts()
debt_consolidation 1306
credit_card 443
other 200
home_improvement 151
major_purchase 101
small_business 86
car 50
wedding 39
medical 30
moving 28
vacation 21
house 20
educational 15
renewable_energy 4
Name: Loan.Purpose, dtype: int64
I have grouped Interest.Rate by Loan.Purpose and computed the (rounded) mean for each purpose:
raw_data_8 = raw_data_5.groupby('Loan.Purpose')['Interest.Rate'].mean().round()
raw_data_8
Loan.Purpose
CHOS 15.0
DM 12.0
car 11.0
credit_card 13.0
debt_consolidation 14.0
educational 11.0
home_improvement 12.0
house 13.0
major_purchase 11.0
medical 12.0
moving 14.0
other 13.0
renewable_energy 10.0
small_business 13.0
vacation 12.0
wedding 12.0
Name: Interest.Rate, dtype: float64
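Instead of typing the groups by hand, the rounded means themselves could drive the merge: every category whose rounded mean Interest.Rate is the same gets the same label. This is a sketch under assumptions: the helper name mean_rate_groups, the label scheme rate_<mean> and the sample rows are all hypothetical. Note that on the real means this rule would also put 'educational' in the 11.0 group with 'car' and 'major_purchase', and it keeps close means such as 12.0 and 13.0 apart, so it is a coarse criterion.

```python
import pandas as pd

def mean_rate_groups(df, cat_col='Loan.Purpose', num_col='Interest.Rate'):
    """Hypothetical helper: map each category to a label shared by all
    categories that have the same rounded mean of num_col."""
    means = df.groupby(cat_col)[num_col].mean().round()
    # A category whose rounded mean is 11.0 maps to 'rate_11', and so on
    return {cat: f'rate_{int(m)}' for cat, m in means.items()}

# Hypothetical sample rows (one row per category, so mean == value)
df = pd.DataFrame({
    'Loan.Purpose': ['car', 'major_purchase', 'educational',
                     'moving', 'debt_consolidation', 'renewable_energy'],
    'Interest.Rate': [11.2, 10.8, 11.0, 14.1, 13.9, 10.0],
})

mapping = mean_rate_groups(df)
df['LP_group'] = df['Loan.Purpose'].map(mapping)
print(df)
```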
Now I want to put the values that share the same mean into one group. I tried the following code, but it raises an error:
for i in range(len(raw_data_5.index)):
    if raw_data_5['Loan.Purpose'][i] in ['car','major_purchase']:
        raw_data_5.iloc[i,'Loan.Purpose'] = 'cem'
    if raw_data_5['Loan.Purpose'][i] in ['home_improvement','medical','vacation','wedding']:
        raw_data_5.iloc[i,'Loan.Purpose'] = 'hmvw'
    if raw_data_5['Loan.Purpose'][i] in ['credit_card','house','other','small_business']:
        raw_data_5.iloc[i,'Loan.Purpose'] = 'chos'
    if raw_data_5['Loan.Purpose'][i] in ['debt_consolidation','moving']:
        raw_data_5.iloc[i,'Loan.Purpose'] = 'dcm'
Error:
TypeError                                 Traceback (most recent call last)
<ipython-input-51-cf7ef2ae1efd> in <module>
----> 1 for i in range(raw_data_5.index):
      2     if raw_data_5['Loan.Purpose'][i] in ['car','major_purchase']:
      3         raw_data_5.iloc[i,'Loan.Purpose'] = 'cem'
      4     if raw_data_5['Loan.Purpose'][i] in ['home_improvement','wedding']:
      5         raw_data_5.iloc[i,'Loan.Purpose'] = 'hmvw'

TypeError: 'Int64Index' object cannot be interpreted as an integer
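The traceback comes from range(raw_data_5.index): range() needs an integer, not an Index object, which is why len(...) is required. Even with len(), the loop has two more problems. .iloc accepts only integer positions, so .iloc[i, 'Loan.Purpose'] fails on the string column name, and raw_data_5['Loan.Purpose'][i] looks rows up by index label, which breaks once the index has gaps (label 9 is missing from the sample data). A vectorized Series.replace with a dict avoids the loop entirely. A minimal sketch reusing the group names from the question; the small sample frame is hypothetical:

```python
import pandas as pd

# Hypothetical sample whose index has a gap (label 9 missing), like the real data
raw_data_5 = pd.DataFrame(
    {'Loan.Purpose': ['debt_consolidation', 'car', 'wedding', 'other', 'moving', 'renewable_energy']},
    index=[7, 8, 10, 11, 12, 13],
)

# Merged group for each original category (groupings taken from the question)
groups = {
    'car': 'cem', 'major_purchase': 'cem',
    'home_improvement': 'hmvw', 'medical': 'hmvw', 'vacation': 'hmvw', 'wedding': 'hmvw',
    'credit_card': 'chos', 'house': 'chos', 'other': 'chos', 'small_business': 'chos',
    'debt_consolidation': 'dcm', 'moving': 'dcm',
}

# replace() leaves values that are not dict keys (e.g. renewable_energy) unchanged;
# .map(groups) would turn them into NaN instead
raw_data_5['Loan.Purpose'] = raw_data_5['Loan.Purpose'].replace(groups)
print(raw_data_5)
```

If an explicit loop is preferred, iterating over raw_data_5.index (labels, not positions) and assigning with raw_data_5.loc[idx, 'Loan.Purpose'] avoids both problems, but it is much slower than replace() on a large frame.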
Here is a sample of the data:
Interest.Rate Loan.Length Loan.Purpose
0 8.90 36.0 debt_consolidation
1 12.12 36.0 debt_consolidation
2 21.98 60.0 debt_consolidation
3 9.99 36.0 debt_consolidation
4 11.71 36.0 credit_card
5 15.31 36.0 other
6 7.90 36.0 debt_consolidation
7 17.14 60.0 credit_card
8 14.33 36.0 credit_card
10 19.72 36.0 moving
11 14.27 36.0 debt_consolidation
12 21.67 60.0 debt_consolidation
13 8.90 36.0 debt_consolidation
14 7.62 36.0 debt_consolidation
15 15.65 60.0 debt_consolidation
16 12.12 36.0 debt_consolidation
17 10.37 60.0 debt_consolidation
18 9.76 36.0 credit_card
19 9.99 60.0 debt_consolidation
20 21.98 36.0 debt_consolidation
21 19.05 60.0 credit_card
22 17.99 60.0 car
23 11.99 36.0 credit_card
24 16.82 60.0 vacation
25 7.90 36.0 debt_consolidation
26 14.42 36.0 debt_consolidation
27 15.31 36.0 debt_consolidation
28 8.59 36.0 other
29 7.90 36.0 debt_consolidation
30 21.00 60.0 debt_consolidation