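How do I fix an encoded target column that shows only one class?
I am working on a multiclass classification problem. My target column has 4 levels: Low, Medium, High and Very High. When I try to encode it, value_counts() shows only 0, and I am not sure why.
The value counts in the original data frame are: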
High 18767
Very High 15856
Medium 9212
Low 5067
Name: physician_segment,dtype: int64
I tried the following approaches to encode my target column:
Using the replace() method:
target_enc = {'Low':0,'Medium':1,'High':2,'Very High':3}
df1['physician_segment'] = df1['physician_segment'].astype(object)
df1['physician_segment'] = df1['physician_segment'].replace(target_enc)
df1['physician_segment'].value_counts()
0 48902
Name: physician_segment,dtype: int64
Using the factorize() method:
from pandas.api.types import CategoricalDtype
df1['physician_segment'] = df1['physician_segment'].factorize()[0]
df1['physician_segment'].value_counts()
0 48902
Name: physician_segment,dtype: int64
Using LabelEncoder:
from sklearn.preprocessing import LabelEncoder
labelencoder = LabelEncoder()
df1['physician_segment'] = labelencoder.fit_transform(df1['physician_segment'])
df1['physician_segment'].value_counts()
0 48902
Name: physician_segment,dtype: int64
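With all three techniques I get a single class, 0, and the data frame has 48902 rows.
Can someone point out what I am doing wrong? I want the target column to hold the values 0, 1, 2 and 3.
Solution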
target_enc = {'Low':0,'Medium':1,'High':2,'Very High':3}
df1['physician_segment'] = df1['physician_segment'].astype(object)
Then define a function:
def func(val):
    # Values that are not keys of target_enc fall through and return None
    if val in target_enc.keys():
        return target_enc[val]
Finally, use the apply() method:
df1['physician_segment']=df1['physician_segment'].apply(func)
Now if you print df1['physician_segment'].value_counts(), you should get the expected counts for 0, 1, 2 and 3.
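The function-plus-apply() approach above amounts to a dictionary lookup, which pandas also offers directly through Series.map(). Below is a minimal sketch of that variant, not the original poster's code. The sample frame, the stray leading space in one label, and the new column name physician_segment_enc are assumptions made purely for illustration. Since factorize() and LabelEncoder assign one code per distinct value, an all-zero result suggests the column held a single distinct value when they ran, possibly because an earlier attempt had already overwritten it. Writing the codes to a separate column and inspecting unique() first is one way to rule that out.

```python
import pandas as pd

# Hypothetical sample data standing in for df1 (the real frame has 48902 rows).
df1 = pd.DataFrame(
    {"physician_segment": ["High", "Very High", "Medium", "Low", "High", " Low"]}
)

target_enc = {"Low": 0, "Medium": 1, "High": 2, "Very High": 3}

# Look at the raw labels before encoding: stray whitespace or a column that
# was already encoded by an earlier run would show up here.
print(df1["physician_segment"].unique())

# Assumption: surrounding whitespace is the only irregularity in the labels.
labels = df1["physician_segment"].astype(str).str.strip()

# Write the codes to a NEW column so that re-running this code never tries
# to encode values that have already been encoded.
# Labels missing from target_enc would become NaN instead of raising.
df1["physician_segment_enc"] = labels.map(target_enc)

print(df1["physician_segment_enc"].value_counts().sort_index())
```

If the encoded column still contains NaN after this, the labels printed by unique() do not match the keys of target_enc and the mapping needs to be adjusted to the actual spellings.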