from io import StringIO
import pandas as pd
audit_trail = StringIO('''
course_id AcademicYear_to months TotalFee Gender
260 2017 24 100 male
260 2018 12 140 male
274 2016 36 300 mail
274 2017 24 340 female
274 2018 12 200 animal
285 2017 24 300 bird
285 2018 12 200 maela
''')
df11 = pd.read_csv(audit_trail, sep=" " )
I can use a dictionary to correct the spelling mistakes.
corrections={'mail':'male', 'maela':'male', 'maae':'male'}
df11.Gender.replace(corrections)
But I am looking for a way to keep only male and female, and to put every remaining value into an "other" category. Expected output:
0 male
1 male
2 male
3 female
4 other
5 other
6 male
Name: Gender, dtype: object
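Solution:
Add two more dummy entries to your corrections dictionary, so that the values that are already correct map to themselves: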
corrections = {'male' : 'male', # dummy entry for male
'female' : 'female', # dummy entry for female
'mail' : 'male',
'maela' : 'male',
'maae' : 'male'}
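Now use map and fillna. Unlike replace, map returns NaN for any value that is not a key in the dictionary, and fillna then turns those NaN values into 'other':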
df11.Gender = df11.Gender.map(corrections).fillna('other')
df11
course_id AcademicYear_to months TotalFee Gender
0 260 2017 24 100 male
1 260 2018 12 140 male
2 274 2016 36 300 male
3 274 2017 24 340 female
4 274 2018 12 200 other
5 285 2017 24 300 other
6 285 2018 12 200 male
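To see why the dummy entries matter, here is a small self-contained sketch comparing replace with map on the same data. The variable names (data, df, replaced, mapped, result) are made up for this illustration and are not from the original post.

```python
from io import StringIO
import pandas as pd

# Same sample data as in the question
data = StringIO("""course_id AcademicYear_to months TotalFee Gender
260 2017 24 100 male
260 2018 12 140 male
274 2016 36 300 mail
274 2017 24 340 female
274 2018 12 200 animal
285 2017 24 300 bird
285 2018 12 200 maela
""")
df = pd.read_csv(data, sep=" ")

corrections = {'male': 'male',      # dummy entry for male
               'female': 'female',  # dummy entry for female
               'mail': 'male',
               'maela': 'male',
               'maae': 'male'}

# replace() only touches values found in the dictionary;
# 'animal' and 'bird' pass through unchanged.
replaced = df['Gender'].replace(corrections)

# map() looks every value up in the dictionary and yields NaN
# for anything that is not a key.
mapped = df['Gender'].map(corrections)

# fillna() converts those NaN values into the catch-all category.
result = mapped.fillna('other')

print(replaced.tolist())
print(result.tolist())
```

Without the two dummy entries, map would also send the correctly spelled 'male' and 'female' rows to NaN, and they would end up as 'other'.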