In my data, some feature values are '?'. How can I replace them with NA?
Edit
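import numpy as np
import pandas as pd

df = pd.read_csv("cca-census-income.csv", header=None)
df.replace('?', np.nan, inplace=True)
df.iloc[0]  # .ix has been removed from pandas; .iloc selects the first row by position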
23 Other relative of householder
24 1700.09
25 ?
26 ?
27 ?
28 Not in universe under 1 year old
29 ?
30 0
Solution:
Sample:
import pandas as pd
import io
temp=u"""Date Time,a
2010-01-27 16:00:00,?
2010-01-27 16:10:00,2.2
2010-01-27 16:30:00,1.7"""
df = pd.read_csv(io.StringIO(temp), na_values='?')
print(df)
             Date Time    a
0  2010-01-27 16:00:00  NaN
1  2010-01-27 16:10:00  2.2
2  2010-01-27 16:30:00  1.7
Edit:
Thanks to shivsn for the suggestion; add skipinitialspace=True:
temp=u"""Date Time,a
? , ?
? ,?
2010-01-27 16:30:00,1.7"""
df = pd.read_csv(io.StringIO(temp), na_values=['?', '? '], skipinitialspace=True)
print(df)
             Date Time    a
0                  NaN  NaN
1                  NaN  NaN
2  2010-01-27 16:30:00  1.7
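To see why skipinitialspace matters, here is a small sketch. The two-column data below is made up for illustration and is not taken from the census file. Without skipinitialspace=True the cell is read as ' ?' (with a leading space), which does not match the na_values entry '?', so it stays a string.

```python
import io
import pandas as pd

# Hypothetical sample: the '?' is preceded by a space after the comma
raw = "a,b\n1, ?\n2, x\n"

# Default parsing keeps the leading space, so ' ?' is not recognised as missing
plain = pd.read_csv(io.StringIO(raw), na_values=['?'])

# skipinitialspace=True drops whitespace after the delimiter, so '?' matches na_values
stripped = pd.read_csv(io.StringIO(raw), na_values=['?'], skipinitialspace=True)

print(plain['b'].isna().sum())     # number of missing values in column b without the option
print(stripped['b'].isna().sum())  # number of missing values in column b with the option
```

Note that skipinitialspace only handles whitespace after the delimiter. Trailing whitespace such as '? ' still has to be listed in na_values, as in the example above.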
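EDIT1, using the file:
It seems the '?' values in the file are preceded by whitespace, so skipinitialspace=True is needed: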
df = pd.read_csv('census-income.data',
                 header=None,
                 na_values=['?'],
                 skipinitialspace=True)
print(df)
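If the DataFrame has already been loaded and re-reading the file is not convenient, one possible alternative (not part of the answer above, just a sketch) is a regex-based replace that tolerates whitespace around the '?'. This would also explain why the plain df.replace('?', np.nan) in the question has no effect: it only matches cells that are exactly '?'. The sample data here is made up for illustration.

```python
import io
import numpy as np
import pandas as pd

# Hypothetical sample mimicking a file where values follow ", " (comma plus space)
raw = "1, ?, x\n2, y, ?\n"
df = pd.read_csv(io.StringIO(raw), header=None)

# Match cells consisting only of '?' with optional surrounding whitespace
cleaned = df.replace(r'^\s*\?\s*$', np.nan, regex=True)

print(cleaned)
print(cleaned.isna().sum().sum())  # total number of cells turned into NaN
```

The other string values keep their leading space with this approach, so handling the whitespace at read time with skipinitialspace is usually the cleaner option.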