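Python: null values in a scipy.sparse / pandas sparse matrix are converted to large negative integers
I'm trying to work with a scipy sparse COO matrix, but I'm hitting a strange error where null values get converted to large negative integers. Here is what I'm doing: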
import pickle5 as pk5
from scipy import sparse
import pandas as pd

# Load the pickled sparse DataFrame
with open('some_file.pickle', 'rb') as f:
    df = pk5.load(f)
The original sparse df looks correct:
df.iloc[0:5,0:4]
1028799.3_nuc_coding 1156994.3_nuc_coding 1156995.3_nuc_coding
0 1.0 NaN NaN
1 NaN 1.0 NaN
2 NaN NaN NaN
3 NaN NaN NaN
4 NaN NaN NaN
Running dropna works fine, so these really are null values.
df.iloc[0].dropna().index[:3]
Index(['1028799.3_nuc_coding','1280.11650_nuc_coding','1280.11655_nuc_coding'],dtype='object')
But performing any operation on it changes the NaN values to -9223372036854775808. For example, here is df.T:
0 1 \
1028799.3_nuc_coding 1 -9223372036854775808
1156994.3_nuc_coding -9223372036854775808 1
1156995.3_nuc_coding -9223372036854775808 -9223372036854775808
2 3 \
1028799.3_nuc_coding -9223372036854775808 -9223372036854775808
1156994.3_nuc_coding -9223372036854775808 -9223372036854775808
1156995.3_nuc_coding -9223372036854775808 -9223372036854775808
4
1028799.3_nuc_coding -9223372036854775808
1156994.3_nuc_coding -9223372036854775808
1156995.3_nuc_coding -9223372036854775808
I get a similar error with df.iterrows() and when converting to a COO matrix in scipy with the code below.
coo_mat = sparse.coo_matrix(df.values,shape=df.shape)
print(coo_mat)
(0,0) 1
(0,1) -9223372036854775808
(0,2) -9223372036854775808
(0,3) -9223372036854775808
(0,4) -9223372036854775808
(0,5) -9223372036854775808
(0,6) -9223372036854775808
(0,7) -9223372036854775808
(0,8) -9223372036854775808
(0,9) -9223372036854775808
(0,10) -9223372036854775808
(0,11) -9223372036854775808
(0,12) -9223372036854775808
(0,13) -9223372036854775808
(0,14) -9223372036854775808
(0,15) -9223372036854775808
(0,16) -9223372036854775808
(0,17) -9223372036854775808
(0,18) -9223372036854775808
(0,19) -9223372036854775808
(0,20) -9223372036854775808
(0,21) -9223372036854775808
(0,22) -9223372036854775808
(0,23) -9223372036854775808
(0,24) -9223372036854775808
: :
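Solution
Thanks to @hpaulj for the hint! The problem was that my dtype was an int. An integer dtype cannot represent NaN, so the missing values showed up as -9223372036854775808, which is the smallest value an int64 can hold. Recasting to float fixes the problem. Example: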
df.iloc[0:5,0:4].astype(float).T
0 1 2 3 4
1028799.3_nuc_coding 1.0 NaN NaN NaN NaN
1156994.3_nuc_coding NaN 1.0 NaN NaN NaN
1156995.3_nuc_coding NaN NaN NaN NaN NaN
1156996.3_nuc_coding NaN NaN NaN NaN NaN
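To check whether a frame is affected before casting, one option is to look at the dtype of the stored values. The sketch below is not from the original answer: the helper name can_hold_nan and the two demo columns are hypothetical, and it assumes the columns use either plain NumPy dtypes or pandas sparse dtypes.

```python
import numpy as np
import pandas as pd


def can_hold_nan(dtype):
    """Return True if the (possibly sparse) dtype stores floating-point values.

    Hypothetical helper: handles NumPy dtypes and pandas SparseDtype only.
    """
    # For a pandas sparse column, inspect the dtype of the stored values.
    if isinstance(dtype, pd.SparseDtype):
        dtype = dtype.subtype
    return bool(np.issubdtype(dtype, np.floating))


# Small made-up frame: one integer sparse column, one float sparse column.
demo = pd.DataFrame({
    "int_col": pd.arrays.SparseArray([1, 0, 0], fill_value=0),
    "float_col": pd.arrays.SparseArray([1.0, np.nan, np.nan]),
})

# Map each column name to whether its values can represent NaN.
report = {name: can_hold_nan(dt) for name, dt in demo.dtypes.items()}
print(report)
```

Any column reported as False here would need astype(float) before missing values can survive a transpose or a conversion to a dense array.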
Once the dtype is changed to float, other operations (such as iterrows and conversion to coo_matrix) also work as expected.
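Note that passing a dense float array containing NaN to coo_matrix stores every NaN as an explicit entry, so the result is not really sparse. A minimal sketch of one alternative, assuming the missing cells are meant to be treated as implicit zeros (an assumption, not something the original post states), is to build the matrix from the non-null cells only. The column names and data here are made up for illustration.

```python
import numpy as np
import pandas as pd
from scipy import sparse

# Small made-up frame standing in for the real data.
df = pd.DataFrame(
    {"gene_a": [1.0, np.nan, np.nan], "gene_b": [np.nan, 1.0, np.nan]}
)

# Cast to float first so NaN is representable, then get a dense array.
values = df.astype(float).to_numpy()

# Row and column positions of the cells that actually hold a value.
rows, cols = np.nonzero(~np.isnan(values))

# Store only the non-null cells; everything else becomes an implicit zero.
coo_mat = sparse.coo_matrix(
    (values[rows, cols], (rows, cols)), shape=values.shape
)
print(coo_mat.nnz)
```

If the distinction between "missing" and "zero" matters downstream, this approach loses it, and keeping the float NaN entries may be the better choice.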