How do I fix the error "Boolean array expected for the condition, not int64"? Can anyone help me with this?
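I have a dataset in .csv format with 136 columns and 15036 rows. I want to calculate the entropy and information gain for each column of the dataset. Here is my code: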
import math
import numpy as np

def calc_entropy(column):
    # Count occurrences of each non-negative integer value in the column
    counts = np.bincount(column)
    prob = counts / len(column)
    entropy = 0
    for p in prob:
        if p > 0:  # skip empty bins: 0 * log(0) is taken as 0
            entropy += p * math.log(p, 2)
    return -entropy

def information_gain(data, split, target):
    # Entropy of the target column before splitting
    ori_entropy = calc_entropy(data[target])
    values = data[split].unique()
    left_split = data[data[split] == values[0]]
    right_split = data[data[split] == values[1]]
    subtract = 0
    # Weighted entropy of the target within each subset
    for subset in [left_split, right_split]:
        prob = subset.shape[0] / data.shape[0]
        subtract += prob * calc_entropy(subset[target])
    return ori_entropy - subtract
print(calc_entropy(dre[dre.iloc[:,0:136]]))
print(information_gain(dre,dre.iloc[:,0:136],"type"))
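For reference, the two functions implement Shannon entropy, H(Y) = -sum over y of p(y) * log2 p(y), and information gain, IG(Y, X) = H(Y) - sum over v of (|S_v| / |S|) * H(S_v), where S_v is the set of rows whose feature X equals v. The sketch below is an alternative under stated assumptions, not the original code: it swaps `np.bincount` (which only accepts non-negative integers) for `pandas.Series.value_counts(normalize=True)`, so float, negative, or string columns also work, and it groups on every distinct value of the split column instead of only `values[0]` and `values[1]` (the original raises an IndexError when a column has a single distinct value and ignores any third value). The names `entropy`, `info_gain` and the toy DataFrame are hypothetical.

```python
import numpy as np
import pandas as pd


def entropy(column):
    """Shannon entropy (base 2) of a 1-D Series of any dtype."""
    probs = pd.Series(column).value_counts(normalize=True)
    # value_counts never returns zero counts, so log2 is always defined here
    return float(-(probs * np.log2(probs)).sum())


def info_gain(data, split, target):
    """Information gain of `target` when `data` is split on column `split`.

    `split` and `target` are column *names*, not DataFrames.
    """
    total = len(data)
    weighted = 0.0
    # One subset per distinct value of the split column (NaN keys are dropped)
    for _, subset in data.groupby(split):
        weighted += len(subset) / total * entropy(subset[target])
    return entropy(data[target]) - weighted


# Hypothetical toy data: one feature column and a "type" target column
df = pd.DataFrame({
    "feature": [0, 0, 1, 1, 2, 2],
    "type": ["a", "a", "b", "b", "a", "b"],
})
print(entropy(df["type"]))               # 1.0
print(info_gain(df, "feature", "type"))  # about 0.667
```

Here the target has three "a" and three "b" rows (entropy 1.0); splitting on `feature` leaves only the `feature == 2` subset mixed, so the gain is 1 - 2/6 = 2/3.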
However, I get this error:
  File "D:\informatics\FinalProject\Features_Ranking.py", line 59, in <module>
    print(calc_entropy(dre[dre.iloc[:,0:136]]))
  File "C:\Users\ana\anaconda3\lib\site-packages\pandas\core\frame.py", line 2889, in __getitem__
    return self.where(key)
  File "C:\Users\ana\anaconda3\lib\site-packages\pandas\core\generic.py", line 9004, in where
    return self.where(
  File "C:\Users\ana\anaconda3\lib\site-packages\pandas\core\generic.py", line 8766, in _where
    raise ValueError(msg.format(dtype=dt))
ValueError: Boolean array expected for the condition, not int64
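A minimal reproduction of what the traceback shows: `dre.iloc[:, 0:136]` is itself a DataFrame, and when a DataFrame is indexed with another DataFrame, `__getitem__` treats the key as a mask and calls `self.where(key)`, as in the frame.py line above. A mask must be boolean, so an int64 DataFrame raises this ValueError. Since `calc_entropy` expects a single 1-D column, a likely fix is to pass one column at a time, selected by label (`dre[col]`) or position (`dre.iloc[:, i]`). The small DataFrame below is a hypothetical stand-in for `dre`, and `calc_entropy` is the question's function with the loop variable renamed.

```python
import math

import numpy as np
import pandas as pd


def calc_entropy(column):
    # Same logic as in the question: np.bincount needs non-negative ints
    counts = np.bincount(column)
    prob = counts / len(column)
    entropy = 0
    for p in prob:
        if p > 0:
            entropy += p * math.log(p, 2)
    return -entropy


# Hypothetical stand-in for `dre`: two integer feature columns plus "type"
dre = pd.DataFrame({
    "f1": [0, 1, 0, 1],
    "f2": [0, 0, 0, 1],
    "type": [0, 0, 1, 1],
})

# Reproduce the error: a DataFrame used as a key is treated as a mask
try:
    dre[dre.iloc[:, 0:2]]
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# Fix: compute the entropy of each feature column separately
feature_cols = [c for c in dre.columns if c != "type"]
entropies = {col: calc_entropy(dre[col]) for col in feature_cols}
print(entropies)
```

On the full dataset the same pattern would loop over `[c for c in dre.columns if c != "type"]`, assuming `"type"` is the target as the question's code suggests. `information_gain` likewise expects a column name as `split`, because it evaluates `data[split]`, so it would be called as `information_gain(dre, col, "type")` inside that loop.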
Thanks.