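How to apply scipy zscore to the iris dataset
I am trying to apply zscore to the iris dataset. Because the last column of the iris dataset contains strings, I get the following error:
TypeError: unsupported operand type(s) for /: 'str' and 'int'
An MWE is given here: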
import warnings
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import pandas
from pandas.plotting import scatter_matrix
from scipy.stats import zscore
from sklearn import model_selection
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score,classification_report,confusion_matrix)
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
matplotlib.use('TkAgg')
warnings.filterwarnings("ignore")
names = ['sepal-length','sepal-width','petal-length','petal-width','class']
df = pandas.read_csv("iris.data",names=names)
df.plot(kind='line',subplots=True,layout=(2,2),sharex=True,sharey=False)
plt.show()
df = df.sample(frac=1).reset_index(drop=True)
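# Select upper triangle of correlation matrix (numeric feature columns only;
# recent pandas versions raise if the string 'class' column is passed to corr)
corr_matrix = df.drop(columns='class').corr().abs()
upper = corr_matrix.where(
    np.triu(np.ones(corr_matrix.shape), k=1).astype(bool))  # np.bool was removed from NumPy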
# Find index of feature columns with correlation greater than 0.9
to_drop = [column for column in upper.columns if any(upper[column] > 0.9)]
dataset = df.drop(columns=to_drop)
dataset = dataset.apply(zscore)
How can I compute the zscore for a dataset like this, where one column holds strings?
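For reference, here is a minimal sketch of one possible way around the error, not necessarily the only or best one: apply zscore to the numeric columns only and leave the string column untouched. The small DataFrame below is a hypothetical stand-in for iris.data (a few hand-typed rows using the column names from the question), and the names `numeric_cols` and `scaled` are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from scipy.stats import zscore

# Hypothetical stand-in for iris.data: a few rows with the question's column names.
df = pd.DataFrame({
    'sepal-length': [5.1, 4.9, 7.0, 6.4, 6.3],
    'sepal-width': [3.5, 3.0, 3.2, 3.2, 3.3],
    'petal-length': [1.4, 1.4, 4.7, 4.5, 6.0],
    'petal-width': [0.2, 0.2, 1.4, 1.5, 2.5],
    'class': ['Iris-setosa', 'Iris-setosa', 'Iris-versicolor',
              'Iris-versicolor', 'Iris-virginica'],
})

# Pick out the numeric columns; the string 'class' column is excluded.
numeric_cols = df.select_dtypes(include='number').columns

# Standardize each numeric column (zscore uses ddof=0 by default)
# and keep the 'class' column as it was.
scaled = df.copy()
scaled[numeric_cols] = df[numeric_cols].apply(zscore)

print(scaled)
```

In the MWE above, the same idea would amount to replacing `dataset.apply(zscore)` with an assignment restricted to the numeric columns of `dataset`. Whether the label column should be kept alongside the scaled features or split off beforehand depends on what the later modelling steps expect.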