How to fix ValueError: Number of features of the model must match the input. Model n_features is 30 and input n_features is 2
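I am new to data science and machine learning, so I am trying to visualize outliers with the Isolation Forest algorithm, using code adapted from here. I am using the credit card fraud dataset from Kaggle, with X = columns 1-30 and y = the Class column.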
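import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import IsolationForest
from sklearn.model_selection import train_test_split

# Note: data1 (the loaded dataset) and outlier_fraction are not defined in this excerpt
# Define X and y
columns = data1.columns.tolist()
# Filter columns: keep everything except the label
columns = [c for c in columns if c not in ["Class"]]
# Store the label column "Class" as the target
target = "Class"
# Define the random state and the values of X, y, and X_outliers
state = np.random.RandomState(42)
X = data1[columns]
y = data1[target]
X_outliers = state.uniform(low=0, high=1, size=(X.shape[0], X.shape[1]))
# Split only after X and y exist
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.35)
rng = np.random.RandomState(42)
clf = IsolationForest(n_estimators=100, max_samples='auto', contamination=outlier_fraction, random_state=state, verbose=0)
clf.fit(X_train)  # fitted on all 30 feature columns
y_pred_train = clf.predict(X_train)
y_pred_test = clf.predict(X_test)
y_pred_outliers = clf.predict(X_outliers)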
# Plot the line, the samples, and the nearest vectors to the plane
xx, yy = np.meshgrid(np.linspace(-5, 5, 50), np.linspace(-5, 5, 50))
vrb = np.c_[xx.ravel(), yy.ravel()]  # shape (2500, 2)
Z = clf.decision_function(vrb)  # the ValueError is raised here
Z = Z.reshape(xx.shape)
plt.title("IsolationForest")
plt.contourf(xx, yy, Z, cmap=plt.cm.Blues_r)
b1 = plt.scatter(X_train[:, 0], X_train[:, 1], c='white', s=20, edgecolor='k')
b2 = plt.scatter(X_test[:, 0], X_test[:, 1], c='green', s=20, edgecolor='k')
c = plt.scatter(X_outliers[:, 0], X_outliers[:, 1], c='red', s=20, edgecolor='k')
plt.axis('tight')
plt.xlim((-5, 5))
plt.ylim((-5, 5))
plt.legend([b1, b2, c], ["training observations", "new regular observations", "new abnormal observations"], loc="upper left")
plt.show()
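I think this happens because the second dimension of vrb.shape, (2500, 2), differs from that of X.shape, (28481, 30), but I don't know how to make them match.
I tried replacing (xx.shape) with X_train and X_test, but that didn't work. I keep getting this error: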
ValueError: Number of features of the model must match the input. Model n_features is 30 and input n_features is 2.
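The message can be read as follows: the forest was fitted on 30 columns, while the grid passed to decision_function has only 2. Below is a minimal sketch of one possible workaround, assuming it is acceptable to fit a separate model on just the two features being plotted. It uses synthetic data in place of the Kaggle dataset, and the names plot_cols, clf_2d, and clf_30d are hypothetical, chosen only for illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic stand-in for the 30-column data (an assumption, not the real dataset)
data_rng = np.random.RandomState(42)
X_train = data_rng.normal(size=(500, 30))

# Fit a model on only the two columns that will be drawn,
# so that it expects exactly 2 input features
plot_cols = [0, 1]  # hypothetical choice of columns to visualize
clf_2d = IsolationForest(n_estimators=100, random_state=0)
clf_2d.fit(X_train[:, plot_cols])

# The 2-D grid now matches what the model was fitted on
xx, yy = np.meshgrid(np.linspace(-5, 5, 50), np.linspace(-5, 5, 50))
grid = np.c_[xx.ravel(), yy.ravel()]  # shape (2500, 2)
Z = clf_2d.decision_function(grid).reshape(xx.shape)

# For comparison: a model fitted on all 30 columns rejects the same grid
clf_30d = IsolationForest(n_estimators=100, random_state=0).fit(X_train)
try:
    clf_30d.decision_function(grid)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
```

Z then has the same shape as xx and could be passed to plt.contourf(xx, yy, Z). Note that such a contour reflects only the two chosen features, not the full 30-feature model. Projecting the data to two dimensions first (for example with PCA) would be another option.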
This is my full code