How to predict with polynomial regression using nominal data types in Python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
from sklearn.preprocessing import PolynomialFeatures
df = pd.read_csv("diamonds.csv")
df = pd.get_dummies(df,columns = ["color","clarity","cut"])
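X = df.drop(labels = ["price","color_E","clarity_VS2","cut_Good"],axis = 1).values
Y = df[["price"]].values
# The split call does not appear in the post as published; default arguments are assumed here.
X_train,X_test,Y_train,Y_test = train_test_split(X,Y)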
pf = PolynomialFeatures(degree = 2,include_bias = False)
pf.fit(X_train)
X_train_transformed = pf.transform(X_train)
X_test_transformed = pf.transform(X_test)
modelR = LinearRegression()
modelR.fit(X_train_transformed,Y_train)
predictionlist = [0.23,1,61.5,55,3.47,3.58,1.57]
print("Polynomial Regression score: " + str(modelR.score(X_test_transformed,Y_test)) + " prediction: " + str(modelR.predict(pf.fit_transform([predictionlist]))[0][0]))
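This is the output:
Polynomial Regression score: 0.96599715147751 prediction: -16308769.231718607
My polynomial regression score is good, but my prediction is terrible. How can the price of a diamond be -16308769.231718607?
I think my prediction list is messed up.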
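Solution
The problem is in how you apply pf. When printing your prediction you call fit_transform, which refits the transformer on a single instance: the one you want to predict. Call fit_transform only on your training set, then use transform alone on your test set and on your prediction list.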
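As a rough illustration of the advice above, here is a minimal sketch that fits PolynomialFeatures once on the training data and then only calls transform on everything else. The data is synthetic (an assumption standing in for diamonds.csv), and the names rng, new_row, X_train_poly and X_test_poly are made up for this example. Note that the new sample needs the same columns, in the same order, as the training matrix.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in for the diamonds data (assumption): 3 numeric features
# and a target that is quadratic in them, plus a little noise.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 5.0, size=(200, 3))
y = 2.0 + X[:, 0] ** 2 + 3.0 * X[:, 1] * X[:, 2] + rng.normal(0.0, 0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pf = PolynomialFeatures(degree=2, include_bias=False)
X_train_poly = pf.fit_transform(X_train)  # fit the expansion on the training set only
X_test_poly = pf.transform(X_test)        # reuse the already fitted expansion

model = LinearRegression()
model.fit(X_train_poly, y_train)

# The new sample has the same 3 columns, in the same order, as X_train.
new_row = np.array([[1.0, 2.0, 3.0]])
prediction = model.predict(pf.transform(new_row))[0]  # transform, not fit_transform

score = model.score(X_test_poly, y_test)
print("score:", score, "prediction:", prediction)
```

Wrapping both steps with sklearn.pipeline.make_pipeline(PolynomialFeatures(...), LinearRegression()) is another option: the pipeline's fit and predict then handle the fit/transform distinction, so there is no separate transformer call to get wrong at prediction time.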