Found input variables with inconsistent numbers of samples: [50, 200] in regression with Python

How do I resolve "Found input variables with inconsistent numbers of samples: [50, 200]" in regression with Python?

I am working with a regression model in Python. Here is my source code.

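I created a variable and set it to the number of days I want to predict. Then I created a new target column containing the price shifted up by 'x' units, where 'x' is future_days.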

future_days = 50
df['Prediction'] = df['PriceUSD'].shift(-future_days)
df.tail(4)
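
To make the effect of the shift concrete, here is a minimal sketch on toy data. The frame `toy` and the value `shift_by = 2` are made up for illustration and are not the real dataset; only the `PriceUSD` and `Prediction` column names come from the code above.

```python
import pandas as pd

# Hypothetical toy data standing in for the real PriceUSD column.
toy = pd.DataFrame({"PriceUSD": [10.0, 11.0, 12.0, 13.0, 14.0]})
shift_by = 2  # plays the role of future_days

# shift(-n) moves every value n rows up, so row i holds the price from row i + n.
toy["Prediction"] = toy["PriceUSD"].shift(-shift_by)

# The last n rows have no future price to look up, so they become NaN.
nan_rows = int(toy["Prediction"].isna().sum())
print(toy)
print("rows without a target:", nan_rows)
```

The trailing NaN rows are why the last future_days rows are usually excluded when the features and target are built.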

Create the feature dataset and convert it to a NumPy array.

X = np.array(df.drop(['Prediction'],1))[:future_days]
print(X)

Create the target dataset y, take all values except the last 'x' rows, and convert it to a NumPy array.

y = np.array(df['Prediction'])[:-future_days]
print(y)
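
A quick shape check at this point shows how many samples each array holds. The sketch below rebuilds the two slices on a synthetic frame; the 250-row length is an assumption chosen only because it would be consistent with the sizes 50 and 200, and the names `demo`, `X_demo` and `y_demo` are hypothetical.

```python
import numpy as np
import pandas as pd

# Assumption: 250 rows of synthetic prices; the real df is not shown in the post.
future_days = 50
demo = pd.DataFrame({"PriceUSD": np.arange(250, dtype=float)})
demo["Prediction"] = demo["PriceUSD"].shift(-future_days)

# Same slices as in the post: [:future_days] keeps the FIRST 50 rows,
# while [:-future_days] keeps everything EXCEPT the last 50 rows.
X_demo = np.array(demo.drop(columns=["Prediction"]))[:future_days]
y_demo = np.array(demo["Prediction"])[:-future_days]

print(X_demo.shape, y_demo.shape)
```

If the first dimensions printed here differ, any function that expects one target value per feature row will reject the pair.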

Split the dataset.

x_train,x_test,y_train,y_test = train_test_split(X,y,test_size = 0.25)

I get an error when splitting the data into training and test sets.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-57-34da76684947> in <module>()
----> 1 x_train,test_size = 0.99)

2 frames
/usr/local/lib/python3.6/dist-packages/sklearn/utils/validation.py in check_consistent_length(*arrays)
    210     if len(uniques) > 1:
    211         raise ValueError("Found input variables with inconsistent numbers of"
--> 212                          " samples: %r" % [int(l) for l in lengths])
    213 
    214 

ValueError: Found input variables with inconsistent numbers of samples: [50, 200]
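
The two numbers in the message are the lengths of X and y. One possible cause is that X is sliced with [:future_days] (the first 50 rows) while y is sliced with [:-future_days] (all but the last 50 rows). The sketch below assumes the intent was to drop the last future_days rows from both arrays; the 250-row synthetic frame and the names `fixed`, `X_fixed` and `y_fixed` are hypothetical, and `random_state=0` is added only to make the run repeatable.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Assumption: synthetic 250-row frame standing in for the real data.
future_days = 50
fixed = pd.DataFrame({"PriceUSD": np.arange(250, dtype=float)})
fixed["Prediction"] = fixed["PriceUSD"].shift(-future_days)

# Use the same negative slice for both arrays so they stay aligned row by row.
X_fixed = np.array(fixed.drop(columns=["Prediction"]))[:-future_days]
y_fixed = np.array(fixed["Prediction"])[:-future_days]

# Both arrays now have 200 samples, so the split no longer raises ValueError.
x_train, x_test, y_train, y_test = train_test_split(
    X_fixed, y_fixed, test_size=0.25, random_state=0
)
print(x_train.shape, x_test.shape, y_train.shape, y_test.shape)
```

Slicing both arrays with [:-future_days] also removes the rows whose Prediction is NaN, so the target passed to the model contains no missing values.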