Question

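I have an Excel file that stores one sequence per column (read from the top cell to the bottom cell), and each sequence follows a trend similar to the previous column's. I want to predict the sequence in the nth column of this dataset.

A sample of my dataset:

Note that each column holds a set of values (a sequence), and these progress somewhat as we move to the right, so I want to predict, for example, the values in column Z.

Here is my code so far: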

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Read the Excel file in rows
df = pd.read_excel(open('vec_sol2.xlsx', 'rb'),
                header=None, sheet_name='Sheet1')
print(type(df))
length = len(df.columns)
# Get the sequence for each row

x_train, x_test, y_train, y_test = train_test_split(
    np.reshape(range(0, length - 1), (-1, 1)), df, test_size=0.25, random_state=0)

print("y_train shape: ", y_train.shape)

pred_model = LogisticRegression()
pred_model.fit(x_train, y_train)
print(pred_model)

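I'll explain the logic as best I can:

x_train and x_test are just the index/column numbers associated with the sequences.
y_train is an array of sequences.
There are 51 columns in total, so holding out 25% as test data gives 37 training sequences and 13 test sequences.

While debugging, I managed to get the shape of each variable:

x_train: (37, 1)
x_test: (13, 1)
y_train: (37, 51)
y_test: (13, 51)

But now, running the program gives me this error: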

ValueError: bad input shape (37, 51)

What am I doing wrong?

Answer 1

I don't understand why you are using this:

x_train, x_test, y_train, y_test = train_test_split(
    np.reshape(range(0, length - 1), (-1, 1)), df, test_size=0.25, random_state=0)

You already have the data in df. Extract X and y from it, and then split those into training and test sets.

Try this:
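A minimal sketch of that idea, not a definitive fix. It assumes the last column is the target and all earlier columns are the features; the question does not say which column to predict, so adjust the slicing as needed. The DataFrame is synthetic stand-in data in place of vec_sol2.xlsx, and LinearRegression is swapped in for LogisticRegression because LogisticRegression is a classifier and expects discrete class labels rather than continuous values.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the spreadsheet (hypothetical data): 50 rows, 51 columns,
# where each column is a slightly shifted copy of the previous one.
rows = np.arange(50).reshape(-1, 1)
cols = np.arange(51).reshape(1, -1)
df = pd.DataFrame(np.sin(rows / 5.0) + 0.1 * cols)

# Assumption: the last column is the target, all earlier columns are features.
X = df.iloc[:, :-1]   # shape (50, 50)
y = df.iloc[:, -1]    # shape (50,), a 1-D target

x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# LinearRegression replaces LogisticRegression because the values are continuous.
pred_model = LinearRegression()
pred_model.fit(x_train, y_train)
predictions = pred_model.predict(x_test)

print("x_train shape:", x_train.shape)
print("y_train shape:", y_train.shape)
print("predictions shape:", predictions.shape)
```

The key point is that y_train is now one-dimensional (one value per sample), which is the shape a single-output estimator's fit method expects.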

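Otherwise, the shapes you shared show that you are trying to get 51 output columns from a single feature, which is strange if you think about it.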
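To make the shape problem concrete, here is a small sketch using made-up labels (not the asker's data). It mirrors the reported shapes, with a (37, 1) feature array and a (37, 51) target, and shows that LogisticRegression rejects the two-dimensional target with a ValueError while accepting a one-dimensional one. The exact wording of the error message varies between scikit-learn versions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# One feature per sample (an index), as in the question: shape (37, 1).
x = np.arange(37).reshape(-1, 1)

# Made-up binary labels, one per sample: shape (37,).
y_1d = np.arange(37) % 2

# The same labels repeated across 51 columns, mirroring y_train's shape (37, 51).
y_2d = np.tile(y_1d[:, None], (1, 51))

model = LogisticRegression()

try:
    model.fit(x, y_2d)
    fit_2d_failed = False
except ValueError as err:
    # LogisticRegression expects a 1-D target, so a 2-D one is rejected.
    fit_2d_failed = True
    print("2-D target rejected:", err)

# A 1-D target (one label per sample) is accepted.
model.fit(x, y_1d)
fitted_classes = model.classes_
print("classes learned from the 1-D target:", fitted_classes)
```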

Python SKLearn: "bad input shape" error when predicting a sequence
