Hello, world,

Question

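I am currently trying to predict a margin (a percentage) from a DataFrame with 48,098 rows and 70 features (of all types). After applying get_dummies so that only numeric values remain, the DataFrame has the following shape: (48098, 572).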

However, the exploration step shows that the target does not really follow a normal distribution (as shown in the figure).

As a result, the performance on the training and test sets is 0.76 and 0.74 respectively.

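Things tried

A few approaches were tried, for example:

Fitting on the whole X set (before the train/test split), which produces the errors below.

First error:

I tried to implement polynomial regression. The problem occurs when PolynomialFeatures is applied to the training set, which raises the following error: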

  MemoryError                               Traceback (most recent call last)
  <ipython-input-19-bf8dbdba1272> in <module>
        3 poly = PolynomialFeatures(degree=2, include_bias=False)
        4 poly = poly.fit(X_train)
  ----> 5 X_poly = poly.transform(X_train)

  ~\AppData\Local\Continuum\anaconda3\Anaconda\lib\site-packages\sklearn\preprocessing\data.py in transform(self, X)
     1504             XP = sparse.hstack(columns, dtype=X.dtype).tocsc()
     1505         else:
  -> 1506             XP = np.empty((n_samples, self.n_output_features_), dtype=X.dtype)
     1507             for i, comb in enumerate(combinations):
     1508                 XP[:, i] = X[:, comb].prod(1)

  MemoryError:
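
For a sense of scale, here is a rough back-of-the-envelope estimate of the array that `transform` tries to allocate. It is only a sketch under assumptions that are not stated in the post: the training set holds about 36,073 rows (75% of 48,098), every value is stored as an 8-byte float64, and the helper name `n_poly_features` is made up for illustration.

```python
from math import comb


def n_poly_features(n_features: int, degree: int, include_bias: bool = False) -> int:
    """Column count of a full polynomial expansion (hypothetical helper)."""
    # comb(n + d, d) counts every monomial of total degree <= d,
    # including the constant term (the "bias" column).
    total = comb(n_features + degree, degree)
    return total if include_bias else total - 1


n_rows = 36_073        # assumption: 75% of the 48,098 rows end up in X_train
n_cols = 572           # columns after get_dummies, as stated in the post
bytes_per_value = 8    # assumption: dense float64 storage

n_out = n_poly_features(n_cols, degree=2)
size_gb = n_rows * n_out * bytes_per_value / 1e9

print(n_out)               # 164450
print(round(size_gb, 1))   # 47.5
```

Under these assumptions, a degree-2 expansion of 572 columns yields 164,450 columns and would need roughly 47.5 GB as a dense array, which would be consistent with the MemoryError above.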

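Fitting the polynomial features before getting the dummies, so as to have fewer than 500 columns, which produces the following ValueError: could not convert string to float: 'Loans'.

Second error: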

 ---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-25-e379555bc257> in <module>
  5 # the default "include_bias=True" adds a feature that's constantly 1
  6 poly = PolynomialFeatures(degree=2, include_bias=False)
----> 7 poly = poly.fit(X)
  8 X_poly = poly.transform(X)


   1458         self : instance
   1459         """
-> 1460         n_samples, n_features = check_array(X, accept_sparse=True).shape
   1461         combinations = self._combinations(n_features, self.degree,
   1462                                           self.interaction_only,


    565         # make sure we actually converted to numeric:
    566         if dtype_numeric and array.dtype.kind == "O":
--> 567             array = array.astype(np.float64)
    568         if not allow_nd and array.ndim >= 3:
    569             raise ValueError("Found array with dim %d. %s expected <= 2."

 ValueError: could not convert string to float: 'Loans'
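
The traceback suggests that PolynomialFeatures received a string column. One possible way around it, sketched below, is to expand only the numeric columns and one-hot encode the rest with a `ColumnTransformer`. The toy frame and its column names (`AMOUNT`, `RATE`, `PRODUCT`) are invented for illustration and do not come from the real data.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, PolynomialFeatures

# Hypothetical toy frame standing in for the real data; column names are made up.
df_toy = pd.DataFrame({
    "AMOUNT": [100.0, 250.0, 80.0, 400.0],
    "RATE": [0.02, 0.05, 0.01, 0.03],
    "PRODUCT": ["Loans", "Deposits", "Loans", "Cards"],
})

# Separate the numeric columns from the non-numeric ones.
numeric_cols = df_toy.select_dtypes(include="number").columns.tolist()
categorical_cols = df_toy.select_dtypes(exclude="number").columns.tolist()

preprocess = ColumnTransformer([
    # Polynomial terms are built from the numeric columns only.
    ("poly", PolynomialFeatures(degree=2, include_bias=False), numeric_cols),
    # String columns are one-hot encoded instead of being cast to float.
    ("onehot", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

X_out = preprocess.fit_transform(df_toy)
print(X_out.shape)  # (4, 8): 5 polynomial columns + 3 one-hot columns
```

This keeps the dummy columns out of the polynomial expansion, which also limits how many columns the expansion produces.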

Some code

First: polynomial features

 from sklearn.preprocessing import PolynomialFeatures
 y  = df.MARGIN
 X = df.drop('MARGIN', axis=1)
 poly = PolynomialFeatures(degree=2, include_bias=False)
 poly = poly.fit(X)
 X_poly = poly.transform(X)

Second: train/test split

    from sklearn.model_selection import train_test_split
    y = df_ohe.MARGIN
    X = df_ohe.drop('MARGIN', axis=1)
    # Split into train and test sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

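Expected solution

The expected outcome would be to complete a polynomial regression and end up with something like the following:

First: implement polynomial regression on X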

 from sklearn.preprocessing import PolynomialFeatures
 poly = PolynomialFeatures(degree=10, include_bias=False)
 poly.fit(X)
 X_poly = poly.transform(X)

Second: split X into training and test sets

Will X_train and X_test then contain the polynomial features?
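
One possible ordering that gives both sets the polynomial features is to split first, fit PolynomialFeatures on the training set, and apply the same fitted transformer to the test set. The sketch below uses synthetic numeric data (the shapes and the target formula are made up) rather than the real DataFrame.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

# Synthetic numeric data standing in for the real X and y (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X[:, 0] ** 2 + X[:, 1] * X[:, 2] + rng.normal(scale=0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit the expansion on the training set, then reuse it for the test set.
poly = PolynomialFeatures(degree=2, include_bias=False)
X_train_poly = poly.fit_transform(X_train)
X_test_poly = poly.transform(X_test)

lr = LinearRegression().fit(X_train_poly, y_train)
print(X_train_poly.shape, X_test_poly.shape)  # (150, 9) (50, 9)
print(lr.score(X_train_poly, y_train), lr.score(X_test_poly, y_test))
```

PolynomialFeatures only records the number of input columns when it is fitted, so expanding before or after the split yields the same columns. Fitting on the training set is simply the usual order when the step is later combined with transformers that do learn from the data.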

Third: predict

from sklearn.linear_model import LinearRegression
lr = LinearRegression().fit(X_train, y_train)
lr_pred = lr.predict(X_test)
train_R2_lr = lr.score(X_train, y_train)
test_R2_lr = lr.score(X_test, y_test)
print("Training set score: {:.2f}".format(train_R2_lr))
print("Test set score: {:.2f}".format(test_R2_lr))

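Questions:

When should get_dummies be applied? (Before or after the split?)
What should we do when we have that many columns?
Is this a "good" way of doing it? (I am new to the field, so any help is welcome.)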
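
To make the first question concrete, here is a small sketch of what can happen when get_dummies is applied separately after the split: the two sets may end up with different columns whenever a category appears in only one of them. The tiny frames and the `PRODUCT` column are invented for illustration. Reindexing the test columns to match the training columns is one way to line them up.

```python
import pandas as pd

# Hypothetical splits in which each set contains a category the other lacks.
train = pd.DataFrame({"PRODUCT": ["Loans", "Cards"]})
test = pd.DataFrame({"PRODUCT": ["Loans", "Deposits"]})

train_ohe = pd.get_dummies(train, dtype=int)
test_raw = pd.get_dummies(test, dtype=int)

print(list(train_ohe.columns))  # ['PRODUCT_Cards', 'PRODUCT_Loans']
print(list(test_raw.columns))   # ['PRODUCT_Deposits', 'PRODUCT_Loans']

# Align the test columns with the training columns: categories unseen in
# training are dropped, categories missing from the test set are filled with 0.
test_ohe = test_raw.reindex(columns=train_ohe.columns, fill_value=0)
print(list(test_ohe.columns))   # ['PRODUCT_Cards', 'PRODUCT_Loans']
```

Applying get_dummies once before the split avoids the mismatch altogether, at the cost of letting the encoding see the categories present in the test rows.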

If you have any suggestions, feel free to share them, and thanks to those who take the time.

Have a good day/night!

Implementing polynomial regression in Python (error: could not convert)
