scikit LogisticRegression gives unexpected results

Date: 2015-11-22 16:18:59

Tags: python pandas scikit-learn

My data is in an ndarray with dtype float64.

My variables x and y look like this:

>>> print x

[[  2.00000000e+00   1.12400000e+03]
 [  1.00000000e+00   6.09000000e+02]
 [  2.00000000e+00   1.48800000e+03]
 [  1.00000000e+00   7.00000000e+02]
 [  2.00000000e+00   1.24900000e+03]
 [  1.00000000e+00   8.05000000e+02]
 [  2.00000000e+00   1.36000000e+03]
 [  2.00000000e+00   1.12100000e+03]
 [  1.00000000e+00   8.05000000e+02]
 [  1.00000000e+00   6.09000000e+02]
 [  1.00000000e+00   6.09000000e+02]
 [  2.00000000e+00   1.50800000e+03]
 [  4.00000000e+00   3.41400000e+03]
 [  2.00000000e+00   1.15600000e+03]
 [  2.00000000e+00   1.15700000e+03]
 [  1.00000000e+00   8.55000000e+02]
 [  1.00000000e+00   7.30000000e+02]
 [  2.00000000e+00   1.15600000e+03]
 [  2.00000000e+00   1.21500000e+03]
 [  2.00000000e+00   1.38500000e+03]
 [  3.00000000e+00   1.29300000e+03]
 [  2.00000000e+00   1.15600000e+03]
 [  2.00000000e+00   1.48800000e+03]
 [  2.00000000e+00   1.20000000e+03]
 [  3.00000000e+00   1.22500000e+03]
 [  1.00000000e+00   8.15000000e+02]
 [  3.00000000e+00   1.24700000e+03]
 [  2.00000000e+00   1.15600000e+03]
 [  1.00000000e+00   8.27000000e+02]
 [  1.00000000e+00   7.00000000e+02]
 [  2.00000000e+00   1.20000000e+03]
 [  1.00000000e+00   6.09000000e+02]
 [  1.00000000e+00   7.64000000e+02]
 [  1.00000000e+00   6.09000000e+02]
 [  1.00000000e+00   8.30000000e+02]
 [  3.00000000e+00   1.22500000e+03]
 [  1.00000000e+00   6.09000000e+02]
 [  1.00000000e+00   6.09000000e+02]
 [  1.00000000e+00   8.16000000e+02]
 [  2.00000000e+00   1.15600000e+03]
 [  2.00000000e+00   1.03000000e+03]
 [  3.00000000e+00   1.24700000e+03]
 [  2.00000000e+00   1.06200000e+03]
 [  1.00000000e+00   6.57000000e+02]
 [  1.00000000e+00   7.73000000e+02]
 [  1.00000000e+00   6.09000000e+02]
 [  2.00000000e+00   1.31300000e+03]
 [  2.00000000e+00   8.00000000e+02]
 [  1.00000000e+00   7.50000000e+02]
 [  2.00000000e+00   1.21700000e+03]
 [  1.00000000e+00   6.09000000e+02]
 [  4.00000000e+00   2.76300000e+03]
 [  2.00000000e+00   1.15700000e+03]
 [  2.00000000e+00   1.12100000e+03]
 [  2.00000000e+00   1.20000000e+03]
 [  3.00000000e+00   1.48100000e+03]
 [  2.00000000e+00   1.15600000e+03]
 [  2.00000000e+00   8.00000000e+02]
 [  3.00000000e+00   1.61600000e+03]
 [  2.00000000e+00   1.38500000e+03]
 [  2.00000000e+00   1.50000000e+03]
 [  2.00000000e+00   1.38500000e+03]
 [  2.00000000e+00   1.14800000e+03]
 [  1.00000000e+00   8.59000000e+02]
 [  2.00000000e+00   1.38500000e+03]
 [  3.00000000e+00   1.55800000e+03]
 [  2.00000000e+00   1.47000000e+03]
 [  1.00000000e+00   7.77000000e+02]
 [  1.00000000e+00   6.09000000e+02]
 [  2.00000000e+00   1.21000000e+03]
 [  3.00000000e+00   1.30100000e+03]
 [  1.00000000e+00   6.09000000e+02]
 [  3.00000000e+00   1.22500000e+03]
 [  2.00000000e+00   1.15600000e+03]
 [  1.00000000e+00   8.05000000e+02]
 [  1.00000000e+00   7.34000000e+02]
 [  2.00000000e+00   9.65000000e+02]
 [  1.00000000e+00   8.30000000e+02]
 [  3.00000000e+00   1.22500000e+03]
 [  1.00000000e+00   6.09000000e+02]
 [  3.00000000e+00   1.42100000e+03]
 [  1.00000000e+00   7.50000000e+02]
 [  3.00000000e+00   1.78900000e+03]
 [  2.00000000e+00   1.12100000e+03]
 [  1.00000000e+00   6.09000000e+02]
 [  1.00000000e+00   8.05000000e+02]
 [  3.00000000e+00   1.20000000e+03]
 [  4.00000000e+00   2.76400000e+03]
 [  2.00000000e+00   1.01500000e+03]
 [  2.00000000e+00   1.84400000e+03]
 [  1.00000000e+00   6.09000000e+02]
 [  2.00000000e+00   1.09100000e+03]
 [  1.00000000e+00   8.70000000e+02]
 [  1.00000000e+00   8.30000000e+02]
 [  2.00000000e+00   1.12100000e+03]
 [  2.00000000e+00   1.21400000e+03]
 [  2.00000000e+00   9.26000000e+02]
 [  2.00000000e+00   1.09700000e+03]
 [  1.00000000e+00   6.25000000e+02]
 [  1.00000000e+00   6.25000000e+02]
 [  1.00000000e+00   7.50000000e+02]
 [  2.00000000e+00   1.15600000e+03]
 [  2.00000000e+00   1.48800000e+03]
 [  1.00000000e+00   6.09000000e+02]
 [  2.00000000e+00   1.06000000e+03]
 [  5.00000000e+00   3.66200000e+03]
 [  2.00000000e+00   1.03000000e+03]
 [  2.00000000e+00   1.17000000e+03]
 [  1.00000000e+00   7.64000000e+02]
 [  3.00000000e+00   1.34000000e+03]
 [  1.00000000e+00   6.09000000e+02]
 [  1.00000000e+00   6.09000000e+02]
 [  1.00000000e+00   6.09000000e+02]
 [  4.00000000e+00   3.54900000e+03]
 [  3.00000000e+00   1.00000000e+03]
 [  1.00000000e+00   6.09000000e+02]
 [  1.00000000e+00   6.09000000e+02]
 [  2.00000000e+00   8.00000000e+02]
 [  1.00000000e+00   6.09000000e+02]
 [  2.00000000e+00   1.09100000e+03]
 [  1.00000000e+00   6.09000000e+02]
 [  3.00000000e+00   1.38500000e+03]
 [  1.00000000e+00   6.09000000e+02]
 [  1.00000000e+00   6.09000000e+02]
 [  5.00000000e+00   3.66200000e+03]
 [  4.00000000e+00   1.76900000e+03]
 [  2.00000000e+00   1.14400000e+03]
 [  2.00000000e+00   1.09100000e+03]
 [  2.00000000e+00   1.09100000e+03]
 [  2.00000000e+00   1.12100000e+03]
 [  3.00000000e+00   1.47000000e+03]
 [  3.00000000e+00   1.58000000e+03]
 [  1.00000000e+00   9.60000000e+02]
 [  2.00000000e+00   1.01500000e+03]
 [  3.00000000e+00   1.44500000e+03]
 [  2.00000000e+00   1.06400000e+03]
 [  2.00000000e+00   1.09100000e+03]
 [  1.00000000e+00   7.50000000e+02]
 [  2.00000000e+00   1.09100000e+03]
 [  3.00000000e+00   1.80000000e+03]
 [  2.00000000e+00   1.25400000e+03]
 [  2.00000000e+00   1.09100000e+03]
 [  1.00000000e+00   8.79000000e+02]
 [  2.00000000e+00   1.50800000e+03]
 [  1.00000000e+00   8.43000000e+02]
 [  4.00000000e+00   2.10800000e+03]
 [  2.00000000e+00   1.20900000e+03]
 [  2.00000000e+00   1.50000000e+03]
 [  1.00000000e+00   7.50000000e+02]
 [  2.00000000e+00   1.46100000e+03]
 [  2.00000000e+00   8.50000000e+02]
 [  3.00000000e+00   1.50000000e+03]
 [  2.00000000e+00   9.50000000e+02]
 [  3.00000000e+00   1.34000000e+03]
 [  1.00000000e+00   7.30000000e+02]
 [  2.00000000e+00   1.14100000e+03]
 [  3.00000000e+00   1.12400000e+03]
 [  2.00000000e+00   1.12100000e+03]
 [  3.00000000e+00   1.22500000e+03]
 [  2.00000000e+00   1.00000000e+03]
 [  2.00000000e+00   1.31300000e+03]]

>>> print y.flatten()
[ 1775.  1106.  1930.  1267.  1350.  1250.  1500.  1690.  1300.  1110.
  1178.  2200.  4500.  1985.  2045.  1195.  1100.  1985.  2269.  1550.
  2168.  2055.  1930.  1668.  1728.  1300.  1890.  1985.  1833.  1207.
  1741.  1090.  1050.  1188.  1308.  1745.  1200.  1230.  1680.  2070.
  1450.  1980.  1400.  1542.  1593.  1138.  2363.   850.  1050.  2137.
  1211.  2750.  2045.  1677.  1500.  2200.  2070.   775.  2100.  1500.
  1700.  1500.  1900.  1757.  1500.  2810.  1500.  1275.  1166.  1400.
  2569.  1256.  1633.  2070.  1290.  1150.  1435.  1344.  1628.  1166.
  2007.  1675.  2200.  1477.  1256.  1350.  1495.  2750.  1550.  2499.
  1186.  2098.  1372.  1384.  1567.  1650.  1375.  1350.  1075.  1200.
  1756.  1985.  1755.  1212.  1374.  3750.  1450.  1350.  1100.  1700.
  1166.  1212.  1202.  3950.  1250.  1054.  1241.  1100.  1256.  2098.
  1202.  1695.  1256.  1256.  3750.  2300.  1900.  2098.  2098.  1527.
  1450.  1700.  1381.  1600.  2000.  2021.  2098.  1663.  2098.  2000.
  2331.  2098.  1395.  2200.  1400.  2350.  2284.  1625.  1692.  1650.
  1339.  1800.  1428.  1700.  1100.  1518.  1700.  1492.  1590.  1300.
  2398.]

This data looks similar to the example, but when I print out my results the regression does not seem to have run the way I expected (I expected a 1x2 matrix of betas).

My results are as follows:

('Coefficients: \n', array([[  7.03950002e+00,  -2.69281738e-02],
       [  7.03950002e+00,  -2.69281738e-02],
       [ -6.98978455e+00,   4.54840941e-03],
       [  1.44445066e+00,  -1.75824530e-02],
       [  8.11638781e-02,  -9.85091887e-03],
       [  1.44445066e+00,  -1.75824530e-02],
       [  1.65529232e-03,  -4.10775159e-03],
       [  1.44445066e+00,  -1.75824530e-02],
       [  1.44445066e+00,  -1.75824530e-02],
       [  1.44445066e+00,  -1.75824530e-02],
       [ -4.85824760e+00,   2.00707943e-03],
       [  1.16874660e+00,  -1.54523463e-02],
       [  1.44445066e+00,  -1.75824530e-02],
       [  1.44445066e+00,  -1.75824530e-02],
       [  1.44445066e+00,  -1.75824530e-02],
       [ -1.66777791e+01,   1.59406094e-02],
       [  7.50461656e-01,  -1.44174098e-02],
       [  1.27817712e+00,  -1.62305159e-02],
       [ -2.40249139e+00,  -2.19857700e-03],
       [  1.44445066e+00,  -1.75824530e-02],
       [  1.27817712e+00,  -1.62305159e-02],
       [  1.44445066e+00,  -1.75824530e-02],
       [  1.44445066e+00,  -1.75824530e-02],
       [  3.20263392e+00,  -9.50805754e-03],
       [  1.87802711e+00,  -2.23108770e-02],
       [ -2.40249139e+00,  -2.19857700e-03],
       [ -6.01730918e+00,   5.05808885e-03],
       [ -7.18885421e+00,   6.34980889e-03],
       [ -1.13882965e+00,  -9.90208457e-05],
       [ -8.39829744e+00,   8.30050908e-03],
       [  4.06246539e+00,  -1.24502573e-02],
       [ -8.39829763e+00,   8.30050917e-03],
       [ -4.58224954e-01,   4.95062738e-04],
       [ -1.74875944e+01,   1.68592654e-02],
       [  7.17565680e-01,  -1.52851151e-03],
       [  3.01633761e+00,  -9.16054410e-03],
       [ -5.76658998e+01,   5.87982295e-02],
       [ -8.39829954e+00,   8.30051003e-03],
       [ -2.30256592e+01,   2.09775103e-02],
       [ -6.52633498e-01,   5.82137199e-04],
       [  2.35503800e+00,  -6.55862039e-03],
       [  2.11785612e+00,  -5.89089182e-03],
       [  1.50150684e+00,  -2.08582766e-03],
       [  4.82563106e-01,  -6.81511087e-04],
       [  4.82562903e-01,  -6.81510981e-04],
       [  4.91817942e+00,  -8.22393628e-03],
       [ -1.19659445e+00,   1.99936701e-03],
       [  4.29564525e-01,  -4.87141011e-04],
       [  4.82563156e-01,  -6.81511112e-04],
       [ -4.06573886e-01,  -6.12386202e-03],
       [ -8.70466734e-02,   3.60870537e-04],
       [  4.82562847e-01,  -6.81510951e-04],
       [  4.66515975e+00,  -7.35139691e-03],
       [ -5.91285911e+00,   4.80207972e-03],
       [  1.34044916e+00,  -3.36860893e-03],
       [ -2.21927262e+00,   3.25733406e-03],
       [  4.66515975e+00,  -7.35139691e-03],
       [  4.66515975e+00,  -7.35139691e-03],
       [ -9.45979683e-01,   1.62357444e-03],
       [ -6.73126888e+00,   3.86607763e-03],
       [ -4.16308006e-02,   2.63002814e-04],
       [ -6.73126888e+00,   3.86607763e-03],
       [  4.82562487e-01,  -6.81510763e-04],
       [ -9.86380890e+00,   9.58052088e-03],
       [  4.82562171e-01,  -6.81510598e-04],
       [ -6.73126888e+00,   3.86607762e-03],
       [  2.97700421e+00,  -3.05777520e-03],
       [  2.24255263e+00,  -2.17012884e-03],
       [  4.66515975e+00,  -7.35139691e-03],
       [ -4.16308006e-02,   2.63002814e-04],
       [  4.66515975e+00,  -7.35139691e-03],
       [ -2.14831264e+00,   3.16340765e-03],
       [ -6.73126889e+00,   3.86607763e-03],
       [ -1.71124469e+01,   1.64491331e-02],
       [  4.76675011e-01,  -6.46787057e-04],
       [  2.28010471e+00,  -1.89360254e-03],
       [ -8.41385808e+00,   8.21857876e-03],
       [  4.33623144e+00,  -6.44620020e-03],
       [  3.34391606e-01,  -3.75409118e-04],
       [ -2.25369937e+00,   3.30993121e-03],
       [  4.33623145e+00,  -6.44620020e-03],
       [  1.84108480e-01,  -1.06071424e-04],
       [  1.89804414e+00,  -1.22769239e-03],
       [  2.80111814e+00,  -2.67770187e-03],
       [  6.64662880e-01,  -1.38196985e-03],
       [  2.81998334e-01,  -2.50519561e-04],
       [  4.43329199e-01,  -3.89208176e-04],
       [  2.49834048e-01,  -2.55971878e-04],
       [  7.34670884e-01,  -1.28326041e-03],
       [  1.86862116e+00,  -1.22488295e-03],
       [ -1.51028451e-01,   4.28126717e-04],
       [  3.66378641e+00,  -5.17492867e-03],
       [  2.21897534e-01,   5.88682056e-04],
       [ -1.27898988e-01,   4.05770885e-04],
       [ -9.34311436e-02,   3.49189918e-04],
       [  1.23480481e+01,  -1.06012344e-02],
       [ -3.04172929e-01,   7.13615607e-04],
       [  6.33853873e+00,  -4.36122413e-03],
       [ -7.18817629e-01,   1.31581160e-03],
       [ -7.18817629e-01,   1.31581160e-03],
       [ -6.76849419e+01,   7.80179333e-02],
       [  3.61205395e+00,  -4.73585605e-03],
       [  1.24893688e+00,   5.47184400e-04],
       [  2.13731015e+00,  -1.59511439e-03],
       [  7.80322748e+00,   6.41670434e-05],
       [ -1.25727569e+01,   1.80771247e-02],
       [ -4.06421193e+00,   7.16662649e-03]]))

My model call is done as follows:

from sklearn import linear_model

logreg = linear_model.LogisticRegression()  # instantiation implied by the fit call
logreg.fit(x, y.flatten())

The shapes of my arrays are:

print y.shape,x.shape

(161, 1) (161, 2)

I must be making a silly mistake somewhere; any input is much appreciated.

1 Answer:

Answer 0 (score: 1)

I think your problem lies not in your code but in the model you are applying to your data.

If I run logistic regression on just your data, I get the same result. Do you understand what logistic regression does? It predicts a class label from continuous input data. If you look at the posted example, the iris.target data is a list of integers from 0 to 2 inclusive, where each integer corresponds to a different iris species; 0, for example, corresponds to the setosa type.

Your y data does not look like label data, and expecting a 1x2 result does not fit a classification setting anyway. Run the example you linked to and then execute

>>> logreg.coef_.shape
(3L, 2L)
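
For reference, here is a self-contained sketch of that check (assuming the linked example fits on the first two iris features, which is consistent with the (3L, 2L) shape above):

from sklearn import datasets, linear_model

# iris.target holds the class labels: integers 0, 1, 2, one per iris species.
iris = datasets.load_iris()
X = iris.data[:, :2]  # first two features, assumed to match the linked example
y = iris.target

logreg = linear_model.LogisticRegression()
logreg.fit(X, y)

# One row of coefficients per class: (n_classes, n_features) == (3, 2)
print(logreg.coef_.shape)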

You should expect one row of coefficients for each class, in the form (n_classes, n_features), so expecting 1x2 would mean expecting a single class, which makes no sense: one class is just "all the data".

You are getting so many rows because each distinct value in your y data is treated as a new class.
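
A quick way to check this (a sketch, assuming x and y are the arrays printed in the question) is to compare the number of unique target values with the number of coefficient rows:

import numpy as np
from sklearn import linear_model

logreg = linear_model.LogisticRegression()
logreg.fit(x, y.flatten())  # x and y as defined in the question

# Every distinct value in y becomes its own class, so the number of
# coefficient rows equals the number of unique targets, not 1.
print(np.unique(y).size)
print(logreg.coef_.shape)  # (n_unique_y, 2)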

I think perhaps you want linear regression rather than logistic regression.
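
If the goal is to predict the continuous values in y, a minimal linear regression sketch on the same x and y might look like this:

from sklearn import linear_model

# Linear regression predicts a continuous target instead of a class label.
linreg = linear_model.LinearRegression()
linreg.fit(x, y.flatten())  # x and y as defined in the question

# A single coefficient per feature (shape (2,)) plus an intercept,
# i.e. the 1x2 set of betas the questioner expected.
print(linreg.coef_)
print(linreg.intercept_)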