Applying logistic regression to a simple dataset

Time: 2017-11-23 04:05:11

Tags: r machine-learning dataset logistic-regression

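I am trying to apply logistic regression, or any other ML algorithm, to this simple dataset, but I keep failing and running into a lot of errors. I am tr…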

 dim(data)
 [1] 11580    12

 head(data)
        ReturnJan   ReturnFeb   ReturnMar   ReturnApr    ReturnMay  ReturnJune
  1  0.08067797  0.06625000  0.03294118  0.18309859  0.130333952 -0.01764234
  2 -0.01067989  0.10211539  0.14549595 -0.08442804 -0.327300392 -0.35926605
  3  0.04774193  0.03598972  0.03970223 -0.16235294 -0.147426982  0.04858934
  4 -0.07404022 -0.04816956  0.01821862 -0.02467917 -0.006036217 -0.02530364
  5 -0.03104575 -0.21267723  0.09147609  0.18933823 -0.153846154 -0.10611511
  6  0.57980016  0.33225225 -0.40546095 -0.06000000  0.060732113 -0.21536106

The 12th column, which is the one I am trying to predict, looks like this:

      PositiveDec
      0
      0
      0
      1
      1
      1

Here is my attempt:

new.data <- data[,-12] #Remove labels' column

index <- sample(1:nrow(new.data), size = 0.8*nrow(new.data))#Split data

train.data <- new.data[index,]

test.data <- new.data[-index,]

fit.glm <- glm(data[,12]~.,data = data, family = "binomial")

1 answer:

Answer 0 (score: 0)

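You are almost there, but there are a few syntax errors and, as pointed out in the comments, you need to keep your outcome variable. This should work: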
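A minimal sketch of the kind of fix described here, not necessarily the exact code the answer had in mind. It keeps `PositiveDec` in the data frame, splits the rows into train and test sets, and names the outcome in the formula. As far as I can tell, in `data[,12] ~ .` the `.` still expands to every column of `data`, including `PositiveDec` itself, which is why the outcome should be referred to by name. The data below is synthetic, and the column names after `ReturnJune` are assumptions, since the question only shows the first six columns.

```r
# Synthetic stand-in for the question's data: 11 numeric return columns plus a
# 0/1 PositiveDec column. Column names after ReturnJune are hypothetical.
set.seed(42)
n <- 500
month_cols <- c("ReturnJan", "ReturnFeb", "ReturnMar", "ReturnApr",
                "ReturnMay", "ReturnJune", "ReturnJuly", "ReturnAug",
                "ReturnSep", "ReturnOct", "ReturnNov")
data <- as.data.frame(matrix(rnorm(n * 11, sd = 0.1), nrow = n))
names(data) <- month_cols
data$PositiveDec <- rbinom(n, size = 1, prob = plogis(2 * data$ReturnNov))

# Split the rows 80/20, keeping the outcome column in both pieces.
index <- sample(seq_len(nrow(data)), size = floor(0.8 * nrow(data)))
train.data <- data[index, ]
test.data  <- data[-index, ]

# Fit on the training rows only; the outcome is named in the formula,
# so "." expands to the remaining 11 predictors.
fit.glm <- glm(PositiveDec ~ ., data = train.data, family = binomial)

# Predicted probabilities for the held-out rows, thresholded at 0.5.
test.prob <- predict(fit.glm, newdata = test.data, type = "response")
test.pred <- as.integer(test.prob > 0.5)

# Confusion table and accuracy on the test set.
print(table(predicted = test.pred, actual = test.data$PositiveDec))
accuracy <- mean(test.pred == test.data$PositiveDec)
print(accuracy)
```

With the real data, the synthetic block at the top would be dropped and the split and `glm` call applied to the existing `data` frame. The 0.5 cutoff is only a default choice for illustration.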
