R: Confusion matrix for logistic regression - confusion matrix not as expected?

Date: 2017-09-08 15:08:58

Tags: r logistic-regression confusion-matrix

This is what I get when I run the confusion matrix:

      TRUE
  0     47
  1 231194

I want something like this:

        0    1
  0  1000   47
  1    50 3000
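The one-column output can come from the second argument to `table()`: `y_pred` is already 0/1, so `y_pred > 0.5` is a logical vector, and `table()` only creates columns for values that actually occur. If every prediction is TRUE, only a TRUE column appears. A minimal sketch with made-up toy vectors (`actual` and `y_pred` here are hypothetical, not the poster's data) showing the one-column case and a 2 x 2 table obtained by declaring both levels:

```r
# Hypothetical toy vectors: actual classes and 0/1 predictions
actual <- factor(c(0, 1, 1, 1), levels = c(0, 1))
y_pred <- c(1, 1, 1, 1)  # every case predicted as class 1

# Comparing against a logical vector: only TRUE occurs, so there is one column
cm_logical <- table(actual, y_pred > 0.5)
print(cm_logical)

# Declaring both levels keeps the empty column, giving a 2 x 2 table
cm_full <- table(actual = actual,
                 predicted = factor(y_pred, levels = c(0, 1)))
print(cm_full)
```

Fixing the levels only changes how the table is displayed; it does not change the predictions themselves.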

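I'm not sure what I'm doing wrong here, or why I get this kind of output when I run the confusion matrix. Please help. I tried recoding the variables to see if that would help. I also wonder whether the variables I chose are unsuitable for this problem and are causing it.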

library(caTools)  # provides sample.split()

# Keep only the columns of interest
Traffic_Reduced <- Traffic_Reduced[, c("Date.Of.Stop", "Fatal", "Time.Of.Stop", "SubAgency", "Belts",
                                       "Commercial.License", "HAZMAT", "Commercial.Vehicle", "Alcohol",
                                       "Work.Zone", "State", "VehicleType", "Make", "Color",
                                       "Violation.Type", "Race", "Gender", "Driver.City",
                                       "Driver.State", "DL.State")]

# Convert Time.Of.Stop to numeric, then replace any NAs with the column mean
Traffic_Reduced$Time.Of.Stop <- as.numeric(Traffic_Reduced$Time.Of.Stop)

Traffic_Reduced$Time.Of.Stop <- ifelse(is.na(Traffic_Reduced$Time.Of.Stop),
                                       ave(Traffic_Reduced$Time.Of.Stop, FUN = function(x) mean(x, na.rm = TRUE)),
                                       Traffic_Reduced$Time.Of.Stop)

# Classification - Logistic Regression
# Fatal (Y/N) and Time.Of.Stop

classification <- Traffic_Reduced[, c("Date.Of.Stop", "Time.Of.Stop", "Fatal")]

classification$Time.Of.Stop <- as.numeric(classification$Time.Of.Stop)
classification$Date.Of.Stop <- as.numeric(classification$Date.Of.Stop)

# Recode the response as a two-level factor labelled 0/1
classification$Fatal <- factor(classification$Fatal, labels = c(0, 1))

# 70/30 train/test split that preserves the class ratio of Fatal
set.seed(100)
split <- sample.split(classification$Fatal, SplitRatio = 0.7)
training_set <- subset(classification, split == TRUE)
test_set <- subset(classification, split == FALSE)

# Standardise the two predictors (column 3 is Fatal)
training_set[-3] <- scale(training_set[-3])
test_set[-3] <- scale(test_set[-3])

classifier <- glm(formula = Fatal ~ .,
                  family = binomial(link = "logit"),
                  data = training_set)

# Predicted probabilities on the test set; with a logit link these two lines give the same values
predicted <- plogis(predict(classifier, test_set))
predicted <- predict(classifier, test_set, type = "response")

y_pred <- ifelse(predicted > 0.5, 1, 0)

# The same predictions again, computed without the Fatal column
prob_pred <- predict(classifier, type = "response", newdata = test_set[-3])
y_pred <- ifelse(prob_pred > 0.5, 1, 0)

# Cross-tabulate actual Fatal (column 3) against the thresholded predictions
cm <- table(test_set[, 3], y_pred > 0.5)
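In the output at the top, one class has 47 test rows and the other 231194, and every prediction falls in the same class. A minimal sketch, on simulated data rather than the poster's dataset, of how that can happen: when one class is rare and the predictors carry little information about it, the fitted probabilities all land on the same side of 0.5, so the cut-off assigns every row to the majority class. The data frame `sim`, the predictor `x` and the 1% positive rate are assumptions for illustration only.

```r
set.seed(1)

# Hypothetical simulated data: a rare outcome (about 1% positives)
# and a predictor that carries no information about it
n <- 10000
sim <- data.frame(x = rnorm(n))
sim$Fatal <- factor(rbinom(n, 1, 0.01), levels = c(0, 1))

# Class balance: the rare class is a tiny fraction of the rows
print(prop.table(table(sim$Fatal)))

fit <- glm(Fatal ~ x, family = binomial(link = "logit"), data = sim)
prob <- predict(fit, newdata = sim, type = "response")

# All fitted probabilities stay far below 0.5 ...
print(summary(prob))

# ... so a 0.5 cut-off labels every row 0; declaring both factor
# levels still shows both columns of the confusion matrix
pred_class <- factor(ifelse(prob > 0.5, 1, 0), levels = c(0, 1))
cm <- table(actual = sim$Fatal, predicted = pred_class)
print(cm)
```

Declaring the levels only exposes the empty column; getting predictions in both classes would typically need more informative predictors or a decision threshold other than 0.5.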

Sample of the Traffic_Reduced dataset:

  Date.Of.Stop Fatal Time.Of.Stop                                       SubAgency Belts Commercial.License
1   09/24/2013    No     17:11:00                     3rd district, Silver Spring    No                 No
2   08/29/2017    No     10:19:00                          2nd district, Bethesda    No                 No
3   12/01/2014    No     12:52:00 6th district, Gaithersburg / Montgomery Village    No                 No
4   08/29/2017    No     09:22:00                     3rd district, Silver Spring    No                 No
5   08/28/2017    No     23:41:00 6th district, Gaithersburg / Montgomery Village    No                 No
6   08/27/2013    No     00:55:00                          2nd district, Bethesda    No                 No
  HAZMAT Commercial.Vehicle Alcohol Work.Zone State     VehicleType        Make  Color Violation.Type
1     No                 No      No        No    MD 02 - Automobile        FORD  BLACK       Citation
2     No                 No      No        No    VA 02 - Automobile      TOYOTA  GREEN       Citation
3     No                 No      No        No    MD 02 - Automobile       HONDA SILVER       Citation
4     No                 No      No        No    MD 02 - Automobile        DODG  WHITE       Citation
5     No                 No      No        No    MD 02 - Automobile MINI COOPER  WHITE       Citation
6     No                 No      No        No    MD 02 - Automobile     HYUNDAI   GRAY       Citation
   Race Gender     Driver.City Driver.State DL.State
1 BLACK      M     TAKOMA PARK           MD       MD
2 WHITE      F FAIRFAX STATION           VA       VA
3 BLACK      F  UPPER MARLBORO           MD       MD
4 BLACK      M FORT WASHINGTON           MD       MD
5 WHITE      M    GAITHERSBURG           MD       MD
6 WHITE      F   SILVER SPRING           MD       MD

Sample of the classification dataset (I converted the times and dates to numeric values):

  Date.Of.Stop Time.Of.Stop Fatal
1         1582         1032     0
2         1447          620     0
3         1923          773     0
4         1447          563     0
5         1441         1422     0
6         1431           56     0
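If `Date.Of.Stop` and `Time.Of.Stop` were read in as factors (an assumption; the post does not show how the data was loaded), `as.numeric()` returns the internal level codes, which are assigned in alphabetical order of the strings. The sample looks consistent with that: each `Time.Of.Stop` value is exactly one more than the minutes after midnight (17:11 gives 1032), and the `Date.Of.Stop` codes follow mm/dd/yyyy text order rather than calendar order. A minimal sketch on a hypothetical three-row data frame `stops` that parses the text instead:

```r
# Hypothetical factor columns shaped like the sample above
stops <- data.frame(
  Date.Of.Stop = factor(c("09/24/2013", "08/29/2017", "12/01/2014")),
  Time.Of.Stop = factor(c("17:11:00", "10:19:00", "12:52:00"))
)

# as.numeric() on a factor returns level codes (1, 2, 3, ...),
# assigned in alphabetical order of the strings, not in time order
level_codes <- as.numeric(stops$Date.Of.Stop)

# Parse the text instead: dates as days since 1970-01-01 ...
stops$Date.Num <- as.numeric(as.Date(as.character(stops$Date.Of.Stop), format = "%m/%d/%Y"))

# ... and times as minutes after midnight
stops$Time.Min <- as.numeric(as.difftime(as.character(stops$Time.Of.Stop),
                                         format = "%H:%M:%S", units = "mins"))
print(stops)
```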

0 answers:

No answers yet.