How to check whether two columns contain the same binned value and quantify the difference

Asked: 2018-03-16 15:57:43

Tags: r

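I have a dataset of students' predicted and actual grades, and I want to create a new column that tells me whether each prediction was correct. My dataset looks like this: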

    ID  Exam   Predicted  Actual(%)  Actual.Bins
1 S001    1      71-80%  66.66667      61-70%
2 S002    1 50% or less  60.00000      51-60%
3 S003    1      71-80%  60.00000      51-60%
4 S004    1      71-80%  93.33333     91-100%
5 S005    1      81-90%  86.66667      81-90%
6 S006    1      71-80%  66.66667      61-70%

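I would like to add a "prediction accuracy" column that tells me how close each student was to being correct (in terms of the binned grades). It does not have to use negative and positive numbers; any way of computing this would be great.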

    ID Exam   Predicted   Actual Actual.Bins Pre.Accuracy
1 S001    1      71-80% 66.66667      61-70%           -1
2 S002    1 50% or less 60.00000      51-60%           +1
3 S003    1      71-80% 60.00000      51-60%           -2
4 S004    1      71-80% 93.33333     91-100%           +2
5 S005    1      81-90% 86.66667      81-90%            0
6 S006    1      71-80% 66.66667      61-70%           -1

Thanks in advance! Here is my data:

> dput(P1)
structure(list(ID = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 
9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 
22L, 23L, 24L, 25L, 26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L, 34L, 
35L, 36L, 37L, 38L, 39L, 40L, 41L, 42L, 43L, 44L, 45L, 46L, 47L, 
48L, 49L, 50L, 51L, 52L, 53L, 54L, 55L, 56L, 57L, 58L, 59L, 60L, 
61L, 62L, 63L, 64L, 65L, 66L, 67L, 68L, 69L, 70L, 71L, 72L, 73L, 
74L, 75L, 76L, 77L, 78L, 79L, 80L, 81L, 82L, 83L, 84L, 85L, 86L, 
87L, 88L, 89L, 90L, 91L, 92L, 93L, 94L, 95L, 96L, 97L, 98L, 1L, 
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 
16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 
29L, 30L, 31L, 32L, 33L, 34L, 35L, 36L, 37L, 38L, 39L, 40L, 41L, 
42L, 43L, 44L, 45L, 46L, 47L, 48L, 49L, 50L, 51L, 52L, 53L, 54L, 
55L, 56L, 57L, 58L, 59L, 60L, 61L, 62L, 63L, 64L, 65L, 66L, 67L, 
68L, 69L, 70L, 71L, 72L, 73L, 74L, 75L, 76L, 77L, 78L, 79L, 80L, 
81L, 82L, 83L, 84L, 85L, 86L, 87L, 88L, 89L, 90L, 91L, 92L, 93L, 
94L, 95L, 96L, 97L, 98L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 
10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 
23L, 24L, 25L, 26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L, 34L, 35L, 
36L, 37L, 38L, 39L, 40L, 41L, 42L, 43L, 44L, 45L, 46L, 47L, 48L, 
49L, 50L, 51L, 52L, 53L, 54L, 55L, 56L, 57L, 58L, 59L, 60L, 61L, 
62L, 63L, 64L, 65L, 66L, 67L, 68L, 69L, 70L, 71L, 72L, 73L, 74L, 
75L, 76L, 77L, 78L, 79L, 80L, 81L, 82L, 83L, 84L, 85L, 86L, 87L, 
88L, 89L, 90L, 91L, 92L, 93L, 94L, 95L, 96L, 97L, 98L), .Label = c("S001", 
"S002", "S003", "S004", "S005", "S006", "S007", "S008", "S009", 
"S010", "S011", "S012", "S013", "S014", "S015", "S016", "S017", 
"S018", "S019", "S020", "S021", "S022", "S023", "S024", "S025", 
"S026", "S027", "S028", "S029", "S030", "S031", "S032", "S033", 
"S034", "S035", "S036", "S037", "S038", "S039", "S040", "S041", 
"S042", "S043", "S044", "S045", "S046", "S047", "S048", "S049", 
"S050", "S051", "S052", "S053", "S054", "S055", "S056", "S057", 
"S058", "S059", "S060", "S061", "S062", "S063", "S064", "S065", 
"S066", "S067", "S068", "S069", "S070", "S071", "S072", "S073", 
"S074", "S075", "S076", "S077", "S078", "S079", "S080", "S081", 
"S082", "S083", "S084", "S085", "S086", "S087", "S088", "S089", 
"S090", "S091", "S092", "S093", "S094", "S095", "S096", "S097", 
"S098"), class = "factor"), Exam = c("1", "1", "1", "1", "1", 
"1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", 
"1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", 
"1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", 
"1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", 
"1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", 
"1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", 
"1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", 
"1", "1", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", 
"2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", 
"2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", 
"2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", 
"2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", 
"2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", 
"2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", 
"2", "2", "2", "2", "2", "2", "2", "2", "2", "3", "3", "3", "3", 
"3", "3", "3", "3", "3", "3", "3", "3", "3", "3", "3", "3", "3", 
"3", "3", "3", "3", "3", "3", "3", "3", "3", "3", "3", "3", "3", 
"3", "3", "3", "3", "3", "3", "3", "3", "3", "3", "3", "3", "3", 
"3", "3", "3", "3", "3", "3", "3", "3", "3", "3", "3", "3", "3", 
"3", "3", "3", "3", "3", "3", "3", "3", "3", "3", "3", "3", "3", 
"3", "3", "3", "3", "3", "3", "3", "3", "3", "3", "3", "3", "3", 
"3", "3", "3", "3", "3", "3", "3", "3", "3", "3", "3", "3", "3", 
"3", "3", "3"), Predicted = c("71-80%", "50% or less", "71-80%", 
"71-80%", "81-90%", "71-80%", "61-70%", "61-70%", "61-70%", "71-80%", 
"81-90%", "50% or less", "51-60%", "61-70%", "61-70%", "51-60%", 
"50% or less", "61-70%", "71-80%", "50% or less", "61-70%", "61-70%", 
"81-90%", "71-80%", "71-80%", "91-100%", "81-90%", "61-70%", 
"71-80%", "91-100%", "61-70%", "71-80%", "71-80%", "71-80%", 
"71-80%", "51-60%", "51-60%", "81-90%", "71-80%", "61-70%", "50% or less", 
"71-80%", "61-70%", "61-70%", "50% or less", "81-90%", "61-70%", 
"71-80%", "81-90%", "81-90%", "71-80%", "61-70%", "61-70%", "71-80%", 
"71-80%", "71-80%", "51-60%", "61-70%", "81-90%", "71-80%", "91-100%", 
"71-80%", "81-90%", "71-80%", "51-60%", "61-70%", "91-100%", 
"61-70%", "61-70%", "91-100%", "71-80%", "61-70%", "71-80%", 
"71-80%", "61-70%", "61-70%", "71-80%", "71-80%", "61-70%", "61-70%", 
"71-80%", "71-80%", "81-90%", "81-90%", "71-80%", "51-60%", "61-70%", 
"71-80%", "71-80%", "71-80%", "51-60%", "71-80%", "61-70%", "50% or less", 
"50% or less", "71-80%", "51-60%", "71-80%", "71-80%", "61-70%", 
"61-70%", "81-90%", "71-80%", "61-70%", "71-80%", "71-80%", "71-80%", 
"51-60%", "71-80%", "50% or less", "51-60%", "51-60%", "71-80%", 
"61-70%", "51-60%", "61-70%", "51-60%", "51-60%", "51-60%", "61-70%", 
"81-90%", "71-80%", "71-80%", "81-90%", "81-90%", "61-70%", "61-70%", 
"81-90%", "51-60%", "71-80%", "51-60%", "50% or less", "50% or less", 
"61-70%", "51-60%", "71-80%", "71-80%", "51-60%", "61-70%", "51-60%", 
"61-70%", "71-80%", "51-60%", "51-60%", "71-80%", "61-70%", "71-80%", 
"81-90%", "51-60%", "71-80%", "51-60%", "61-70%", "61-70%", "71-80%", 
"71-80%", "61-70%", "71-80%", "61-70%", "81-90%", "61-70%", "81-90%", 
"71-80%", "61-70%", "61-70%", "71-80%", "51-60%", "51-60%", "81-90%", 
"71-80%", "50% or less", "71-80%", "71-80%", "71-80%", "61-70%", 
"71-80%", "71-80%", "61-70%", "51-60%", "71-80%", "71-80%", "81-90%", 
"81-90%", "61-70%", "51-60%", "61-70%", "71-80%", "71-80%", "51-60%", 
"51-60%", "51-60%", "61-70%", "50% or less", "61-70%", "71-80%", 
"50% or less", "71-80%", "61-70%", "61-70%", "61-70%", "81-90%", 
"81-90%", "61-70%", "51-60%", "81-90%", "71-80%", "51-60%", "81-90%", 
"50% or less", "61-70%", "51-60%", "81-90%", "51-60%", "50% or less", 
"61-70%", "61-70%", "61-70%", "50% or less", "50% or less", "91-100%", 
"61-70%", "71-80%", "71-80%", "81-90%", "71-80%", "71-80%", "81-90%", 
"51-60%", "61-70%", "50% or less", "61-70%", "61-70%", "51-60%", 
"61-70%", "71-80%", "61-70%", "50% or less", "50% or less", "50% or less", 
"61-70%", "71-80%", "61-70%", "61-70%", "71-80%", "51-60%", "71-80%", 
"81-90%", "61-70%", "81-90%", "61-70%", "61-70%", "51-60%", "71-80%", 
"61-70%", "50% or less", "71-80%", "61-70%", "81-90%", "51-60%", 
"91-100%", "81-90%", "61-70%", "51-60%", "81-90%", "50% or less", 
"71-80%", "91-100%", "71-80%", "61-70%", "71-80%", "71-80%", 
"61-70%", "51-60%", "71-80%", "71-80%", "51-60%", "51-60%", "71-80%", 
"51-60%", "71-80%", "81-90%", "51-60%", "51-60%", "61-70%", "61-70%", 
"71-80%", "61-70%", "50% or less", "51-60%", "61-70%", "51-60%", 
"51-60%", "51-60%", "51-60%", "51-60%"), Actual = c(66.66666667, 
60, 60, 93.33333333, 86.66666667, 66.66666667, 60, 93.33333333, 
46.66666667, 60, 100, 40, 46.66666667, 46.66666667, 66.66666667, 
53.33333333, 53.33333333, 66.66666667, 53.33333333, 86.66666667, 
60, 40, 93.33333333, 66.66666667, 80, 93.33333333, 93.33333333, 
80, 80, 73.33333333, 86.66666667, 66.66666667, 86.66666667, 66.66666667, 
46.66666667, 46.66666667, 73.33333333, 80, 66.66666667, 46.66666667, 
64, 53.33333333, 60, 86.66666667, 26.66666667, 66.66666667, 100, 
80, 80, 86.66666667, 46.66666667, 80, 53.33333333, 66.66666667, 
66.66666667, 53.33333333, 73.33333333, 66.66666667, 66.66666667, 
80, 93.33333333, 46.66666667, 93.33333333, 66.66666667, 73.33333333, 
60, 73.33333333, 33.33333333, 53.33333333, 66.66666667, 93.33333333, 
73.33333333, 86.66666667, 73.33333333, 80, 93.33333333, 60, 53.33333333, 
80, 53.33333333, 46.66666667, 86.66666667, 86.66666667, 66.66666667, 
53.33333333, 73.33333333, 73.33333333, 73.33333333, 20, 86.66666667, 
80, 73.33333333, 53.33333333, 33.33333333, 53.33333333, 53.33333333, 
0, 0, 76.92307692, 69.23076923, 30.76923077, 92.30769231, 76.92307692, 
61.53846154, 61.53846154, 84.61538462, 23.07692308, 53.84615385, 
76.92307692, 53.84615385, 30.76923077, 23.07692308, 76.92307692, 
38.46153846, 23.07692308, 69.23076923, 61.53846154, 92.30769231, 
46.15384615, 46.15384615, 92.30769231, 53.84615385, 100, 100, 
76.92307692, 46.15384615, 76.92307692, 76.92307692, 84.61538462, 
53.84615385, 61.53846154, 61.53846154, 76.92307692, 30.76923077, 
30.76923077, 53.84615385, 46.15384615, 23.07692308, 84.61538462, 
30.76923077, 69.23076923, 76.92307692, 30.76923077, 69.23076923, 
84.61538462, 69.23076923, 76.92307692, 84.61538462, 46.15384615, 
76.92307692, 61.53846154, 92.30769231, 38.46153846, 92.30769231, 
84.61538462, 23.07692308, 53.84615385, 46.15384615, 61.53846154, 
38.46153846, 100, 69.23076923, 61.53846154, 46.15384615, 53.84615385, 
46.15384615, 30.76923077, 69.23076923, 61.53846154, 38.46153846, 
84.61538462, 100, 69.23076923, 53.84615385, 46.15384615, 84.61538462, 
46.15384615, 38.46153846, 38.46153846, 53.84615385, 61.53846154, 
84.61538462, 92.30769231, 46.15384615, 69.23076923, 53.84615385, 
69.23076923, 53.84615385, 23.07692308, 53.84615385, 84.61538462, 
53.84615385, 76.92307692, 38.46153846, 46.15384615, 38.46153846, 
82.14285714, 64.28571429, 42.85714286, 85.71428571, 85.71428571, 
46.42857143, 57.14285714, 82.14285714, 39.28571429, 46.42857143, 
85.71428571, 57.14285714, 46.42857143, 57.14285714, 82.14285714, 
64.28571429, 50, 75, 60.71428571, 82.14285714, 46.42857143, 67.85714286, 
85.71428571, 53.57142857, 78.57142857, 82.14285714, 78.57142857, 
53.57142857, 67.85714286, 85.71428571, 67.85714286, 50, 82.14285714, 
57.14285714, 71.42857143, 53.57142857, 75, 67.85714286, 46.42857143, 
50, 46.42857143, 53.57142857, 64.28571429, 78.57142857, 46.42857143, 
64.28571429, 82.14285714, 75, 71.42857143, 75, 53.57142857, 71.42857143, 
60.71428571, 64.28571429, 64.28571429, 57.14285714, 71.42857143, 
39.28571429, 92.85714286, 46.42857143, 85.71428571, 64.28571429, 
85.71428571, 82.14285714, 78.57142857, 71.42857143, 82.14285714, 
57.14285714, 53.57142857, 75, 78.57142857, 71.42857143, 53.57142857, 
92.85714286, 75, 60.71428571, 50, 85.71428571, 39.28571429, 46.42857143, 
67.85714286, 39.28571429, 71.42857143, 85.71428571, 82.14285714, 
85.71428571, 71.42857143, 57.14285714, 71.42857143, 64.28571429, 
46.42857143, 53.57142857, 78.57142857, 50, 60.71428571, 50, 50, 
57.14285714), Actual.Bins = structure(c(3L, 2L, 2L, 6L, 5L, 3L, 
2L, 6L, 1L, 2L, 6L, 1L, 1L, 1L, 3L, 2L, 2L, 3L, 2L, 5L, 2L, 1L, 
6L, 3L, 4L, 6L, 6L, 4L, 4L, 4L, 5L, 3L, 5L, 3L, 1L, 1L, 4L, 4L, 
3L, 1L, 3L, 2L, 2L, 5L, 1L, 3L, 6L, 4L, 4L, 5L, 1L, 4L, 2L, 3L, 
3L, 2L, 4L, 3L, 3L, 4L, 6L, 1L, 6L, 3L, 4L, 2L, 4L, 1L, 2L, 3L, 
6L, 4L, 5L, 4L, 4L, 6L, 2L, 2L, 4L, 2L, 1L, 5L, 5L, 3L, 2L, 4L, 
4L, 4L, 1L, 5L, 4L, 4L, 2L, 1L, 2L, 2L, 1L, 1L, 4L, 3L, 1L, 6L, 
4L, 3L, 3L, 5L, 1L, 2L, 4L, 2L, 1L, 1L, 4L, 1L, 1L, 3L, 3L, 6L, 
1L, 1L, 6L, 2L, 6L, 6L, 4L, 1L, 4L, 4L, 5L, 2L, 3L, 3L, 4L, 1L, 
1L, 2L, 1L, 1L, 5L, 1L, 3L, 4L, 1L, 3L, 5L, 3L, 4L, 5L, 1L, 4L, 
3L, 6L, 1L, 6L, 5L, 1L, 2L, 1L, 3L, 1L, 6L, 3L, 3L, 1L, 2L, 1L, 
1L, 3L, 3L, 1L, 5L, 6L, 3L, 2L, 1L, 5L, 1L, 1L, 1L, 2L, 3L, 5L, 
6L, 1L, 3L, 2L, 3L, 2L, 1L, 2L, 5L, 2L, 4L, 1L, 1L, 1L, 5L, 3L, 
1L, 5L, 5L, 1L, 2L, 5L, 1L, 1L, 5L, 2L, 1L, 2L, 5L, 3L, 1L, 4L, 
3L, 5L, 1L, 3L, 5L, 2L, 4L, 5L, 4L, 2L, 3L, 5L, 3L, 1L, 5L, 2L, 
4L, 2L, 4L, 3L, 1L, 1L, 1L, 2L, 3L, 4L, 1L, 3L, 5L, 4L, 4L, 4L, 
2L, 4L, 3L, 3L, 3L, 2L, 4L, 1L, 6L, 1L, 5L, 3L, 5L, 5L, 4L, 4L, 
5L, 2L, 2L, 4L, 4L, 4L, 2L, 6L, 4L, 3L, 1L, 5L, 1L, 1L, 3L, 1L, 
4L, 5L, 5L, 5L, 4L, 2L, 4L, 3L, 1L, 2L, 4L, 1L, 3L, 1L, 1L, 2L
), .Label = c("50% or less", "51-60%", "61-70%", "71-80%", "81-90%", 
"91-100%"), class = "factor")), class = "data.frame", row.names = c(NA, 
-294L), .Names = c("ID", "Exam", "Predicted", "Actual", "Actual.Bins"
))

1 answer:

Answer 0 (score: 1)

To get the exact format you asked for, you can subtract the integer codes of the two binned columns (with the question's data stored in `df`):

> df$Pre.Accuracy <-  as.integer(df$Actual.Bins) - as.integer(as.factor(df$Predicted))
> head(df)
    ID Exam   Predicted   Actual Actual.Bins Pre.Accuracy
1 S001    1      71-80% 66.66667      61-70%           -1
2 S002    1 50% or less 60.00000      51-60%            1
3 S003    1      71-80% 60.00000      51-60%           -2
4 S004    1      71-80% 93.33333     91-100%            2
5 S005    1      81-90% 86.66667      81-90%            0
6 S006    1      71-80% 66.66667      61-70%           -1
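
One caveat with the line above: `as.factor(df$Predicted)` derives its levels from whichever labels happen to be present, sorted as strings. That lines up with `Actual.Bins` on the full data, where all six bins appear in `Predicted`, but it can number the bins differently on a subset in which some bin is never predicted. A minimal sketch of a more defensive variant, assuming the six bin labels from the question and rebuilding only its first six rows by hand (the names `bin_levels`, `pred_codes`, `actual_codes`, and the `Correct` column are illustrative, not part of the original answer):

```r
# Bin labels in ascending order, copied from levels(P1$Actual.Bins) in the question.
bin_levels <- c("50% or less", "51-60%", "61-70%", "71-80%", "81-90%", "91-100%")

# First six rows of the question's data, rebuilt by hand for illustration.
df <- data.frame(
  ID = c("S001", "S002", "S003", "S004", "S005", "S006"),
  Predicted = c("71-80%", "50% or less", "71-80%", "71-80%", "81-90%", "71-80%"),
  Actual.Bins = factor(
    c("61-70%", "51-60%", "51-60%", "91-100%", "81-90%", "61-70%"),
    levels = bin_levels
  ),
  stringsAsFactors = FALSE
)

# Give Predicted the same explicit level order so both columns share integer codes,
# even though this subset contains only three of the six bins in Predicted.
pred_codes   <- as.integer(factor(df$Predicted, levels = bin_levels))
actual_codes <- as.integer(df$Actual.Bins)

# Signed difference in bins: negative means the student over-predicted.
df$Pre.Accuracy <- actual_codes - pred_codes

# Logical flag for an exactly correct prediction.
df$Correct <- df$Pre.Accuracy == 0L

print(df)
```

If the sign does not matter, `abs(df$Pre.Accuracy)` gives the distance in bins, and `table(df$Pre.Accuracy)` summarises how often students were off by each amount.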