Why do I get two different outputs when using two different functions?

Time: 2019-12-14 10:58:46

Tags: r

I have the following data:

df1 <- structure(list(Score = c(26, 46, 62, 57, 18, 16, 44, 37, 47, 32, 71, 72,
39, 85, 39, 77, 82, 34, 73, 79, 82, 29, 30, 33, 61, 18, 15, 22, 30, 15, 17, 50,
34, 67, 46, 73, 10, 62, 20, 81, 55, 69, 52, 78, 61, 14, 59, 37, 60, 55, 31, 11,
13, 30, 68, 60, 61, 69, 20, 47, 81, 62, 76, 43, 42, 10, 36, 54, 56, 49, 15, 7,
48, 11, 51, 32, 55, 80, 13, 57, 55, 70, 16, 85, 40, 75, 45, 7, 46, 19, 81, 35,
63, 30, 16, 71, 50, 15, 81, 55, 46, 27, 64, 29, 25, 79, 70, 13, 27, 14, 62, 53,
26, 53, 74, 48, 73, 68, 82)), class = "data.frame", row.names = c(NA, -119L))

I used the following function to get the deciles:

df1 %>%
    mutate(quantile = ntile(-Score, 10))

I also computed the deciles with the StatMeasures package. I used:

df2 <- decile(vector = df1$Score, decreasing = TRUE)

But the two functions give me two different results, which is confusing. Which one is correct? Am I missing something? Can anyone help?

1 answer:

Answer 0: (score: 1)

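ntile puts the values roughly into 10 bins/buckets. It works by rank: the first n/10 ranked values go into bucket 1, the next n/10 into bucket 2, and so on. So when you have ties close to a decile value, the tied values can end up in different bins.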
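To make the rank-based behaviour concrete, here is a minimal base-R sketch. `rank_buckets` is a hypothetical helper written only for illustration. It is not dplyr's actual implementation, and it assumes that ties are broken by order of appearance and that the ranks are then split into n groups of roughly equal size.

```r
# Hypothetical helper (not dplyr's code): assign buckets purely by rank.
rank_buckets <- function(x, n) {
  r <- rank(x, ties.method = "first")  # tied values get distinct, consecutive ranks
  floor(n * (r - 1) / length(x)) + 1   # split the ranks into n groups
}

x <- c(1, 2, 2, 3)
rank_buckets(x, 2)
# [1] 1 1 2 2   -> the two tied 2s land in different buckets
```

Because the buckets are filled by rank rather than by value, a tie that straddles a bucket boundary is split, which is the effect discussed here.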

First, let's reproduce your calculation:

library(StatMeasures)
library(dplyr)

df1 = data.frame(
Score = c(26, 46, 62, 57, 18, 16, 44, 37, 47, 32, 71, 72, 39, 85, 39, 77, 82, 34, 73, 79, 82, 29, 30, 33, 61, 18, 15, 22, 30, 15, 17, 50, 34, 67, 46, 73, 10, 62, 20, 81, 55, 69, 52, 78, 61, 14, 59, 37, 60, 55, 31, 11, 13, 30, 68, 60, 61, 69, 20, 47, 81, 62, 76, 43, 42, 10, 36, 54, 56, 49, 15, 7, 48, 11, 51, 32, 55, 80, 13, 57, 55, 70, 16, 85, 40, 75, 45, 7, 46, 19, 81, 35, 63, 30, 16, 71, 50, 15, 81, 55, 46, 27, 64, 29, 25, 79, 70, 13, 27, 14, 62, 53, 26, 53, 74, 48, 73, 68, 82)
)

df1 = df1 %>%
  mutate(quantile1 = ntile(Score, 10)) %>%
  mutate(quantile2 = decile(vector = Score))

Now look at the decile values of your data:

quantile(df1$Score,seq(0,1,by=0.1))
  0%  10%  20%  30%  40%  50%  60%  70%  80%  90% 100% 
 7.0 15.0 21.2 30.4 39.2 48.0 55.0 61.6 70.0 78.2 85.0 

These are the rows where the two assignments differ:

df1[df1$quantile1 != df1$quantile2,]
    Score quantile1 quantile2
3      62         7         8
20     79         9        10
30     15         2         1
71     15         2         1
81     55         7         6
98     15         2         1
100    55         7         6
116    48         6         5

Let's look at one example:

df1[df1$Score==48,]
    Score quantile1 quantile2
73     48         5         5
116    48         6         5

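If you want deciles, the first approach (ntile) is not appropriate, because the two rows with Score 48 end up in two different bins. So use the decile function from StatMeasures instead.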
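For comparison, a value-based assignment can be sketched in base R with `quantile` and `cut`. `quantile_buckets` is a hypothetical helper, not the StatMeasures implementation. It assumes the default quantile type and right-closed intervals, which is consistent with the output above (48 equals the 50% quantile and lands in bin 5). It also assumes the quantile breaks are all distinct, since `cut` raises an error on duplicated breaks.

```r
# Hypothetical helper (not StatMeasures code): assign buckets by quantile breaks.
quantile_buckets <- function(x, n) {
  breaks <- quantile(x, probs = seq(0, 1, length.out = n + 1))
  # Right-closed intervals; include.lowest keeps the minimum in bucket 1.
  cut(x, breaks = breaks, include.lowest = TRUE, labels = FALSE)
}

x <- c(1, 2, 2, 3)
quantile_buckets(x, 2)
# [1] 1 1 1 2   -> equal values always share a bucket
```

With this approach the bucket depends only on the value, so bucket sizes can be unequal, but ties are never split.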