Create quantiles for each subcategory

Time: 2017-05-25 05:37:49

Tags: r quantile quartile

I have a dataset like the following:

Student|School|Marks
      a|DPS   |47
      b|DPS   |54
      c|DPS   |34
      d|DPS   |67
      e|DPS   |96
      f|DPS   |53
      g|DPS   |83
      h|DPS   |75
      i|DPS   |87
      j|DPS   |91
      k|KV    |46
      l|KV    |76
      m|KV    |82
      n|KV    |54
      o|KV    |72
      p|KV    |33
      q|KV    |40
      r|KV    |42
      s|KV    |54
      t|DAV   |78
      u|DAV   |98
      v|DAV   |89
      w|DAV   |91
      x|DAV   |21
      y|DAV   |67
      z|DAV   |98

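I want to assign each student to one of 3 quantile groups (tertiles) within their school, based on Marks. Can you suggest a way? I would like the result to look like this: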

Student  |School  |Marks  |ntile
x    |DAV     |21     |1
y    |DAV     |67     |1
t    |DAV     |78     |2
v    |DAV     |89     |2
w    |DAV     |91     |3
u    |DAV     |98     |3
z    |DAV     |98     |3
c    |DPS     |34     |1
a    |DPS     |47     |1
f    |DPS     |53     |1
b    |DPS     |54     |1
d    |DPS     |67     |2
h    |DPS     |75     |2
g    |DPS     |83     |2
i    |DPS     |87     |3
j    |DPS     |91     |3
e    |DPS     |96     |3
p    |KV      |33     |1
q    |KV      |40     |1
r    |KV      |42     |1
k    |KV      |46     |2
n    |KV      |54     |2
s    |KV      |54     |2
o    |KV      |72     |3
l    |KV      |76     |3
m    |KV      |82     |3

Here ntile is a new column that gives each student's quantile group within their school.

2 answers:

Answer 0: (score: 0)

Here is a dplyr approach:

library(dplyr)

# Bin Marks into 3 rank-based groups separately within each school
dat %>%
   group_by(School) %>%
   mutate(ntile = ntile(Marks, 3))

# Source: local data frame [26 x 4]
# Groups: School [3]
# 
# Student School Marks ntile
# <fctr> <fctr> <int> <int>
# 1        a DPS       47     1
# 2        b DPS       54     1
# 3        c DPS       34     1
# 4        d DPS       67     2
# 5        e DPS       96     3
# 6        f DPS       53     1
# 7        g DPS       83     2
# 8        h DPS       75     2
# 9        i DPS       87     3
# 10       j DPS       91     3
# # ... with 16 more rows
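
The output above keeps the original row order, while the question shows rows sorted by school and then by marks. A minimal sketch of one way to get that layout, assuming `dat` holds the question's data (rebuilt compactly here so the snippet runs on its own; the name `out` is only illustrative):

```r
library(dplyr)

# Rebuild the question's data (values copied from the table in the question).
dat <- data.frame(
  Student = letters,
  School  = rep(c("DPS", "KV", "DAV"), times = c(10, 9, 7)),
  Marks   = c(47, 54, 34, 67, 96, 53, 83, 75, 87, 91,
              46, 76, 82, 54, 72, 33, 40, 42, 54,
              78, 98, 89, 91, 21, 67, 98),
  stringsAsFactors = FALSE
)

out <- dat %>%
  group_by(School) %>%
  mutate(ntile = ntile(Marks, 3)) %>%  # bins are computed per school
  ungroup() %>%                        # drop grouping before sorting
  arrange(School, Marks)               # sort by school, then by marks
```

Note that `ntile()` bins by rank, so group sizes differ by at most one row; which bins receive the extra rows may not match a hand-built table such as the one in the question.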

Input data:

dat <- structure(list(Student = structure(1:26, .Label = c("      a", 
                                                    "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", 
                                                    "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"), class = "factor"), 
               School = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
                                    2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 
                                    1L, 1L), .Label = c("DAV   ", "DPS   ", "KV    "), class = "factor"), 
               Marks = c(47L, 54L, 34L, 67L, 96L, 53L, 83L, 75L, 87L, 91L, 
                         46L, 76L, 82L, 54L, 72L, 33L, 40L, 42L, 54L, 78L, 98L, 89L, 
                         91L, 21L, 67L, 98L)), .Names = c("Student", "School", "Marks"
                         ), class = "data.frame", row.names = c(NA, -26L))

Answer 1: (score: 0)

The ntile function from dplyr also works with the "old-school" ave:

> dat$Q <- with(dat, ave(Marks, School, FUN=function(x) ntile(x, n=3) ) )
> dat
   Student School Marks Q
1        a DPS       47 1
2        b DPS       54 1
3        c DPS       34 1
4        d DPS       67 2
5        e DPS       96 3
6        f DPS       53 1
7        g DPS       83 2
snipped
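
If loading dplyr is not an option, the same idea can be sketched in base R alone. This is only an illustration under stated assumptions: `rank_bins` is a hypothetical helper (not part of any package) that assigns rank-based bins with a simple floor rule, and it is not claimed to reproduce `ntile()` in every case.

```r
# Rebuild the question's data (values copied from the table in the question).
dat <- data.frame(
  Student = letters,
  School  = rep(c("DPS", "KV", "DAV"), times = c(10, 9, 7)),
  Marks   = c(47, 54, 34, 67, 96, 53, 83, 75, 87, 91,
              46, 76, 82, 54, 72, 33, 40, 42, 54,
              78, 98, 89, 91, 21, 67, 98),
  stringsAsFactors = FALSE
)

# Hypothetical helper: put each value of x into one of n rank-based bins.
rank_bins <- function(x, n) {
  r <- rank(x, ties.method = "first")  # ties are broken by position
  (n * (r - 1)) %/% length(x) + 1      # integer division maps ranks to 1..n
}

# ave() applies the helper to Marks separately for each School.
dat$ntile <- ave(dat$Marks, dat$School, FUN = function(x) rank_bins(x, 3))

# Sort by school, then by marks, to mirror the layout in the question.
dat <- dat[order(dat$School, dat$Marks), ]
```

With this rule the 10 DPS rows split 4/3/3 and the 7 DAV rows split 3/2/2, so a couple of DAV rows land in a different bin than in the hand-written table in the question.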