New vector with letters representing unique combinations of other vectors

Time: 2018-06-08 12:54:58

Tags: r aggregate

I have

dat <-data.frame(study=letters[c(1,1,1,4,4,4,4,10,10)],n1i=c(25,25,22,38,50,30,30,50,50)) 

I want

     study n1i grp
1     a  25   A
2     a  25   A
3     a  22   B
4     d  38   A
5     d  50   B
6     d  30   C
7     d  30   C
8     j  50   A
9     j  50   A

But this...

dat$grp<-  
  as.vector(unlist(aggregate(dat$n1i,
   list(dat$study), function(x) LETTERS[1:length(x)])$x)) 

...gives me

> dat
  study n1i grp
1     a  25   A
2     a  25   B
3     a  22   C
4     d  38   A
5     d  50   B
6     d  30   C
7     d  30   D
8     j  50   A
9     j  50   B

In words: within each study, I want the "grp" letters to start at A and advance by one for each unique study * n1i combination, up to the last one.

5 answers:

Answer 0: (score: 4)

dat <-data.frame(study=letters[c(1,1,1,4,4,4,4,10,10)],n1i=c(25,25,22,38,50,30,30,50,50)) 

library(dplyr)

dat %>%
  group_by(study) %>%                    # for each study
  mutate(id = row_number()) %>%          # get the number of row as an id
  group_by(study, n1i) %>%               # for each study and n1i combination
  transmute(grp = LETTERS[min(id)]) %>%  # add the letters based on the minimum id value of that combination, while removing the id column
  ungroup()                              # forget the grouping

# # A tibble: 9 x 3
#   study   n1i grp  
#   <fct> <dbl> <chr>
# 1 a        25 A    
# 2 a        25 A    
# 3 a        22 C    
# 4 d        38 A    
# 5 d        50 B    
# 6 d        30 C    
# 7 d        30 C    
# 8 j        50 A    
# 9 j        50 A 

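This approach assumes that the duplicated rows appear one after the other.

Answer 1: (score: 4)

This is based on run-length encoding IDs. It assumes that each unique combination appears only as a contiguous block, not in separated rows.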

library(dplyr)
library(data.table)

dat2 <- dat %>%
  group_by(study) %>%
  mutate(grp =rleid(n1i)) %>%
  mutate(grp = LETTERS[grp]) %>%
  ungroup()
dat2
# # A tibble: 9 x 3
#   study   n1i grp  
#   <fct> <dbl> <chr>
# 1 a        25 A    
# 2 a        25 A    
# 3 a        22 B    
# 4 d        38 A    
# 5 d        50 B    
# 6 d        30 C    
# 7 d        30 C    
# 8 j        50 A    
# 9 j        50 A 

This solution may not be perfect. For example, if the data frame looks like this:

study   n1i
   a     25
   a     22
   a     25

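The run-length IDs would be 1, 2, 3. In that case you may want to sort the data frame first so that identical rows end up in the same block. I did not include an order or arrange call in my solution because I don't know whether reordering the rows is acceptable.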
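The effect of row order on run-length IDs can be sketched in base R. Here `run_id` is a hypothetical helper written only for illustration; it is meant to mimic what `data.table::rleid` does for a single vector, and is not part of any package.

```r
# Hypothetical helper: number the runs of consecutive equal values
# (intended to mimic data.table::rleid() for a single vector).
run_id <- function(x) {
  r <- rle(x)
  rep(seq_along(r$lengths), r$lengths)
}

n1i <- c(25, 22, 25)

ids_as_is  <- run_id(n1i)        # the two 25s are separated, so they get different ids
ids_sorted <- run_id(sort(n1i))  # sorting puts equal values next to each other

LETTERS[ids_as_is]
LETTERS[ids_sorted]
```

Note that sorting also changes which value receives which letter, so this only helps when the original row order does not matter.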

Update

AntoniosK commented that the first three rows should be A, A, C. I have added min_rank to the pipeline to account for that.

dat2 <- dat %>%
  group_by(study) %>%
  mutate(grp =rleid(n1i)) %>%
  mutate(grp = min_rank(grp)) %>%
  mutate(grp = LETTERS[grp]) %>%
  ungroup()
dat2
# # A tibble: 9 x 3
#   study   n1i grp  
#   <fct> <dbl> <chr>
# 1 a        25 A    
# 2 a        25 A    
# 3 a        22 C    
# 4 d        38 A    
# 5 d        50 B    
# 6 d        30 C    
# 7 d        30 C    
# 8 j        50 A    
# 9 j        50 A 
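The reason `min_rank` turns A, A, B into A, A, C is that tied values share the smallest rank and the next rank is skipped. A minimal base R sketch, assuming `rank(x, ties.method = "min")` as a stand-in for `dplyr::min_rank` on a vector without missing values:

```r
run_ids <- c(1, 1, 2)                          # run ids for study "a" (n1i = 25, 25, 22)
ranks   <- rank(run_ids, ties.method = "min")  # ties share the lowest rank; rank 2 is skipped
grp     <- LETTERS[ranks]
grp
```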

Answer 2: (score: 4)

Here is a one-liner that needs no extra packages:

LETTERS[with(dat, ave(n1i, study, FUN = function(i) 
                                cumsum(!duplicated(i) | duplicated(i, fromLast = TRUE))))]
#[1] "A" "A" "B" "A" "B" "C" "C" "A" "A"
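One caveat with the expression above: the counter is also incremented for any element that still has a later duplicate, so a value that occurs three or more times within a study would be split across letters. A possible variant, offered as a sketch rather than as the answer author's method, indexes each value by its first appearance with `match` and `unique`. The name `first_seen` is made up for this example.

```r
dat <- data.frame(study = letters[c(1, 1, 1, 4, 4, 4, 4, 10, 10)],
                  n1i   = c(25, 25, 22, 38, 50, 30, 30, 50, 50))

# Within each study, index every value by the position of its first appearance
# among that study's distinct values.
first_seen <- function(i) match(i, unique(i))

grp <- LETTERS[with(dat, ave(n1i, study, FUN = first_seen))]
grp

# A value repeated three times stays in one group with this variant,
# whereas the cumsum() expression increments on the second element:
first_seen(c(30, 30, 30))
cumsum(!duplicated(c(30, 30, 30)) | duplicated(c(30, 30, 30), fromLast = TRUE))
```

This variant also gives separated duplicates (for example 25, 22, 25) the same letter, which may or may not be what is wanted.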

Answer 3: (score: 3)

Another option is data.table:

library(data.table)
setDT(dat)[, grp := LETTERS[rleid(n1i)], study]
dat
#   study n1i grp
#1:     a  25   A
#2:     a  25   A
#3:     a  22   B
#4:     d  38   A
#5:     d  50   B
#6:     d  30   C
#7:     d  30   C
#8:     j  50   A
#9:     j  50   A

Edit

Based on @AntoniosK's comment, the correct output should be

setDT(dat)[, i1 := seq_len(.N), study][, grp := LETTERS[min(i1)], 
                .(study, n1i)][, i1 := NULL][]
#   study n1i grp
#1:     a  25   A
#2:     a  25   A
#3:     a  22   C
#4:     d  38   A
#5:     d  50   B
#6:     d  30   C
#7:     d  30   C
#8:     j  50   A
#9:     j  50   A

Answer 4: (score: 0)

Using tidyverse, with dplyr::group_indices:

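library(tidyverse)  # loads dplyr (mutate, group_indices) and purrr (map_dfr)

dat %>%
  split(.$study) %>%                       # one data frame per study
  map_dfr(~mutate(.,id = LETTERS[
    group_indices(.,factor(n1i,unique(n1i)))]))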

#   study n1i id
# 1     a  25  A
# 2     a  25  A
# 3     a  22  B
# 4     d  38  A
# 5     d  50  B
# 6     d  30  C
# 7     d  30  C
# 8     j  50  A
# 9     j  50  A