I have:
dat <-data.frame(study=letters[c(1,1,1,4,4,4,4,10,10)],n1i=c(25,25,22,38,50,30,30,50,50))
I want:
study n1i grp
1 a 25 A
2 a 25 A
3 a 22 B
4 d 38 A
5 d 50 B
6 d 30 C
7 d 30 C
8 j 50 A
9 j 50 A
But this:
dat$grp<-
as.vector(unlist(aggregate(dat$n1i,
list(dat$study), function(x) LETTERS[1:length(x)])$x))
...gives me:
> dat
study n1i grp
1 a 25 A
2 a 25 B
3 a 22 C
4 d 38 A
5 d 50 B
6 d 30 C
7 d 30 D
8 j 50 A
9 j 50 B
In words, I want the grp letters to run from the first to the last unique study * n1i combination.
Answer 0 (score: 4)
dat <-data.frame(study=letters[c(1,1,1,4,4,4,4,10,10)],n1i=c(25,25,22,38,50,30,30,50,50))
library(dplyr)
dat %>%
  group_by(study) %>%                     # for each study
  mutate(id = row_number()) %>%           # use the row number within the study as an id
  group_by(study, n1i) %>%                # for each study and n1i combination
  transmute(grp = LETTERS[min(id)]) %>%   # add the letter based on the minimum id of that combination, while removing the id column
  ungroup()                               # drop the grouping
# # A tibble: 9 x 3
# study n1i grp
# <fct> <dbl> <chr>
# 1 a 25 A
# 2 a 25 A
# 3 a 22 C
# 4 d 38 A
# 5 d 50 B
# 6 d 30 C
# 7 d 30 C
# 8 j 50 A
# 9 j 50 A
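This approach assumes that duplicated rows appear one after another.
Answer 1 (score: 4)
This is based on run-length encoding IDs (rleid). It assumes that each unique combination appears only as one contiguous block, not spread across separate rows.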
library(dplyr)
library(data.table)
dat2 <- dat %>%
  group_by(study) %>%
  mutate(grp = rleid(n1i)) %>%
  mutate(grp = LETTERS[grp]) %>%
  ungroup()
dat2
# # A tibble: 9 x 3
# study n1i grp
# <fct> <dbl> <chr>
# 1 a 25 A
# 2 a 25 A
# 3 a 22 B
# 4 d 38 A
# 5 d 50 B
# 6 d 30 C
# 7 d 30 C
# 8 j 50 A
# 9 j 50 A
This solution may not be perfect. For example, if the data frame looks like this:
study n1i
a 25
a 22
a 25
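The run-length IDs would then be 1, 2, 3. In that case, you may want to sort the data frame first so that identical rows fall into the same block. I did not put an order or arrange call in my solution because I don't know whether reordering the rows is acceptable.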
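A minimal base R sketch that avoids the sorting step (an alternative, not part of this answer): within each study, match() gives every n1i value the position of its first appearance in unique(), so repeated values share a letter even when they are not adjacent. It assumes the consecutive A, B, C labelling from the question and at most 26 distinct n1i values per study, since LETTERS has only 26 entries; dat3 is a hypothetical frame built from the three-row example above.

```r
# Example data from the question
dat <- data.frame(study = letters[c(1, 1, 1, 4, 4, 4, 4, 10, 10)],
                  n1i   = c(25, 25, 22, 38, 50, 30, 30, 50, 50))

# Within each study, number each distinct n1i value in order of first
# appearance; match() returns each value's position in unique(x), so
# repeated values get the same number even when they are not adjacent
dat$grp <- LETTERS[ave(dat$n1i, dat$study,
                       FUN = function(x) match(x, unique(x)))]
dat$grp
# [1] "A" "A" "B" "A" "B" "C" "C" "A" "A"

# Hypothetical frame with non-adjacent duplicates (25, 22, 25)
dat3 <- data.frame(study = c("a", "a", "a"), n1i = c(25, 22, 25))
grp3 <- LETTERS[ave(dat3$n1i, dat3$study,
                    FUN = function(x) match(x, unique(x)))]
grp3
# [1] "A" "B" "A"
```

No reordering is needed here, so the rows keep their original order.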
Update
AntoniosK commented that the first three rows should be A, A, C. I have added min_rank to the pipeline to account for this.
dat2 <- dat %>%
  group_by(study) %>%
  mutate(grp = rleid(n1i)) %>%
  mutate(grp = min_rank(grp)) %>%
  mutate(grp = LETTERS[grp]) %>%
  ungroup()
dat2
# # A tibble: 9 x 3
# study n1i grp
# <fct> <dbl> <chr>
# 1 a 25 A
# 2 a 25 A
# 3 a 22 C
# 4 d 38 A
# 5 d 50 B
# 6 d 30 C
# 7 d 30 C
# 8 j 50 A
# 9 j 50 A
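For comparison, the A, A, C rule (each study/n1i combination takes the letter of the row where it first appears within its study) can be sketched in base R with ave(). This follows the minimum-row-id idea of the dplyr answer above rather than min_rank(); the names row_in_study and first_row are illustrative only, and the pasted study/n1i key is an assumption chosen so that ave() only sees combinations that actually occur.

```r
dat <- data.frame(study = letters[c(1, 1, 1, 4, 4, 4, 4, 10, 10)],
                  n1i   = c(25, 25, 22, 38, 50, 30, 30, 50, 50))

# Row number of each record within its study: 1 2 3 1 2 3 4 1 2
row_in_study <- ave(seq_len(nrow(dat)), dat$study, FUN = seq_along)

# For each observed study/n1i combination, keep the smallest of those
# row numbers, i.e. the row where the combination first appears; a
# pasted key avoids calling min() on empty, unobserved combinations
first_row <- ave(row_in_study, paste(dat$study, dat$n1i), FUN = min)

dat$grp <- LETTERS[first_row]
dat$grp
# [1] "A" "A" "C" "A" "B" "C" "C" "A" "A"
```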
Answer 2 (score: 4)
Here is a one-liner with no extra packages:
LETTERS[with(dat, ave(n1i, study, FUN = function(i)
cumsum(!duplicated(i) | duplicated(i, fromLast = TRUE))))]
#[1] "A" "A" "B" "A" "B" "C" "C" "A" "A"
Answer 3 (score: 3)
Another option is data.table:
library(data.table)
setDT(dat)[, grp := LETTERS[rleid(n1i)], study]
dat
# study n1i grp
#1: a 25 A
#2: a 25 A
#3: a 22 B
#4: d 38 A
#5: d 50 B
#6: d 30 C
#7: d 30 C
#8: j 50 A
#9: j 50 A
Based on @AntoniosK's comment, the correct output should be:
setDT(dat)[, i1 := seq_len(.N), study][, grp := LETTERS[min(i1)],
.(study, n1i)][, i1 := NULL][]
# study n1i grp
#1: a 25 A
#2: a 25 A
#3: a 22 C
#4: d 38 A
#5: d 50 B
#6: d 30 C
#7: d 30 C
#8: j 50 A
#9: j 50 A
Answer 4 (score: 0)
Using the tidyverse, with dplyr::group_indices:
library(tidyverse)  # attaches dplyr (mutate, group_indices) and purrr (map_dfr)
dat %>%
  split(.$study) %>%
  map_dfr(~mutate(., id = LETTERS[
    group_indices(., factor(n1i, unique(n1i)))]))
# study n1i id
# 1 a 25 A
# 2 a 25 A
# 3 a 22 B
# 4 d 38 A
# 5 d 50 B
# 6 d 30 C
# 7 d 30 C
# 8 j 50 A
# 9 j 50 A