Suppose my dataset contains three columns: id (identifier), case (character), and value (numeric). Here is my dataset:
tdata <- data.frame(id=c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4), case=c("a","b","c","c","a","b","c","c","a","b","c","c","a","b","c","c"), value=c(1,34,56,23,546,34,67,23,65,23,65,23,87,34,321,56))
tdata
   id case value
1   1    a     1
2   1    b    34
3   1    c    56
4   1    c    23
5   2    a   546
6   2    b    34
7   2    c    67
8   2    c    23
9   3    a    65
10  3    b    23
11  3    c    65
12  3    c    23
13  4    a    87
14  4    b    34
15  4    c   321
16  4    c    56
As you can see, each id has two rows with case c. How can I rename them to c1 and c2? (I need to tell them apart for further analysis.)
Answer 0 (score: 8)
How about:
within(tdata, case <- ave(as.character(case), id, FUN=make.unique))
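Note that make.unique() leaves the first occurrence unchanged and appends .1, .2, and so on to later duplicates, so the call above yields c and c.1 rather than c1 and c2. If the exact c1/c2 labels matter, the following is a minimal base R sketch of one way to get them. The names n_in_group, idx, res_unique and res_numbered are illustrative and not part of the original answer.

```r
# Same data as in the question, with case kept as character
tdata <- data.frame(
  id = rep(1:4, each = 4),
  case = rep(c("a", "b", "c", "c"), times = 4),
  value = c(1, 34, 56, 23, 546, 34, 67, 23, 65, 23, 65, 23, 87, 34, 321, 56),
  stringsAsFactors = FALSE
)

# The answer's approach: duplicates within an id become "c", "c.1"
res_unique <- within(tdata, case <- ave(as.character(case), id, FUN = make.unique))

# Illustrative variant: count and number the rows within each id/case pair
n_in_group <- ave(seq_along(tdata$case), tdata$id, tdata$case, FUN = length)
idx <- ave(seq_along(tdata$case), tdata$id, tdata$case, FUN = seq_along)

# Append the running index only where a case is repeated within an id
res_numbered <- tdata
res_numbered$case <- ifelse(n_in_group > 1, paste0(tdata$case, idx), tdata$case)
```

With this variant a and b stay as they are and only the repeated c rows are renamed. Dropping the ifelse() and always pasting idx would give a1, b1, c1, c2 instead.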
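Answer 1 (score: 2)
Rather than replacing the values in the "case" column, I would suggest simply adding a secondary "ID" column. This is easily done with getanID from my "splitstackshape" package.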
library(splitstackshape)
getanID(tdata, c("id", "case"))[]
#     id case value .id
#  1:  1    a     1   1
#  2:  1    b    34   1
#  3:  1    c    56   1
#  4:  1    c    23   2
#  5:  2    a   546   1
#  6:  2    b    34   1
#  7:  2    c    67   1
#  8:  2    c    23   2
#  9:  3    a    65   1
# 10:  3    b    23   1
# 11:  3    c    65   1
# 12:  3    c    23   2
# 13:  4    a    87   1
# 14:  4    b    34   1
# 15:  4    c   321   1
# 16:  4    c    56   2
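The trailing [] may or may not be required, depending on which version of "data.table" you have installed.
If you really do want to collapse these columns into one, you can also do: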
getanID(tdata, c("id", "case"))[, case := paste0(case, .id)][, .id := NULL][]
#     id case value
#  1:  1   a1     1
#  2:  1   b1    34
#  3:  1   c1    56
#  4:  1   c2    23
#  5:  2   a1   546
#  6:  2   b1    34
#  7:  2   c1    67
#  8:  2   c2    23
#  9:  3   a1    65
# 10:  3   b1    23
# 11:  3   c1    65
# 12:  3   c2    23
# 13:  4   a1    87
# 14:  4   b1    34
# 15:  4   c1   321
# 16:  4   c2    56
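For readers who would rather not add another package, the same within-group counter can probably be built with "data.table" alone. This is a minimal sketch and not how getanID is implemented; the column name occurrence and the object name dt are illustrative assumptions.

```r
library(data.table)

# Same data as in the question, with case kept as character
tdata <- data.frame(
  id = rep(1:4, each = 4),
  case = rep(c("a", "b", "c", "c"), times = 4),
  value = c(1, 34, 56, 23, 546, 34, 67, 23, 65, 23, 65, 23, 87, 34, 321, 56),
  stringsAsFactors = FALSE
)

dt <- as.data.table(tdata)

# Running counter within each id/case pair (.N is the group size)
dt[, occurrence := seq_len(.N), by = .(id, case)]

# Optionally fold the counter into case and drop the helper column
dt[, case := paste0(case, occurrence)][, occurrence := NULL]
```

Stopping after the first dt[...] line keeps the counter as a separate column, which matches the suggestion above of not overwriting "case".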
Answer 2 (score: 2)
How about this slightly modified approach:
library(dplyr)
tdata %>% group_by(id, case) %>% mutate(caseNo = paste0(case, row_number())) %>%
ungroup() %>% select(-case)
#Source: local data frame [16 x 3]
#
#   id value caseNo
#1   1     1     a1
#2   1    34     b1
#3   1    56     c1
#4   1    23     c2
#5   2   546     a1
#6   2    34     b1
#7   2    67     c1
#8   2    23     c2
#9   3    65     a1
#10  3    23     b1
#11  3    65     c1
#12  3    23     c2
#13  4    87     a1
#14  4    34     b1
#15  4   321     c1
#16  4    56     c2
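The code above writes to a new caseNo column because dplyr does not allow mutate() to modify a grouping variable. If you want the result to keep the original column name and order, one option is to compute the counter while grouped and paste it onto case after ungrouping. This is a minimal sketch; the helper column occurrence and the object name res are illustrative.

```r
library(dplyr)

# Same data as in the question, with case kept as character
tdata <- data.frame(
  id = rep(1:4, each = 4),
  case = rep(c("a", "b", "c", "c"), times = 4),
  value = c(1, 34, 56, 23, 546, 34, 67, 23, 65, 23, 65, 23, 87, 34, 321, 56),
  stringsAsFactors = FALSE
)

res <- tdata %>%
  group_by(id, case) %>%
  mutate(occurrence = row_number()) %>%    # counter within each id/case pair
  ungroup() %>%
  mutate(case = paste0(case, occurrence)) %>%  # case is no longer a grouping variable here
  select(-occurrence)
```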