Distinguishing levels of a factor variable in R

Date: 2014-11-26 16:27:01

Tags: r dataset

Suppose my dataset contains three columns: id (identifier), case (character), and value (numeric). Here is my dataset:

tdata <- data.frame(id=c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4), case=c("a","b","c","c","a","b","c","c","a","b","c","c","a","b","c","c"), value=c(1,34,56,23,546,34,67,23,65,23,65,23,87,34,321,56))

tdata
   id case value
1   1    a     1
2   1    b    34
3   1    c    56
4   1    c    23
5   2    a   546
6   2    b    34
7   2    c    67
8   2    c    23
9   3    a    65
10  3    b    23
11  3    c    65
12  3    c    23
13  4    a    87
14  4    b    34
15  4    c   321
16  4    c    56

As you can see, each id has two rows with case c. How can I rename them to c1 and c2? (I need to tell them apart for further analysis.)

3 answers:

Answer 0 (score: 8)

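How about the following? Note that make.unique leaves the first occurrence unchanged and appends a numeric suffix to later ones, so within each id the duplicates come out as c and c.1 rather than c1 and c2: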

within(tdata, case <- ave(as.character(case), id, FUN=make.unique))
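If the labels need to read exactly c1 and c2 while the non-duplicated cases stay as they are, a base R sketch along these lines may work. The names case2, n_in_group, and group_size are illustrative and not part of the original answer, and the data frame is rebuilt with stringsAsFactors = FALSE on the assumption that case should be a plain character vector.

```r
# Same data as in the question, with case kept as character
tdata <- data.frame(
  id = rep(1:4, each = 4),
  case = rep(c("a", "b", "c", "c"), times = 4),
  value = c(1, 34, 56, 23, 546, 34, 67, 23, 65, 23, 65, 23, 87, 34, 321, 56),
  stringsAsFactors = FALSE
)

idx <- seq_along(tdata$case)

# Position of each row within its (id, case) group: 1, 2, ...
n_in_group <- ave(idx, tdata$id, tdata$case, FUN = seq_along)

# Number of rows in each (id, case) group
group_size <- ave(idx, tdata$id, tdata$case, FUN = length)

# Append the counter only where a case is repeated within an id
tdata$case2 <- ifelse(group_size > 1,
                      paste0(tdata$case, n_in_group),
                      tdata$case)
```

Writing to a new column keeps the original case values available, which can be convenient if later steps still need to group by the unsuffixed label.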

Answer 1 (score: 2)

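I would suggest simply adding a secondary "ID" column rather than replacing the values in the "case" column. This is easy to do with getanID from my "splitstackshape" package.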

library(splitstackshape)
getanID(tdata, c("id", "case"))[]
#     id case value .id
#  1:  1    a     1   1
#  2:  1    b    34   1
#  3:  1    c    56   1
#  4:  1    c    23   2
#  5:  2    a   546   1
#  6:  2    b    34   1
#  7:  2    c    67   1
#  8:  2    c    23   2
#  9:  3    a    65   1
# 10:  3    b    23   1
# 11:  3    c    65   1
# 12:  3    c    23   2
# 13:  4    a    87   1
# 14:  4    b    34   1
# 15:  4    c   321   1
# 16:  4    c    56   2

The trailing [] may or may not be required, depending on the version of "data.table" you have installed.

If you really do want to collapse these columns, you can also do:

getanID(tdata, c("id", "case"))[, case := paste0(case, .id)][, .id := NULL][]
#     id case value
#  1:  1   a1     1
#  2:  1   b1    34
#  3:  1   c1    56
#  4:  1   c2    23
#  5:  2   a1   546
#  6:  2   b1    34
#  7:  2   c1    67
#  8:  2   c2    23
#  9:  3   a1    65
# 10:  3   b1    23
# 11:  3   c1    65
# 12:  3   c2    23
# 13:  4   a1    87
# 14:  4   b1    34
# 15:  4   c1   321
# 16:  4   c2    56
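For readers who would rather not add a dependency on "splitstackshape", roughly the same steps can be sketched with "data.table" alone. This is an assumption about an equivalent approach, not a description of how getanID is implemented; the column name dup_no is hypothetical and stands in for the .id column shown above.

```r
library(data.table)

# Same data as in the question, built directly as a data.table
dt <- data.table(
  id = rep(1:4, each = 4),
  case = rep(c("a", "b", "c", "c"), times = 4),
  value = c(1, 34, 56, 23, 546, 34, 67, 23, 65, 23, 65, 23, 87, 34, 321, 56)
)

# Number the rows within each (id, case) group, in their original order
dt[, dup_no := seq_len(.N), by = .(id, case)]

# Optionally fold the counter into the label and drop the helper column
dt[, case := paste0(case, dup_no)][, dup_no := NULL]
```

Both steps modify dt by reference, so no reassignment is needed; stop after the first step if the separate counter column is all that is wanted.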

Answer 2 (score: 2)

How about this slightly modified approach, which builds a new caseNo column and drops the original case column:

library(dplyr)

tdata %>% group_by(id, case) %>% mutate(caseNo = paste0(case, row_number())) %>% 
    ungroup() %>% select(-case)

#Source: local data frame [16 x 3]
#
#   id value caseNo
#1   1     1     a1
#2   1    34     b1
#3   1    56     c1
#4   1    23     c2
#5   2   546     a1
#6   2    34     b1
#7   2    67     c1
#8   2    23     c2
#9   3    65     a1
#10  3    23     b1
#11  3    65     c1
#12  3    23     c2
#13  4    87     a1
#14  4    34     b1
#15  4   321     c1
#16  4    56     c2