How to uniquely identify observations within groups of variables?

Time: 2013-02-01 22:45:34

Tags: r grouping

I have a sample data frame "z" that looks like this:

deaths  sex race    smokes  pyears
10  Female  White   0   1410
14  Male    White   1   1974
14  Female  Black   0   1974
16  Male    Black   1   2256
17  Male    Black   0   2397
18  Female  NA  1   2538
19  NA  Black   0   2679
20  Female  White   1   2820
20  Female  Black   0   2820
21  Male    Black   1   2961

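I would like to create a new variable "group" that combines the variables race and sex. This new variable should uniquely identify the groups of observations in the data frame "z". The expected output is: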

 group
    1
    2
    3
    4
    4
    6
    5
    1
    3
    4

How can this be coded in R?

2 Answers:

Answer 0: (score: 2)

Here is what I had in mind:

dat <- read.table(text = "deaths  sex race    smokes  pyears
10  Female  White   0   1410
14  Male    White   1   1974
14  Female  Black   0   1974
16  Male    Black   1   2256
17  Male    Black   0   2397
18  Female  NA  1   2538
19  NA  Black   0   2679
20  Female  White   1   2820
20  Female  Black   0   2820
21  Male    Black   1   2961",header = TRUE,sep = "")

dat$sex <- factor(dat$sex,exclude = NULL)
dat$race <- factor(dat$race,exclude = NULL)

with(dat,interaction(sex,race))

 [1] Female.White Male.White   Female.Black Male.Black   Male.Black   Female.NA    NA.Black     Female.White Female.Black
[10] Male.Black  
Levels: Female.Black Male.Black NA.Black Female.White Male.White NA.White Female.NA Male.NA NA.NA

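It looks like you want to include the NAs rather than drop them, hence the explicit factor calls. You can convert the resulting factor to integers with as.integer, but the actual numbers may not come out in the order you specified, because R sorts the levels alphabetically rather than by the order in which they appear in the data frame.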
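If the numbering should follow the order in which each combination first appears in the data, one possible base R sketch is shown below. It is not part of the answer above: it swaps interaction() for paste() plus match(), and the vectors sex and race are retyped from the question's data purely for illustration. Note that this produces 5 for Female.NA and 6 for NA.Black, whereas the question's expected output has those two codes the other way around.

```r
# Grouping columns retyped from the question's data frame (illustrative copy).
sex  <- c("Female", "Male", "Female", "Male", "Male",
          "Female", NA, "Female", "Female", "Male")
race <- c("White", "White", "Black", "Black", "Black",
          NA, "Black", "White", "Black", "Black")

# paste() converts NA to the string "NA", so rows with a missing
# value still get a key of their own instead of being dropped.
key <- paste(sex, race, sep = ".")

# unique() keeps keys in order of first appearance; match() returns
# the position of each row's key in that list.
group <- match(key, unique(key))
group
# [1] 1 2 3 4 4 5 6 1 3 4
```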

Answer 1: (score: 1)

You could use:

dat <- read.table(text="deaths  sex race    smokes  pyears
10  Female  White   0   1410
14  Male    White   1   1974
14  Female  Black   0   1974
16  Male    Black   1   2256
17  Male    Black   0   2397
18  Female  NA  1   2538
19  NA  Black   0   2679
20  Female  White   1   2820
20  Female  Black   0   2820
21  Male    Black   1   2961", header=TRUE)

library(qdap)
factor(paste2(dat[, 2:3], ,FALSE))

#for numeric:
as.numeric(factor(paste2(dat[, 2:3], ,FALSE)))

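However, as Joran pointed out, the numbers you expect differ from the ones R will produce. You have to use the levels argument of factor to order the levels the way you want.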
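As a rough sketch of that last point, the levels can be listed by hand in the order that reproduces the numbering from the question. This example uses base paste() rather than qdap's paste2 so that it runs without extra packages. The vectors sex, race and lev are illustrative names, with the data retyped from the question.

```r
# Grouping columns retyped from the question's data frame (illustrative copy).
sex  <- c("Female", "Male", "Female", "Male", "Male",
          "Female", NA, "Female", "Female", "Male")
race <- c("White", "White", "Black", "Black", "Black",
          NA, "Black", "White", "Black", "Black")

# Build one key per row; paste() turns NA into the string "NA".
key <- paste(sex, race, sep = ".")

# Levels listed in the order the question numbers them
# (NA.Black is group 5 and Female.NA is group 6).
lev <- c("Female.White", "Male.White", "Female.Black",
         "Male.Black", "NA.Black", "Female.NA")

# The integer code of each row is the position of its key in lev.
group <- as.integer(factor(key, levels = lev))
group
# [1] 1 2 3 4 4 6 5 1 3 4
```

Any key that is missing from lev would become NA in the result, so the level list has to cover every combination present in the data.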