I have a sample data frame "z", shown below:
deaths sex race smokes pyears
10 Female White 0 1410
14 Male White 1 1974
14 Female Black 0 1974
16 Male Black 1 2256
17 Male Black 0 2397
18 Female NA 1 2538
19 NA Black 0 2679
20 Female White 1 2820
20 Female Black 0 2820
21 Male Black 1 2961
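I would like to create a new variable "group" that combines the variables race and sex. This new variable should uniquely identify the groups of observations in the data frame "z". The expected output is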
group
1
2
3
4
4
6
5
1
3
4
How can I code this in R?
Answer 0 (score: 2)
Here is what I was thinking:
dat <- read.table(text = "deaths sex race smokes pyears
10 Female White 0 1410
14 Male White 1 1974
14 Female Black 0 1974
16 Male Black 1 2256
17 Male Black 0 2397
18 Female NA 1 2538
19 NA Black 0 2679
20 Female White 1 2820
20 Female Black 0 2820
21 Male Black 1 2961",header = TRUE,sep = "")
dat$sex <- factor(dat$sex,exclude = NULL)
dat$race <- factor(dat$race,exclude = NULL)
with(dat,interaction(sex,race))
[1] Female.White Male.White Female.Black Male.Black Male.Black Female.NA NA.Black Female.White Female.Black
[10] Male.Black
Levels: Female.Black Male.Black NA.Black Female.White Male.White NA.White Female.NA Male.NA NA.NA
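It looks like you want to keep the NAs rather than drop them, hence the explicit factor calls with exclude = NULL. You can convert the resulting factor to integers with as.integer, but the actual numbers may not match the order you specified, because R orders factor levels alphabetically rather than by the order in which the values appear in the data frame.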
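As a minimal sketch of the point about level order, the following base R snippet numbers the groups by first appearance instead of alphabetically. The names `key` and `group` are chosen here for illustration, and only the sex and race columns are reproduced. Note that this gives Female/NA the code 5 and NA/Black the code 6, which is close to, but not identical with, the numbering requested in the question.

```r
# Only the two columns needed for grouping, copied from the question's data.
dat <- data.frame(
  sex  = c("Female", "Male", "Female", "Male", "Male",
           "Female", NA, "Female", "Female", "Male"),
  race = c("White", "White", "Black", "Black", "Black",
           NA, "Black", "White", "Black", "Black")
)

# paste() converts NA to the string "NA", so missing values form their own groups.
key <- paste(dat$sex, dat$race, sep = ".")

# Passing levels = unique(key) orders the levels by first appearance,
# so as.integer() numbers the groups in the order they occur in the data.
group <- as.integer(factor(key, levels = unique(key)))
group
# [1] 1 2 3 4 4 5 6 1 3 4
```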
Answer 1 (score: 1)
You can use:
dat <- read.table(text="deaths sex race smokes pyears
10 Female White 0 1410
14 Male White 1 1974
14 Female Black 0 1974
16 Male Black 1 2256
17 Male Black 0 2397
18 Female NA 1 2538
19 NA Black 0 2679
20 Female White 1 2820
20 Female Black 0 2820
21 Male Black 1 2961", header=TRUE)
library(qdap)
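# paste the sex and race columns together; handle.na = FALSE keeps NA as the text "NA"
factor(paste2(dat[, 2:3], handle.na = FALSE))
# for numeric group codes:
as.numeric(factor(paste2(dat[, 2:3], handle.na = FALSE)))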
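However, as Joran pointed out, the numbers you expect differ from the ones R will assign. You will have to use the levels argument of factor to order the levels the way you want.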
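One possible way to apply the levels argument, sketched in base R so that it does not depend on qdap: the level order in `lev` below is read off the expected output in the question, and the names `key`, `lev`, and `group` are illustrative only.

```r
# Only the two columns needed for grouping, copied from the question's data.
dat <- data.frame(
  sex  = c("Female", "Male", "Female", "Male", "Male",
           "Female", NA, "Female", "Female", "Male"),
  race = c("White", "White", "Black", "Black", "Black",
           NA, "Black", "White", "Black", "Black")
)

# paste() converts NA to the string "NA", so missing values are kept as groups.
key <- paste(dat$sex, dat$race, sep = ".")

# Levels listed in the order that yields the numbering asked for in the question:
# position in this vector becomes the integer group code.
lev <- c("Female.White", "Male.White", "Female.Black",
         "Male.Black", "NA.Black", "Female.NA")

group <- as.integer(factor(key, levels = lev))
group
# [1] 1 2 3 4 4 6 5 1 3 4
```

Any combination missing from `lev` would become NA in `group`, so the vector has to list every sex and race combination present in the data.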