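Is there a convenient way to generate new variables that contain subtypes (for analysis)?
For example, we have smoker status, sex and lifeQuality.
Say we want to look at lifeQuality: is there a convenient and generic way to obtain the subgroups I want (femaleSmoker and maleSmoker)?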
set.seed(1337)
df <- data.frame(smoker = sample(c("yes", "no"), 10, replace = TRUE), sex = sample(c("male", "female"), 10, replace = TRUE), lifeQuality = rnorm(10))
# Label every row as "<sex>_<smoker>", then blank out the rows of the other sex
df$femaleSmoker <- paste0(df$sex, "_", df$smoker)
df$femaleSmoker[df$sex == "male"] <- NA
df$maleSmoker <- paste0(df$sex, "_", df$smoker)
df$maleSmoker[df$sex == "female"] <- NA
> df
smoker sex lifeQuality femaleSmoker maleSmoker
1 no male 1.0467758 <NA> male_no
2 yes female 0.7706077 female_yes <NA>
3 yes male 0.3980541 <NA> male_yes
4 no female -0.3171052 female_no <NA>
5 no female -1.3180397 female_no <NA>
6 yes male 1.0174820 <NA> male_yes
7 no male -1.6237908 <NA> male_no
8 yes male -0.5703763 <NA> male_yes
9 yes male 0.3104756 <NA> male_yes
10 no male -2.6101319 <NA> male_no
Answer 0 (score: 2)
A general solution
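fast.subgroups <- function(x, groups) {
  # Each element of groups has the form "var1+var2"
  groupsList <- strsplit(groups, "\\+")
  # Iterate in reverse because every new column is prepended to x
  for (i in length(groupsList):1) {
    var <- groupsList[[i]]
    lvl1 <- levels(factor(x[[var[1]]]))
    for (ii in length(lvl1):1) {
      # Row label of the form "<var1 value>_<var2>_<var2 value>"
      tmp <- paste(x[[var[1]]], var[2], x[[var[2]]], sep = "_")
      # Keep the label only for rows at the current level of var1
      tmp[x[[var[1]]] != lvl1[ii]] <- NA
      newCol <- data.frame(tmp, stringsAsFactors = FALSE)
      names(newCol) <- paste(var[1], lvl1[ii], var[2], sep = "_")
      x <- cbind(newCol, x)
    }
  }
  return(x)
}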
Data:
set.seed(1337)
n <- 15
df <- data.frame(smoker = sample(c("yes", "no"), n, replace = TRUE), sex = sample(c("male", "female"), n, replace = TRUE), ill = sample(c("mild", "moderate", "severe"), n, replace = TRUE), lifeQuality = rnorm(n), stringsAsFactors = FALSE)
Apply the function:
fast.subgroups(x=df,groups=c("sex+smoker","ill+sex"))
Result:
sex_female_smoker sex_male_smoker ill_mild_sex ill_moderate_sex ill_severe_sex smoker sex ill lifeQuality
1 <NA> male_smoker_no <NA> <NA> severe_sex_male no male severe -1.32964336
2 female_smoker_no <NA> mild_sex_female <NA> <NA> no female mild -0.18078626
3 female_smoker_yes <NA> <NA> <NA> severe_sex_female yes female severe -0.32265873
4 <NA> male_smoker_yes mild_sex_male <NA> <NA> yes male mild 0.55766293
5 <NA> male_smoker_yes <NA> <NA> severe_sex_male yes male severe -0.23733258
6 female_smoker_yes <NA> <NA> moderate_sex_female <NA> yes female moderate -0.58239712
7 female_smoker_no <NA> <NA> <NA> severe_sex_female no female severe 0.22477526
8 <NA> male_smoker_yes <NA> <NA> severe_sex_male yes male severe 0.42577251
9 <NA> male_smoker_yes mild_sex_male <NA> <NA> yes male mild -0.66224169
10 female_smoker_yes <NA> mild_sex_female <NA> <NA> yes female mild 1.49037322
11 female_smoker_no <NA> <NA> <NA> severe_sex_female no female severe -1.11923261
12 female_smoker_no <NA> <NA> <NA> severe_sex_female no female severe 0.06867219
13 female_smoker_no <NA> <NA> moderate_sex_female <NA> no female moderate 0.12729929
14 <NA> male_smoker_yes <NA> moderate_sex_male <NA> yes male moderate 0.83248241
15 female_smoker_no <NA> mild_sex_female <NA> <NA> no female mild -1.51970610
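If a single grouping column is enough for the analysis, a shorter route may be base R's interaction(), which combines two variables into one factor. The sketch below is only an illustration under that assumption: the data frame toy and its values are made up for the example and are not taken from the post.

```r
# Hypothetical toy data, hand-written so the result is easy to check
toy <- data.frame(
  smoker = c("yes", "no", "yes", "no"),
  sex = c("male", "male", "female", "female"),
  lifeQuality = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# One factor whose levels are the sex/smoker combinations
toy$sex_smoker <- interaction(toy$sex, toy$smoker, sep = "_")

# Mean lifeQuality per subgroup
subgroup_means <- aggregate(lifeQuality ~ sex_smoker, data = toy, FUN = mean)
print(subgroup_means)
```

Unlike fast.subgroups, this yields one column covering all subgroups rather than one column per level, so it fits when the subgroups are compared against each other in a single model or summary.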
Answer 1 (score: 0)
You can try case_when from dplyr:
library(dplyr)
df <- data.frame(smoker = sample(c("yes", "no"), 10, replace = TRUE), sex = sample(c("male", "female"), 10, replace = TRUE), lifeQuality = rnorm(10))
# One label per sex/smoker combination, all in a single column
df %>%
  mutate(subcat = case_when(
    smoker == "yes" & sex == "male" ~ "maleSmoker",
    smoker == "no" & sex == "male" ~ "maleNonSmoker",
    smoker == "yes" & sex == "female" ~ "femaleSmoker",
    smoker == "no" & sex == "female" ~ "femaleNonSmoker"))
smoker sex lifeQuality subcat
1 no male 1.969426 maleNonSmoker
2 yes male 1.192345 maleSmoker
3 yes male -0.762863 maleSmoker
4 no male -1.259429 maleNonSmoker
5 yes female -2.423066 femaleSmoker
6 no male 0.249120 maleNonSmoker
7 no female -0.455351 femaleNonSmoker
8 yes female -1.623958 femaleSmoker
9 no male 0.680503 maleNonSmoker
10 yes male -1.374085 maleSmoker
If you want the two separate female and male columns from the question:
# Rows that match no condition are left as NA
df %>%
  mutate(femaleSmoker = case_when(
      smoker == "yes" & sex == "female" ~ "female_yes",
      smoker == "no" & sex == "female" ~ "female_no"),
    maleSmoker = case_when(
      smoker == "yes" & sex == "male" ~ "male_yes",
      smoker == "no" & sex == "male" ~ "male_no"
    ))
smoker sex lifeQuality femaleSmoker maleSmoker
1 no male 1.969426 <NA> male_no
2 yes male 1.192345 <NA> male_yes
3 yes male -0.762863 <NA> male_yes
4 no male -1.259429 <NA> male_no
5 yes female -2.423066 female_yes <NA>
6 no male 0.249120 <NA> male_no
7 no female -0.455351 female_no <NA>
8 yes female -1.623958 female_yes <NA>
9 no male 0.680503 <NA> male_no
10 yes male -1.374085 <NA> male_yes
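The case_when calls above list every combination by hand. When a variable has many levels, the per-level columns could instead be derived from the data. The following is a minimal base R sketch of that idea; add_level_columns, its arguments and the toy data are hypothetical names introduced for illustration, and the column names it produces (female_smoker, male_smoker) differ from the femaleSmoker and maleSmoker used in the question.

```r
# Hypothetical helper: one new column per level of `by`, holding
# "<by value>_<with value>" for rows at that level and NA elsewhere
add_level_columns <- function(data, by, with) {
  label <- paste(data[[by]], data[[with]], sep = "_")
  for (lvl in sort(unique(data[[by]]))) {
    new_name <- paste(lvl, with, sep = "_")
    data[[new_name]] <- ifelse(data[[by]] == lvl, label, NA_character_)
  }
  data
}

# Hypothetical toy data, hand-written so the result is easy to check
toy <- data.frame(
  smoker = c("yes", "no", "yes", "no"),
  sex = c("male", "male", "female", "female"),
  lifeQuality = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

res <- add_level_columns(toy, by = "sex", with = "smoker")
print(res)
```

The new columns are appended to the right of the existing ones, so the original variables keep their positions.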