I created a data frame with the following data:
name <- c("A","B","C","D","E","F","G","H","I","J")
age <- c(22,43,12,17,29,5,51,56,9,44)
sex <- c("M","F","M","M","M","F","F","M","F","F")
rock <- data.frame(name,age,sex,stringsAsFactors = TRUE)
rock
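Now I want to work out the following:
If the name is E through J and the sex is not F, the status is "1F"; if the name is A through D and the age is greater than 15, the status is "Young". Everything else is "Others".
So I am using the following code: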
rock$status <- ifelse(rock$name==c("E","F","G","H","I","J")&
rock$sex!="F","1F",
ifelse(rock$name==c("E","F","G","H","I","J")&rock$sex=="F","Fenamle",
ifelse(rock$name==c("A","B","C","D") & rock$age>15,"Young","Others")))
rock
But the result I get is:
name age sex status
1 A 22 M Young
2 B 43 F Young
3 C 12 M Others
4 D 17 M Young
5 E 29 M Others
6 F 5 F Others
7 G 51 F Others
8 H 56 M Others
9 I 9 F Others
10 J 44 F Others
However, the status for E and H should be "1F", but it shows "Others".
What did I do wrong in my code?
Please correct me and offer some useful suggestions.
Answer 0 (score: 7)
We need to use %in% instead of ==:
rock$status <- ifelse(rock$name %in% c("E", "F", "G", "H", "I", "J") &
rock$sex != "F", "1F",
ifelse(rock$name %in% c("E", "F", "G", "H", "I", "J") &
rock$sex == "F", "Female",
ifelse(rock$name %in% c("A", "B", "C", "D") &
rock$age > 15, "Young", "Others")))
rock
# name age sex status
# 1 A 22 M Young
# 2 B 43 F Young
# 3 C 12 M Others
# 4 D 17 M Young
# 5 E 29 M 1F
# 6 F 5 F Female
# 7 G 51 F Female
# 8 H 56 M 1F
# 9 I 9 F Female
# 10 J 44 F Female
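To see why == fails here, it may help to compare the two operators side by side. The sketch below uses plain character vectors; the names target, eq_result and in_result are only illustrative and do not appear in the question.

```r
name   <- c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J")
target <- c("E", "F", "G", "H", "I", "J")

# == compares element by element and recycles the shorter vector, so the
# names are compared position by position against E F G H I J E F G H.
# R also warns here because 10 is not a multiple of 6; the warning is
# suppressed only to keep the output short.
eq_result <- suppressWarnings(name == target)

# %in% asks, for each name, whether it occurs anywhere in target.
in_result <- name %in% target

data.frame(name, eq_result, in_result)
```

No position happens to match under recycling, so eq_result is FALSE everywhere, which is why E and H fell through to "Others". The A-to-D test in the question only appeared to work because c("A", "B", "C", "D") lines up with the first four names.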
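Answer 1 (score: 5)
In cases like this I often prefer to pre-compute the logical indices and then use a weighted sum of them to index into a vector of the unique values. It is faster and (imo) more readable than nested ifelse. An example: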
i1 <- rock$name %in% c("E", "F", "G", "H", "I", "J") & rock$sex != "F"
i2 <- rock$name %in% c("E", "F", "G", "H", "I", "J") & rock$sex == "F"
i3 <- rock$name %in% c("A", "B", "C", "D") & rock$age > 15
rock$status <- c("Other", "1F", "Female", "Young")[1 + i1 + 2*i2 + 3*i3]
which gives the desired result:
> rock
name age sex status
1 A 22 M Young
2 B 43 F Young
3 C 12 M Other
4 D 17 M Young
5 E 29 M 1F
6 F 5 F Female
7 G 51 F Female
8 H 56 M 1F
9 I 9 F Female
10 J 44 F Female
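The index arithmetic relies on at most one condition being TRUE per row. The following sketch makes the intermediate index visible and adds a guard for that assumption; the names idx and labels, and the stopifnot() check, are additions for illustration and not part of the answer's code.

```r
name <- c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J")
age  <- c(22, 43, 12, 17, 29, 5, 51, 56, 9, 44)
sex  <- c("M", "F", "M", "M", "M", "F", "F", "M", "F", "F")
rock <- data.frame(name, age, sex, stringsAsFactors = FALSE)

i1 <- rock$name %in% c("E", "F", "G", "H", "I", "J") & rock$sex != "F"
i2 <- rock$name %in% c("E", "F", "G", "H", "I", "J") & rock$sex == "F"
i3 <- rock$name %in% c("A", "B", "C", "D") & rock$age > 15

# If two conditions were TRUE for the same row, the weighted sum would point
# at the wrong label (i1 and i2 together give 1 + 1 + 2 = 4, i.e. "Young").
stopifnot(all(i1 + i2 + i3 <= 1))

# No condition TRUE -> 1, only i1 -> 2, only i2 -> 3, only i3 -> 4.
idx    <- 1 + i1 + 2 * i2 + 3 * i3
labels <- c("Other", "1F", "Female", "Young")

data.frame(name = rock$name, idx, status = labels[idx])
```

If the conditions could overlap, an ordered approach such as nested ifelse would be the safer choice.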
Answer 2 (score: 2)
Using data.table, you can:
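One way this could look is sketched below. It is an assumption about the general approach (a default value followed by conditional := updates), not necessarily the code this answer originally contained, and it builds the table with character columns for simplicity.

```r
library(data.table)

rock <- data.table(
  name = LETTERS[1:10],
  age  = c(22, 43, 12, 17, 29, 5, 51, 56, 9, 44),
  sex  = c("M", "F", "M", "M", "M", "F", "F", "M", "F", "F")
)

# Start with the fallback value, then overwrite the rows matching each rule.
rock[, status := "Others"]
rock[name %in% LETTERS[5:10] & sex != "F", status := "1F"]
rock[name %in% LETTERS[5:10] & sex == "F", status := "Female"]
rock[name %in% LETTERS[1:4] & age > 15, status := "Young"]
rock[]
```

Each := updates the table by reference, so no reassignment of rock is needed between steps.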
Answer 3 (score: 2)
A solution using dplyr's case_when() function:
library(dplyr)
name <- c("A","B","C","D","E","F","G","H","I","J")
age <- c(22,43,12,17,29,5,51,56,9,44)
sex <- c("M","F","M","M","M","F","F","M","F","F")
rock <- data.frame(name,age,sex,stringsAsFactors = TRUE)
name_condition_1 <- c("E","F","G","H","I","J")
name_condition_2 <- c("A","B","C","D")
rock %>% mutate(
status = case_when(
name %in% name_condition_1 & sex != "F" ~ "1F",
name %in% name_condition_1 & sex == "F" ~ "Female",
name %in% name_condition_2 & age > 15 ~ "Young",
TRUE ~ "Others"
)
)
which produces:
name age sex status
1 A 22 M Young
2 B 43 F Young
3 C 12 M Others
4 D 17 M Young
5 E 29 M 1F
6 F 5 F Female
7 G 51 F Female
8 H 56 M 1F
9 I 9 F Female
10 J 44 F Female
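case_when() tries its conditions from top to bottom and uses the first one that is TRUE for each element, which is why the final TRUE ~ "Others" line works as a catch-all. A small illustration with a made-up vector x (hypothetical values, unrelated to the rock data):

```r
library(dplyr)

x <- c(1, 3, 7)  # hypothetical values, only for illustration

# Conditions are tried in order; the first TRUE one decides the label,
# and TRUE on the last line catches everything left over.
size <- case_when(
  x > 5 ~ "big",
  x > 2 ~ "medium",
  TRUE  ~ "small"
)
size
```

Because 7 already matches x > 5, it never reaches the x > 2 line. In the rock example the three conditions do not overlap, so their order does not change the result there.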
Answer 4 (score: 2)
For the sake of completeness, here is another solution that uses joins and non-equi joins to update the status column:
library(data.table)
setDT(rock)[.(name = LETTERS[1:4], age = 15), on = .(name, age > age), status := "Young"][
.(name = LETTERS[5:10], sex = "F"), on = .(name, sex), status := "Female"][
.(name = LETTERS[5:10], status = NA_character_), on = .(name, status), status := "1F"][
.(status = NA_character_), on = .(status), status := "Other"][]
name age sex status
1: A 22 M Young
2: B 43 F Young
3: C 12 M Other
4: D 17 M Young
5: E 29 M 1F
6: F 5 F Female
7: G 51 F Female
8: H 56 M 1F
9: I 9 F Female
10: J 44 F Female
Unfortunately, non-equi joins do not work with the inequality operator !=. So,
setDT(rock)[.(name = LETTERS[1:4], age = 15), on = .(name, age > age), status := "Young"][
.(name = LETTERS[5:10], sex = "F"), on = .(name, sex != sex), status := "1F"][]
returns an error message. Instead, I had to join on name and sex first to set status to Female, and then check for NA in status to get the complementary set.
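The same sequence of steps (set "Young", set "Female", fill the remaining E to J rows with "1F", then fill everything still NA) can be sketched in base R with logical subsetting. This is only an illustration of the ordering logic, not a replacement for the joins above, and it assumes character columns rather than factors.

```r
name <- LETTERS[1:10]
age  <- c(22, 43, 12, 17, 29, 5, 51, 56, 9, 44)
sex  <- c("M", "F", "M", "M", "M", "F", "F", "M", "F", "F")
rock <- data.frame(name, age, sex, stringsAsFactors = FALSE)

# Start with NA so that later steps can detect rows not yet classified.
rock$status <- NA_character_
rock$status[rock$name %in% LETTERS[1:4] & rock$age > 15] <- "Young"
rock$status[rock$name %in% LETTERS[5:10] & rock$sex == "F"] <- "Female"

# The complement of "Female" within E to J: rows that are still NA.
rock$status[rock$name %in% LETTERS[5:10] & is.na(rock$status)] <- "1F"

# Whatever is left unclassified becomes "Other".
rock$status[is.na(rock$status)] <- "Other"
rock
```

The order matters: the "1F" step depends on "Female" having been assigned first.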
However, there is another workaround that uses two non-equi joins:
setDT(rock)[.(name = LETTERS[1:4], age = 15), on = .(name, age > age), status := "Young"][
.(name = LETTERS[5:10], sex = "F"), on = .(name, sex < sex), status := "1F"][
.(name = LETTERS[5:10], sex = "F"), on = .(name, sex > sex), status := "1F"][]
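Answer 5 (score: -1)
# Uses rock, the data frame from the question (the original snippet referred to it as data).
rock$status <- ifelse(rock$name %in% c("A", "B", "C", "D") & rock$age > 15, "Young",
ifelse(rock$sex != "F" & rock$name %in% c("E", "F", "G", "H", "I", "J"), "1F", "Others"))
rock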