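I am trying to add a new column to an existing R dataframe, where the value in the new column depends on the value of a in the same row. If the value is 1, the new column should contain one; if it is 2, it should contain two; otherwise it should contain three or more.
This code: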
mydf <- data.frame(a = 1:6,
                   b = rep("reproducible", 6),
                   c = rep("example", 6),
                   stringsAsFactors = FALSE)
mydf
renders the data frame with columns a, b and c.
Using this code:
mydf["encoded"] <- { if (mydf['a'] == 1) 'one' else if (mydf['a'] == 2) 'two' else 'three or more' }
mydf
renders the data frame with the new column, and also generates a warning:
Warning message in if (mydf["a"] == 1) "one" else if (mydf["a"] == 2) "two" else "three or more":
“the condition has length > 1 and only the first element will be used”
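The new column is added to the dataframe, but all of its values are the same: one.
Have I implemented the logic for filling in the new column incorrectly?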
Answer 0: (score: 3)
An often overlooked function for doing this is cut:
mydf$encoded <- cut(mydf$a, c(0:2,Inf), c('one','two','three or more'))
Result:
> mydf
  a            b       c       encoded
1 1 reproducible example           one
2 2 reproducible example           two
3 3 reproducible example three or more
4 4 reproducible example three or more
5 5 reproducible example three or more
6 6 reproducible example three or more
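One thing worth knowing about this approach: cut returns a factor rather than a character vector. Here is a minimal sketch, assuming you would prefer plain strings in the new column. The intermediate name encoded_factor is only for illustration.

```r
# Rebuild the example data from the question.
mydf <- data.frame(a = 1:6,
                   b = rep("reproducible", 6),
                   c = rep("example", 6),
                   stringsAsFactors = FALSE)

# cut() bins a into the intervals (0,1], (1,2] and (2,Inf]
# and returns a factor with the given labels.
encoded_factor <- cut(mydf$a,
                      breaks = c(0:2, Inf),
                      labels = c("one", "two", "three or more"))

# Convert to character if a factor column is not wanted.
mydf$encoded <- as.character(encoded_factor)

str(mydf$encoded)
```

With these breaks, a value of 0 or below falls outside every interval and becomes NA, and a non-integer such as 1.5 would be labelled two, so this fits best when a holds positive whole numbers.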
Answer 1: (score: 2)
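A solution using dplyr::case_when:
The syntax and logic are fairly self-explanatory: when a equals 1, encoded equals "one"; when a equals 2, encoded equals "two"; in all other cases, encoded equals "three or more".
mutate simply creates the new column.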
library(dplyr)
mutate(mydf, encoded = case_when(a == 1 ~ "one",
                                 a == 2 ~ "two",
                                 TRUE ~ "three or more"))
  a            b       c       encoded
1 1 reproducible example           one
2 2 reproducible example           two
3 3 reproducible example three or more
4 4 reproducible example three or more
5 5 reproducible example three or more
6 6 reproducible example three or more
A solution using base::ifelse:
mydf$encoded <- ifelse(mydf$a == 1,
                       "one",
                       ifelse(mydf$a == 2,
                              "two",
                              "three or more"))
If you don't want to write mydf$a several times, you can use with:
mydf$encoded <- with(mydf, ifelse(a == 1,
                                  "one",
                                  ifelse(a == 2,
                                         "two",
                                         "three or more")))
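For context on why the attempt in the question fails: if expects a single TRUE or FALSE, while comparing a whole column with 1 yields one logical value per row. The R version used in the question took only the first of those values and warned, which is why every row ended up as one. As far as I know, from R 4.2.0 onward the same code stops with an error instead. ifelse is vectorised, so it tests each element in turn. A small sketch using the data from the question (cond is just an illustrative name):

```r
# Rebuild the example data from the question.
mydf <- data.frame(a = 1:6,
                   b = rep("reproducible", 6),
                   c = rep("example", 6),
                   stringsAsFactors = FALSE)

# The comparison is a logical vector with one element per row,
# which is more than a plain if() can handle.
cond <- mydf$a == 1
print(length(cond))

# ifelse() evaluates the test element by element and picks
# the matching value for each row.
mydf$encoded <- ifelse(cond,
                       "one",
                       ifelse(mydf$a == 2, "two", "three or more"))
print(mydf$encoded)
```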
Answer 2: (score: 1)
sapply can also do the job:
mydf$encoded <- sapply(
  mydf$a, function(a)
    if (a == 1) 'one' else if (a == 2) 'two' else 'three or more')
mydf
#   a            b       c       encoded
# 1 1 reproducible example           one
# 2 2 reproducible example           two
# 3 3 reproducible example three or more
# 4 4 reproducible example three or more
# 5 5 reproducible example three or more
# 6 6 reproducible example three or more
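If you go this route, vapply is a slightly stricter variant that may be worth considering. It takes a template for the return value and fails if the function returns anything that doesn't match it. A sketch, where encode_one is a hypothetical helper name and not part of the original answer:

```r
# Rebuild the example data from the question.
mydf <- data.frame(a = 1:6,
                   b = rep("reproducible", 6),
                   c = rep("example", 6),
                   stringsAsFactors = FALSE)

# Scalar helper: a plain if/else is fine here because it is
# called with one value of a at a time.
encode_one <- function(a) {
  if (a == 1) "one" else if (a == 2) "two" else "three or more"
}

# character(1) declares that each call must return exactly one string.
mydf$encoded <- vapply(mydf$a, encode_one, character(1))
print(mydf$encoded)
```

This works for the same reason the sapply version does: inside the helper, a is a single value, so the condition passed to if has length one.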