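Add a column with custom values to a dataframe

Time: 2017-09-30 21:42:56

Tags: r dataframe

I am trying to add a new column to an existing R dataframe, where each new value depends on the value of column a in the same row. If the value is 1, the new column should contain one; if the value is 2, it should contain two; otherwise it should contain three or more.

This code: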

mydf <- data.frame(a = 1:6, 
                   b = rep("reproducible", 6),
                   c = rep("example", 6), 
                   stringsAsFactors = FALSE)
mydf

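produces:

  a            b       c
1 1 reproducible example
2 2 reproducible example
3 3 reproducible example
4 4 reproducible example
5 5 reproducible example
6 6 reproducible example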

Using the code:

mydf["encoded"] <- { if (mydf['a'] == 1) 'one' else if (mydf['a'] == 2) 'two' else 'three or more' }
mydf

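produces:

  a            b       c encoded
1 1 reproducible example     one
2 2 reproducible example     one
3 3 reproducible example     one
4 4 reproducible example     one
5 5 reproducible example     one
6 6 reproducible example     one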

It also generates a warning:

Warning message in if (mydf["a"] == 1) "one" else if (mydf["a"] == 2) "two" else "three or more":
“the condition has length > 1 and only the first element will be used”

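The new column is added to the dataframe, but every value is the same: one

Have I implemented the logic for filling the new column incorrectly?

3 Answers:

Answer 0 (score: 3)

An often overlooked function for this task is cut: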

mydf$encoded <- cut(mydf$a, c(0:2,Inf), c('one','two','three or more'))

Result:

> mydf
  a            b       c       encoded
1 1 reproducible example           one
2 2 reproducible example           two
3 3 reproducible example three or more
4 4 reproducible example three or more
5 5 reproducible example three or more
6 6 reproducible example three or more
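
Two details of this approach are worth spelling out. cut returns a factor rather than a character vector, and its intervals are right-closed by default, so the breaks 0, 1, 2, Inf stand for (0,1], (1,2] and (2,Inf]. Any value of a at or below 0 would therefore become NA. The following is a minimal sketch that rebuilds mydf from the question and names the arguments explicitly. The as.character() conversion is an optional extra step assumed here for illustration, not part of the answer above.

```r
mydf <- data.frame(a = 1:6,
                   b = rep("reproducible", 6),
                   c = rep("example", 6),
                   stringsAsFactors = FALSE)

# Breaks 0, 1, 2, Inf define the right-closed intervals (0,1], (1,2], (2,Inf]
encoded <- cut(mydf$a,
               breaks = c(0, 1, 2, Inf),
               labels = c("one", "two", "three or more"))

class(encoded)   # "factor"

# Optional: store plain character strings instead of a factor
mydf$encoded <- as.character(encoded)
```

Keeping the factor is fine if the column is only used for grouping or plotting. The character version is easier to compare against other strings.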

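Answer 1 (score: 2)

A solution using dplyr::case_when:

The syntax and logic are self-explanatory: when a equals 1, encoded equals "one"; when a equals 2, encoded equals "two"; in all other cases, encoded equals "three or more". mutate simply creates the new column.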

library(dplyr)
mutate(mydf, encoded = case_when(a == 1 ~ "one",
                                 a == 2 ~ "two",
                                 TRUE ~ "three or more"))

  a            b       c       encoded
1 1 reproducible example           one
2 2 reproducible example           two
3 3 reproducible example three or more
4 4 reproducible example three or more
5 5 reproducible example three or more
6 6 reproducible example three or more
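
Note that mutate returns a modified copy and leaves mydf itself unchanged, so the call above only prints the result. A minimal sketch of keeping the new column, assuming dplyr is installed and rebuilding mydf from the question:

```r
library(dplyr)

mydf <- data.frame(a = 1:6,
                   b = rep("reproducible", 6),
                   c = rep("example", 6),
                   stringsAsFactors = FALSE)

# Assign the result back so that mydf actually gains the encoded column
mydf <- mutate(mydf, encoded = case_when(a == 1 ~ "one",
                                         a == 2 ~ "two",
                                         TRUE ~ "three or more"))

names(mydf)
```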

A solution using base::ifelse:

mydf$encoded <- ifelse(mydf$a == 1, 
                       "one", 
                       ifelse(mydf$a == 2, 
                              "two",
                              "three or more"))

If you don't want to write mydf$a several times, you can use with:

mydf$encoded <- with(mydf, ifelse(a == 1, 
                                  "one", 
                                  ifelse(a == 2, 
                                         "two",
                                         "three or more")))
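
The reason ifelse works where the original if did not is that if expects a single TRUE or FALSE, whereas a comparison on a column yields one value per row. In the R version used in the question this produced the warning shown and only the first element was used. From R 4.2.0 onward the same code stops with an error instead. A small sketch of the difference, using a plain vector a as a stand-in for mydf$a (the variable names is_one and first_only are illustrative only):

```r
a <- 1:6

# The comparison is vectorised: it yields one TRUE/FALSE per element
is_one <- a == 1
length(is_one)   # 6

# if () needs a single TRUE/FALSE, so here it is given only the first element
first_only <- if (is_one[1]) "one" else "three or more"

# ifelse() applies the test element by element and returns a full vector
encoded <- ifelse(a == 1, "one",
                  ifelse(a == 2, "two", "three or more"))
```

Here first_only is a single string, which R recycles across all rows when it is assigned to a column. That is why every row in the question ended up as one.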

Answer 2 (score: 1)

sapply can also do the job:

mydf$encoded <- sapply(
    mydf$a, function(a) 
        if (a == 1) 'one' else if (a == 2) 'two' else 'three or more')
mydf
#   a            b       c       encoded
# 1 1 reproducible example           one
# 2 2 reproducible example           two
# 3 3 reproducible example three or more
# 4 4 reproducible example three or more
# 5 5 reproducible example three or more
# 6 6 reproducible example three or more
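
As a possible variation, vapply does the same thing but lets you declare the expected return type, so a function that accidentally returns something other than a single string fails loudly instead of silently changing the column type. This is a sketch of that alternative rather than part of the answer above; mydf is rebuilt from the question.

```r
mydf <- data.frame(a = 1:6,
                   b = rep("reproducible", 6),
                   c = rep("example", 6),
                   stringsAsFactors = FALSE)

# FUN.VALUE = character(1) requires each call to return exactly one string
mydf$encoded <- vapply(
    mydf$a,
    function(a) if (a == 1) "one" else if (a == 2) "two" else "three or more",
    FUN.VALUE = character(1))
```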