Question

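I am trying to add a new column to an existing R data frame, where each value in the new column depends on the value of column a in the same row. If the value is 1, the new column should contain one; if the value is 2, it should contain two; otherwise it should contain three or more.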

This code:

mydf <- data.frame(a = 1:6, 
                   b = rep("reproducible", 6),
                   c = rep("example", 6), 
                   stringsAsFactors = FALSE)
mydf

renders:

Using this code:

mydf["encoded"] <- { if (mydf['a'] == 1) 'one' else if (mydf['a'] == 2) 'two' else 'three or more' }
mydf

renders:

and also generates a warning:

Warning message in if (mydf["a"] == 1) "one" else if (mydf["a"] == 2) "two" else "three or more":
“the condition has length > 1 and only the first element will be used”

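The new column is added to the data frame, but every value is the same: one

Have I implemented the logic for filling in the new column's values incorrectly?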

Answer 1

An often overlooked function for this is cut:

mydf$encoded <- cut(mydf$a, c(0:2,Inf), c('one','two','three or more'))

Result:

> mydf
  a            b       c       encoded
1 1 reproducible example           one
2 2 reproducible example           two
3 3 reproducible example three or more
4 4 reproducible example three or more
5 5 reproducible example three or more
6 6 reproducible example three or more
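One thing worth knowing about this approach: cut returns a factor rather than a character vector, and by default its intervals are closed on the right, so the breaks above mean (0,1], (1,2] and (2,Inf]. Here is a minimal sketch, assuming you want plain strings in the new column (the intermediate variable name encoded is only for illustration):

```r
mydf <- data.frame(a = 1:6, stringsAsFactors = FALSE)

# breaks 0, 1, 2, Inf define the right-closed intervals (0,1], (1,2], (2,Inf]
encoded <- cut(mydf$a,
               breaks = c(0:2, Inf),
               labels = c("one", "two", "three or more"))

# cut() gives back a factor; convert it if a character column is preferred
is_factor <- is.factor(encoded)
mydf$encoded <- as.character(encoded)

# values outside every interval (here, 0) become NA
outside <- cut(0, breaks = c(0:2, Inf),
               labels = c("one", "two", "three or more"))
```

If a can hold non-integer values, note that something like 1.5 would fall into (1,2] and be labelled two, which may or may not be what you want.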

Answer 2

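A solution using dplyr::case_when:

The syntax and logic are self-explanatory: when a equals 1, encoded equals "one"; when a equals 2, encoded equals "two"; in all other cases, encoded equals "three or more". mutate just creates the new column.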

library(dplyr)
mutate(mydf, encoded = case_when(a == 1 ~ "one",
                                 a == 2 ~ "two",
                                 TRUE ~ "three or more"))

  a            b       c       encoded
1 1 reproducible example           one
2 2 reproducible example           two
3 3 reproducible example three or more
4 4 reproducible example three or more
5 5 reproducible example three or more
6 6 reproducible example three or more
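Keep in mind that mutate returns a modified copy and leaves mydf itself untouched, so the call above only prints the result. A small sketch of keeping the new column, assuming dplyr is installed:

```r
library(dplyr)

mydf <- data.frame(a = 1:6,
                   b = rep("reproducible", 6),
                   c = rep("example", 6),
                   stringsAsFactors = FALSE)

# assign the result back, otherwise the new column is lost
mydf <- mutate(mydf, encoded = case_when(a == 1 ~ "one",
                                         a == 2 ~ "two",
                                         TRUE ~ "three or more"))

# conditions are checked in order; the first one that matches wins
has_column <- "encoded" %in% names(mydf)
```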

A solution using base::ifelse:

mydf$encoded <- ifelse(mydf$a == 1, 
                       "one", 
                       ifelse(mydf$a == 2, 
                              "two",
                              "three or more"))

If you don't want to write mydf$a several times, you can use with:

mydf$encoded <- with(mydf, ifelse(a == 1, 
                                  "one", 
                                  ifelse(a == 2, 
                                         "two",
                                         "three or more")))
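The reason ifelse works where the if / else in the question does not is that the comparison yields one logical value per row, and ifelse handles all of them at once, whereas if expects a single TRUE or FALSE. A rough illustration (the variable names cond and res are just for this sketch):

```r
a <- 1:6

# comparing a vector gives a logical vector, one element per row
cond <- a == 1
n_cond <- length(cond)

# ifelse() picks "yes" or "no" element by element
res <- ifelse(cond, "one", "other")

# nesting a second ifelse() in the "no" branch handles the remaining cases
nested <- ifelse(a == 1, "one", ifelse(a == 2, "two", "three or more"))
```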

Answer 3

sapply can also do the job:

mydf$encoded <- sapply(
    mydf$a, function(a) 
        if (a == 1) 'one' else if (a == 2) 'two' else 'three or more')
mydf
#   a            b       c       encoded
# 1 1 reproducible example           one
# 2 2 reproducible example           two
# 3 3 reproducible example three or more
# 4 4 reproducible example three or more
# 5 5 reproducible example three or more
# 6 6 reproducible example three or more
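A possible variation is vapply, which does the same thing but makes you state the expected result type, so the return type does not depend on the input. This is only a sketch, and the helper name encode_one is made up for the example:

```r
mydf <- data.frame(a = 1:6, stringsAsFactors = FALSE)

# hypothetical helper: encodes a single value, so plain if / else is fine here
encode_one <- function(a) {
  if (a == 1) "one" else if (a == 2) "two" else "three or more"
}

# character(1) declares that every call must return exactly one string
mydf$encoded <- vapply(mydf$a, encode_one, character(1))

# with empty input vapply still returns a character vector,
# whereas sapply would return an empty list
empty <- vapply(integer(0), encode_one, character(1))
```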

Add a column with custom values to a dataframe
