R: create a new column that contains column 1 or, if a condition is met, a value from column 2/column 3

Time: 2016-01-28 09:52:08

Tags: r dataframe data-cleaning

           a         b      c    d
1     boiler     maker   <NA> <NA>
2      clerk assistant   <NA> <NA>
3     senior   machine setter <NA>
4   operated      <NA>   <NA> <NA>
5 consultant     Legal   <NA> <NA>

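How can I create a new column that contains the value of column 'a', unless any of the other columns contains Legal or assistant, in which case it takes that value instead?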
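To run the answers below, the sample data can be rebuilt roughly as follows. This is only a sketch: the post does not say whether the columns are factors or character vectors, so stringsAsFactors = FALSE is an assumption, as is treating <NA> as a true missing value.

```r
# Rebuild the sample data from the question.
# Assumptions: columns are character vectors and <NA> is a real NA.
df <- data.frame(
  a = c("boiler", "clerk", "senior", "operated", "consultant"),
  b = c("maker", "assistant", "machine", NA, "Legal"),
  c = c(NA, NA, "setter", NA, NA),
  d = NA_character_,          # recycled to all five rows
  stringsAsFactors = FALSE
)
print(df)
```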

3 answers:

Answer 0 (score: 5)

Here is a base R solution. We use apply and any to test all columns of each row at once.

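# Start from column a (as.character in case it is a factor)
df$col <- as.character(df$a)
# Overwrite the rows in which any column equals the keyword
df$col[apply(df == "Legal",1,any)] <- "Legal"
df$col[apply(df == "assistant",1,any)] <- "assistant"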
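One detail of this approach is worth spelling out. For a row with no match but at least one NA, any() returns NA rather than FALSE. The assignment still works because R skips NA positions in a logical subscript when the replacement has length one. The sketch below uses a small made-up matrix, and the names m, hit, x and hit_clean are illustrative and not part of the answer. Passing na.rm = TRUE is an optional way to get plain TRUE/FALSE values.

```r
# Two rows, two columns; the matrix is filled column-wise:
# b = "maker", "Legal" and c = NA, NA
m <- matrix(c("maker", "Legal", NA, NA), nrow = 2,
            dimnames = list(NULL, c("b", "c")))

# Row 1: any(FALSE, NA) is NA. Row 2: any(TRUE, NA) is TRUE.
hit <- apply(m == "Legal", 1, any)
print(hit)

# With a length-one replacement, NA positions are simply not replaced
x <- c("boiler", "consultant")
x[hit] <- "Legal"
print(x)

# Optional: drop NAs inside any() so the index is strictly logical
hit_clean <- apply(m == "Legal", 1, any, na.rm = TRUE)
print(hit_clean)
```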

Answer 1 (score: 3)

Try this:

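library("dplyr")

# %in% returns FALSE for NA, whereas == returns NA and would make
# ifelse() yield NA for rows that contain missing values
df %>%
    mutate(new=ifelse(b %in% "Legal" | c %in% "Legal" | d %in% "Legal", "Legal",
                      ifelse(b %in% "assistant" | c %in% "assistant" | d %in% "assistant", "assistant",
                             as.character(a))))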
as.character is needed if the values are factors. If they are not, it is unnecessary.
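The reason as.character matters can be seen in a small sketch. When a factor is passed to ifelse() as the fallback, the result picks up the factor's internal integer codes instead of its labels. The vectors a and is_legal below are made up for illustration.

```r
# A made-up factor and a made-up condition
a <- factor(c("consultant", "clerk"))
is_legal <- c(TRUE, FALSE)

# Factor passed as-is: the fallback element is the integer level code
unsafe <- ifelse(is_legal, "Legal", a)

# Converted first: the fallback element keeps its label
safe <- ifelse(is_legal, "Legal", as.character(a))

print(unsafe)
print(safe)
```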

Answer 2 (score: 0)

A base R alternative to @scoa's answer:

# na.rm=TRUE makes rows containing NA yield FALSE instead of NA
# indx: 1 = no match, 2 = "Legal", 3 = "assistant"
indx <- apply(mydf == "Legal",1,any,na.rm=TRUE) + apply(mydf == "assistant",1,any,na.rm=TRUE)*2 + 1L
mydf$col <- c("a","Legal","assistant")[indx]

Or in one go:

mydf$col <- c("a","Legal","assistant")[apply(mydf == "Legal",1,any,na.rm=TRUE) + apply(mydf == "assistant",1,any,na.rm=TRUE)*2 + 1L]

which gives:

> mydf
           a         b      c    d       col
1     boiler     maker   <NA> <NA>         a
2      clerk assistant   <NA> <NA> assistant
3     senior   machine setter <NA>         a
4   operated      <NA>   <NA> <NA>         a
5 consultant     Legal   <NA> <NA>     Legal
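
The col column above holds the fixed label a for non-matching rows. If the actual value of column a is wanted there, as the question asks, one possible adjustment is sketched below. The data frame is rebuilt from the question under the assumption that the columns are character vectors, and labels is an illustrative helper name that does not appear in the answer.

```r
# Sample data from the question (assumed to be character columns)
mydf <- data.frame(
  a = c("boiler", "clerk", "senior", "operated", "consultant"),
  b = c("maker", "assistant", "machine", NA, "Legal"),
  c = c(NA, NA, "setter", NA, NA),
  d = NA_character_,
  stringsAsFactors = FALSE
)

# 1 = no match, 2 = "Legal", 3 = "assistant"
indx <- apply(mydf == "Legal", 1, any, na.rm = TRUE) +
  apply(mydf == "assistant", 1, any, na.rm = TRUE) * 2 + 1L

# NA in the lookup marks "no match"; those rows fall back to column a
labels <- c(NA, "Legal", "assistant")[indx]
mydf$col <- ifelse(is.na(labels), mydf$a, labels)
print(mydf)
```

A row that matched both keywords would get index 4, which is outside the lookup vector, so it would also fall back to column a in this sketch.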