Create a value in one column based on a string match in any of several columns

Time: 2019-05-22 14:30:58

Tags: r

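I have a data frame whose cells, across many rows and columns, can hold the values "c1", "c2", "c3", or "no". Any given row contains either only "no" or "no" plus exactly one of the other values; that is, no row contains both c1 and c2.

What I want to do is create a new column, any, that holds the row's non-"no" value when the row has one, and stays "no" otherwise. I think this should be simple, but I am not getting it.

Here is some sample data, saved as "test1.csv":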

Group1,Group2,Group3,Group4,Group5,Group6
c1,no,no,c1,no,no
no,no,c1,no,no,no
no,no,no,no,c1,no
no,no,no,no,no,no
c1,no,no,no,no,c1
no,c1,no,no,no,no
c2,no,no,no,no,no
no,c2,no,c2,no,no
no,no,no,no,no,no
no,no,no,no,no,c2
c3,no,c3,no,c3,no
no,no,no,no,no,no
no,no,c3,c3,no,no

Here is what I tried:

df <- read.csv("test1.csv")
df$any <- "no"
df$any[df == "c1"] <- "c1"
df$any[df == "c2"] <- "c2"
df$any[df == "c3"] <- "c3"

which returns the following error:

Error in `$<-.data.frame`(`*tmp*`, any, value = c("c1", "no", "no", "no",  : 
  replacement has 91 rows, data has 13

A successful output should look like this:

   Group1 Group2 Group3 Group4 Group5 Group6 any
1      c1     no     no     c1     no     no  c1
2      no     no     c1     no     no     no  c1
3      no     no     no     no     c1     no  c1
4      no     no     no     no     no     no  no
5      c1     no     no     no     no     c1  c1
6      no     c1     no     no     no     no  c1
7      c2     no     no     no     no     no  c2
8      no     c2     no     c2     no     no  c2
9      no     no     no     no     no     no  no
10     no     no     no     no     no     c2  c2
11     c3     no     c3     no     c3     no  c3
12     no     no     no     no     no     no  no
13     no     no     c3     c3     no     no  c3

3 answers:

Answer 0: (score: 3)

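Using max.col, we can find, for each row, a column whose value is not "no" and extract that value. By default max.col breaks ties at random, but since every non-"no" value within a row is the same, ties do not matter here. You can also pass ties.method = "first" to always take the first non-"no" value.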

df$any <- df[cbind(1:nrow(df), max.col(df != "no"))]

df
#   Group1 Group2 Group3 Group4 Group5 Group6 any
#1      c1     no     no     c1     no     no  c1
#2      no     no     c1     no     no     no  c1
#3      no     no     no     no     c1     no  c1
#4      no     no     no     no     no     no  no
#5      c1     no     no     no     no     c1  c1
#6      no     c1     no     no     no     no  c1
#7      c2     no     no     no     no     no  c2
#8      no     c2     no     c2     no     no  c2
#9      no     no     no     no     no     no  no
#10     no     no     no     no     no     c2  c2
#11     c3     no     c3     no     c3     no  c3
#12     no     no     no     no     no     no  no
#13     no     no     c3     c3     no     no  c3
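
To see what each piece of the one-liner contributes, here is a minimal sketch on a hypothetical three-row data frame (the sample values are made up for illustration and are not the question's data). It assumes character columns and passes ties.method = "first" explicitly so the result is deterministic.

```r
# Hypothetical three-row sample in the same shape as the question's data
df <- data.frame(
  Group1 = c("c1", "no", "no"),
  Group2 = c("no", "no", "c2"),
  Group3 = c("c1", "no", "c2"),
  stringsAsFactors = FALSE
)

# Logical matrix: TRUE wherever a cell holds something other than "no"
is_hit <- df != "no"

# Column index of the first TRUE in each row. A row with no TRUE at all
# is a tie across every column, which "first" resolves to column 1.
first_col <- max.col(is_hit, ties.method = "first")

# A two-column (row, column) index matrix picks exactly one cell per row
any_value <- df[cbind(seq_len(nrow(df)), first_col)]

print(first_col)
print(any_value)
```

For the all-"no" row, the chosen column is arbitrary but harmless, because whichever cell is picked contains "no".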

Answer 1: (score: 1)

We can use a base R approach:

df1$any <- apply(df1, 1, function(x) x[x != 'no'][1])
df1$any[is.na(df1$any)] <- "no"
df1$any
#[1] "c1" "c1" "c1" "no" "c1" "c1" "c2" "c2" "c2" "no" "c3" "no" "c3"

Another base R option, using pmin:

df1$any <- do.call(pmin, df1)
df1$any
#[1] "c1" "c1" "c1" "no" "c1" "c1" "c2" "c2" "c2" "no" "c3" "no" "c3"

Or with dplyr:

library(dplyr)
df1 %>% 
   mutate(any = pmin(!!! rlang::syms(names(.))))
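
The pmin variants above depend on string ordering: "c1", "c2", and "c3" all sort before "no", so the row-wise minimum is the non-"no" value whenever one exists. The sketch below illustrates this on a hypothetical two-column sample. It assumes character columns rather than factors, and the label "z9" is an invented counterexample to show where the trick would stop working.

```r
# Hypothetical two-column sample; character columns, not factors
df1 <- data.frame(
  Group1 = c("c1", "no", "no"),
  Group2 = c("no", "no", "c3"),
  stringsAsFactors = FALSE
)

# do.call passes each column as its own argument: pmin(Group1, Group2)
any_value <- do.call(pmin, df1)
print(any_value)

# Assumed counterexample: a label that sorts after "no" loses to "no"
later_label <- pmin("z9", "no")
print(later_label)
```

If the real labels could sort after "no", one of the other approaches on this page would be the safer choice.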

Answer 2: (score: 1)

It might make sense to store "no" as a missing value, in which case the extra column is simply all the other columns coalesced:

library(dplyr)
library(purrr)  # reduce() comes from purrr, not dplyr

df %>% 
  mutate_all(na_if, 'no') %>% 
  mutate(any = reduce(., coalesce))

#    Group1 Group2 Group3 Group4 Group5 Group6  any
# 1      c1   <NA>   <NA>     c1   <NA>   <NA>   c1
# 2    <NA>   <NA>     c1   <NA>   <NA>   <NA>   c1
# 3    <NA>   <NA>   <NA>   <NA>     c1   <NA>   c1
# 4    <NA>   <NA>   <NA>   <NA>   <NA>   <NA> <NA>
# 5      c1   <NA>   <NA>   <NA>   <NA>     c1   c1
# 6    <NA>     c1   <NA>   <NA>   <NA>   <NA>   c1
# 7      c2   <NA>   <NA>   <NA>   <NA>   <NA>   c2
# 8    <NA>     c2   <NA>     c2   <NA>   <NA>   c2
# 9    <NA>   <NA>   <NA>   <NA>   <NA>   <NA> <NA>
# 10   <NA>   <NA>   <NA>   <NA>   <NA>     c2   c2
# 11     c3   <NA>     c3   <NA>     c3   <NA>   c3
# 12   <NA>   <NA>   <NA>   <NA>   <NA>   <NA> <NA>
# 13   <NA>   <NA>     c3     c3   <NA>   <NA>   c3
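
Note that the output above leaves NA, rather than "no", in the any column for rows with no match. Here is a minimal base R sketch of the same coalescing idea that also puts "no" back at the end. The sample data is hypothetical, and coalesce2 is an illustrative helper defined here, not a function from dplyr.

```r
# Hypothetical sample in the same shape as the question's data
df <- data.frame(
  Group1 = c("c1", "no", "no"),
  Group2 = c("no", "no", "c2"),
  stringsAsFactors = FALSE
)

# Work on a copy in which "no" is treated as missing
na_df <- df
na_df[na_df == "no"] <- NA

# coalesce2 is a hypothetical helper: keep a where present, else use b
coalesce2 <- function(a, b) ifelse(is.na(a), b, a)

# Fold the helper across the columns, left to right
any_value <- Reduce(coalesce2, na_df)

# Rows that were all "no" are still NA here; restore the original label
any_value[is.na(any_value)] <- "no"

df$any <- any_value
print(df)
```

Working on a copy keeps the original "no" labels in the source columns, so only the new column is affected.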