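I have a data frame with several rows and columns whose possible values are "c1", "c2", "c3", or "no". Any given row contains either only "no", or "no" together with exactly one of the other values; that is, no row contains both c1 and c2.
What I want to do is create a new column, any, that holds the row's non-"no" value when the row has one and "no" otherwise. I think this should be simple, but I can't figure it out.
Here is some example data, saved as "test1.csv":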
Group1,Group2,Group3,Group4,Group5,Group6
c1,no,no,c1,no,no
no,no,c1,no,no,no
no,no,no,no,c1,no
no,no,no,no,no,no
c1,no,no,no,no,c1
no,c1,no,no,no,no
c2,no,no,no,no,no
no,c2,no,c2,no,no
no,no,no,no,no,no
no,no,no,no,no,c2
c3,no,c3,no,c3,no
no,no,no,no,no,no
no,no,c3,c3,no,no
Here is what I tried:
df <- read.csv("test1.csv")
df$any <- "no"
df$any[df == "c1"] <- "c1"
df$any[df == "c2"] <- "c2"
df$any[df == "c3"] <- "c3"
which returns the following error:
Error in `$<-.data.frame`(`*tmp*`, any, value = c("c1", "no", "no", "no", :
replacement has 91 rows, data has 13
The successful output should look like this:
Group1 Group2 Group3 Group4 Group5 Group6 any
1 c1 no no c1 no no c1
2 no no c1 no no no c1
3 no no no no c1 no c1
4 no no no no no no no
5 c1 no no no no c1 c1
6 no c1 no no no no c1
7 c2 no no no no no c2
8 no c2 no c2 no no c2
9 no no no no no no no
10 no no no no no c2 c2
11 c3 no c3 no c3 no c3
12 no no no no no no no
13 no no c3 c3 no no c3
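For reference, the error seems to come from the shape of the index: df == "c1" is a logical matrix with one entry per cell (13 rows by 7 columns once any has been added, so 91 values), while df$any has only 13 elements. The sketch below rebuilds the sample data inline and collapses that matrix to one logical per row with rowSums before indexing. The name group_cols is a hypothetical helper introduced for illustration, not something from the original post.

```r
# Rebuild the example data inline instead of reading test1.csv
df <- read.csv(text = "Group1,Group2,Group3,Group4,Group5,Group6
c1,no,no,c1,no,no
no,no,c1,no,no,no
no,no,no,no,c1,no
no,no,no,no,no,no
c1,no,no,no,no,c1
no,c1,no,no,no,no
c2,no,no,no,no,no
no,c2,no,c2,no,no
no,no,no,no,no,no
no,no,no,no,no,c2
c3,no,c3,no,c3,no
no,no,no,no,no,no
no,no,c3,c3,no,no", stringsAsFactors = FALSE)

df$any <- "no"
index_dim <- dim(df == "c1")   # the comparison yields a matrix, not a per-row vector

# group_cols (hypothetical name): every column except the new one
group_cols <- setdiff(names(df), "any")

# For each value, flag the rows that contain it at least once,
# then assign with a per-row logical vector of length nrow(df)
for (v in c("c1", "c2", "c3")) {
  row_has_v <- rowSums(df[group_cols] == v) > 0
  df$any[row_has_v] <- v
}
df$any
```

The loop keeps the structure of the original attempt; the only change is that the index now has one element per row.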
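Answer 0 (score: 3)
Using max.col, we can extract a value in each row that is not "no". Since all non-"no" values within a row are the same, ties don't matter here. By default max.col breaks ties at random; you can also specify ties.method = "first" to get the first non-"no" value.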
df$any <- df[cbind(1:nrow(df), max.col(df != "no"))]
df
# Group1 Group2 Group3 Group4 Group5 Group6 any
#1 c1 no no c1 no no c1
#2 no no c1 no no no c1
#3 no no no no c1 no c1
#4 no no no no no no no
#5 c1 no no no no c1 c1
#6 no c1 no no no no c1
#7 c2 no no no no no c2
#8 no c2 no c2 no no c2
#9 no no no no no no no
#10 no no no no no c2 c2
#11 c3 no c3 no c3 no c3
#12 no no no no no no no
#13 no no c3 c3 no no c3
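As a small variation, here is a sketch of the same idea with ties.method = "first", which makes the chosen column deterministic. The intermediate names is_hit and first_hit are hypothetical and only there for readability; the data is rebuilt inline so the snippet runs on its own.

```r
df <- read.csv(text = "Group1,Group2,Group3,Group4,Group5,Group6
c1,no,no,c1,no,no
no,no,c1,no,no,no
no,no,no,no,c1,no
no,no,no,no,no,no
c1,no,no,no,no,c1
no,c1,no,no,no,no
c2,no,no,no,no,no
no,c2,no,c2,no,no
no,no,no,no,no,no
no,no,no,no,no,c2
c3,no,c3,no,c3,no
no,no,no,no,no,no
no,no,c3,c3,no,no", stringsAsFactors = FALSE)

# TRUE wherever a cell holds something other than "no"
is_hit <- df != "no"

# Column index of the first TRUE in each row; rows with no TRUE
# are an all-way tie, so "first" returns column 1 (which holds "no")
first_hit <- max.col(is_hit, ties.method = "first")

# Pick one cell per row with a two-column (row, column) index matrix
df$any <- df[cbind(seq_len(nrow(df)), first_hit)]
df$any
```

For rows that are entirely "no", the selected cell is itself "no", so no separate fallback step is needed.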
Answer 1 (score: 1)
We can use a base R approach:
df1$any <- apply(df1, 1, function(x) x[x != 'no'][1])
df1$any[is.na(df1$any)] <- "no"
df1$any
#[1] "c1" "c1" "c1" "no" "c1" "c1" "c2" "c2" "no" "c2" "c3" "no" "c3"
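Or another base R option, using pmin. This works because "c1", "c2", and "c3" all sort before "no" as strings, so the row-wise minimum is the non-"no" value whenever one exists: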
df1$any <- do.call(pmin, df1)
df1$any
#[1] "c1" "c1" "c1" "no" "c1" "c1" "c2" "c2" "no" "c2" "c3" "no" "c3"
Or with dplyr:
library(dplyr)
df1 %>%
  mutate(any = pmin(!!! rlang::syms(names(.))))
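One caveat worth illustrating: the pmin variants above depend on string ordering rather than on "no" being special. The sketch below uses a made-up label "z1" (not part of the original data) to show what would happen if a value sorted after "no".

```r
# Labels from the question sort before "no", so pmin keeps them
kept <- pmin("c1", "no")

# "z1" is a hypothetical label that sorts after "no";
# pmin would return "no" and the label would be lost
lost <- pmin("z1", "no")

c(kept = kept, lost = lost)
```

If the real labels can sort after "no", one of the approaches that tests for "no" explicitly would be the safer choice.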
Answer 2 (score: 1)
It may make sense to store "no" as a missing value instead, in which case the extra column is simply all the other columns coalesced:
library(dplyr)
library(purrr)  # provides reduce()
df %>%
  mutate_all(na_if, 'no') %>%
  mutate(any = reduce(., coalesce))
# Group1 Group2 Group3 Group4 Group5 Group6 any
# 1 c1 <NA> <NA> c1 <NA> <NA> c1
# 2 <NA> <NA> c1 <NA> <NA> <NA> c1
# 3 <NA> <NA> <NA> <NA> c1 <NA> c1
# 4 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
# 5 c1 <NA> <NA> <NA> <NA> c1 c1
# 6 <NA> c1 <NA> <NA> <NA> <NA> c1
# 7 c2 <NA> <NA> <NA> <NA> <NA> c2
# 8 <NA> c2 <NA> c2 <NA> <NA> c2
# 9 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
# 10 <NA> <NA> <NA> <NA> <NA> c2 c2
# 11 c3 <NA> c3 <NA> c3 <NA> c3
# 12 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
# 13 <NA> <NA> c3 c3 <NA> <NA> c3
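For readers without dplyr at hand, the same "treat no as missing, then coalesce" idea can be sketched in base R. This swaps in base functions for na_if and coalesce rather than reproducing the answer's code; first_non_na is a hypothetical helper name. The last step, which turns the remaining NA values back into "no", is optional and only matches the output format asked for in the question.

```r
df <- read.csv(text = "Group1,Group2,Group3,Group4,Group5,Group6
c1,no,no,c1,no,no
no,no,c1,no,no,no
no,no,no,no,c1,no
no,no,no,no,no,no
c1,no,no,no,no,c1
no,c1,no,no,no,no
c2,no,no,no,no,no
no,c2,no,c2,no,no
no,no,no,no,no,no
no,no,no,no,no,c2
c3,no,c3,no,c3,no
no,no,no,no,no,no
no,no,c3,c3,no,no", stringsAsFactors = FALSE)

# Base R stand-in for na_if(): mark every "no" cell as missing
df_na <- df
df_na[df_na == "no"] <- NA

# first_non_na (hypothetical helper): keep a where present, else fall back to b
first_non_na <- function(a, b) ifelse(is.na(a), b, a)

# Fold the helper across the columns, left to right, like coalesce()
df_na$any <- Reduce(first_non_na, df_na)
any_with_na <- df_na$any

# Optional: restore "no" in the summary column only
df_na$any[is.na(df_na$any)] <- "no"
df_na$any
```

Keeping NA in the Group columns while restoring "no" only in any is a design choice; either representation can be used consistently throughout.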