I have a data frame:
col_1 <- c("A", "A", "B", "B", "C", "C")
col_2 <- c("A", "B", "C", "D", "E", "F")
col_3 <- c("A", "B", "C", "C", "B", "A")
df <- data.frame(col_1, col_2, col_3)
I want to mutate a new column of TRUE/FALSE values indicating whether each row contains the same entry two or more times.
e.g.:
t_f <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)
Even better would be a column containing the duplicated value itself, e.g.:
name <- c("A", "B", "C", NA, NA, NA)
Answer 0 (score: 2)
For the first requirement:
df$t_f <- apply(df, 1, function(x) any(duplicated(x)))
And for your second:
df$name <- apply(df, 1, function(x) ifelse(any(duplicated(x)), x[which(duplicated(x))], NA))
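Note that if both lines above are run in sequence, the second `apply()` also scans the freshly added `t_f` column. That does not change the result for this data, but a slightly more defensive variant might fix the columns up front. The following is only a sketch: `value_cols` and `first_dup()` are hypothetical names introduced here, not part of base R or of the answer above.

```r
# Rebuild the example data from the question
df <- data.frame(
  col_1 = c("A", "A", "B", "B", "C", "C"),
  col_2 = c("A", "B", "C", "D", "E", "F"),
  col_3 = c("A", "B", "C", "C", "B", "A"),
  stringsAsFactors = FALSE
)

# Columns to compare (assumed name); helper columns added later are ignored
value_cols <- c("col_1", "col_2", "col_3")

# Hypothetical helper: first repeated value in x, or NA if nothing repeats
first_dup <- function(x) {
  dup <- duplicated(x)
  if (any(dup)) x[dup][1] else NA_character_
}

m <- as.matrix(df[value_cols])

# anyDuplicated() returns 0 when a row has no repeats
df$t_f  <- apply(m, 1, anyDuplicated) > 0
df$name <- unname(apply(m, 1, first_dup))
df
```

Using a plain `if`/`else` instead of `ifelse()` makes it explicit that a single value is returned per row; `ifelse()` with a length-one test silently keeps only the first element of `x[which(duplicated(x))]`.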
Answer 1 (score: 0)
You can try:
ifelse(colSums(table(row(df), as.matrix(df)) >= 2) == 1, colnames(table(row(df), as.matrix(df))), NA)
A B C D E F
"A" "B" "C" NA NA NA
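One caveat: the named vector above is keyed by distinct value (A to F), not by row. It lines up with the six rows here only because this data happens to contain exactly six distinct values. A per-row variant of the same `table()` idea could look like the sketch below; the variable names `tab` and `name` are illustrative assumptions.

```r
# Rebuild the example data from the question
df <- data.frame(
  col_1 = c("A", "A", "B", "B", "C", "C"),
  col_2 = c("A", "B", "C", "D", "E", "F"),
  col_3 = c("A", "B", "C", "C", "B", "A"),
  stringsAsFactors = FALSE
)

# Cross-tabulate row index against cell value:
# one table row per data-frame row, one table column per distinct value
tab <- table(row(df), as.matrix(df))

# For each data-frame row, pick the value counted at least twice, if any
name <- unname(apply(tab, 1, function(counts) {
  if (any(counts >= 2)) names(counts)[which.max(counts)] else NA_character_
}))
name
```

Because the lookup walks the rows of `tab` rather than its columns, the result always has one element per row of `df`, regardless of how many distinct values occur.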
In the tidyverse you can do:
library(tidyverse)
df %>%
  mutate_if(is.factor, as.character) %>%
  rowwise() %>%
  mutate(dup = anyDuplicated(c(col_1, col_2, col_3)) != 0) %>%
  mutate(which.dup = c(col_1, col_2, col_3)[which(duplicated(c(col_1, col_2, col_3)))[1]])
Source: local data frame [6 x 5]
Groups: <by row>
# A tibble: 6 x 5
col_1 col_2 col_3 dup which.dup
<chr> <chr> <chr> <lgl> <chr>
1 A A A TRUE A
2 A B B TRUE B
3 B C C TRUE C
4 B D C FALSE NA
5 C E B FALSE NA
6 C F A FALSE NA
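With a more recent dplyr (1.0 or later is assumed here), the repeated `c(col_1, col_2, col_3)` can be replaced by `c_across()`. This is a sketch under that version assumption rather than a drop-in for the pipeline above; `res` and `vals` are illustrative names, and the data is built with `stringsAsFactors = FALSE` so no factor conversion step is needed.

```r
library(dplyr)

# Rebuild the example data from the question as character columns
df <- data.frame(
  col_1 = c("A", "A", "B", "B", "C", "C"),
  col_2 = c("A", "B", "C", "D", "E", "F"),
  col_3 = c("A", "B", "C", "C", "B", "A"),
  stringsAsFactors = FALSE
)

res <- df %>%
  rowwise() %>%
  mutate(
    # TRUE when any value occurs more than once in the row
    dup = anyDuplicated(c_across(starts_with("col_"))) > 0,
    # First repeated value; indexing an empty vector with [1] yields NA
    which.dup = {
      vals <- c_across(starts_with("col_"))
      vals[duplicated(vals)][1]
    }
  ) %>%
  ungroup()
res
```

The trailing `ungroup()` drops the row-wise grouping so that later verbs operate on the whole table again.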
Answer 2 (score: 0)
To meet your second requirement:
col_1 <- c("A", "A", "B", "B", "C", "C")
col_2 <- c("A", "B", "C", "D", "E", "F")
col_3 <- c("A", "B", "C", "C", "B", "A")
df <- data.frame(col_1, col_2, col_3)
df$name <- apply(df, 1, function(row) {
  counts <- table(row)  # tally each value once per row
  if (max(counts) >= 2) names(counts)[which.max(counts)] else NA
})
df
#> col_1 col_2 col_3 name
#> 1 A A A A
#> 2 A B B B
#> 3 B C C C
#> 4 B D C <NA>
#> 5 C E B <NA>
#> 6 C F A <NA>