Check an entire data frame for duplicates

Time: 2018-11-07 17:37:56

Tags: r dataframe

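I have the code below to find rows that hold the same value in every column of a data frame. However, it should not run if the data frame contains only one column of data. Using it on a single column wipes out the data frame, because each row then has exactly one unique value and every row is filtered out. I have been trying to include an if statement that checks the number of columns first, but I get the error below.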

#Dataframe with only 1 column of data     
final_table <- as.data.frame(c("a","b","c","d","e"))

#Remove rows where all values are the same if dataframe ncol >1.
final_table <- final_table %>% 
  if(ncol(final_table) > 1)  
     {filter(apply(., 1, function(x) length(unique(x)) > 1))}

#Error in if (.) ncol(final_table) > 1 else { : argument is not interpretable as logical

#Dataframe with multiple columns of data
df1 <- data.frame("x1" = c("a","b","c","d","e"),
                  "x2" = c("c","b","c","x","e"),
                  "x3" = c("a","b","t","s","e"))

df1 %>%
   filter(apply(., 1, function(x) length(unique(x)) > 1 ))

#df1  before when run on multiple columns

  x1 x2 x3
1  a  c  a
2  b  b  b
3  c  c  t
4  d  x  s
5  e  e  e

#after when run on multiple columns (correct results)
  x1 x2 x3
1  a  c  a
2  c  c  t
3  d  x  s

#before if run on df with 1 column
1 a
2 b
3 c
4 d
5 e
#after if run on df with 1 column (results not correct)
#I need to insert a conditional statement that checks if ncol > 1
#If ncol == 1 then I don't want to run the function


2 answers:

Answer 0: (score: 1)

Here is one way to clean up the logic:

rm_dup_col <- function(df) {
  if(ncol(df) > 1) {
    return(
      df[apply(df, 1, function(x) length(unique(x)) > 1), ]
    )
  }
  return(df)
}

df1 %>%
  rm_dup_col()

  x1 x2 x3
1  a  c  a
3  c  c  t
4  d  x  s

final_table %>%
  rm_dup_col()

  A
1 a
2 b
3 c
4 d
5 e
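
A possible variant of the function above, sketched in base R only. Instead of calling apply() row by row, it compares every other column with the first one. The name keep_varied_rows is hypothetical, and the sketch assumes the data contains no NA values.

```r
# Hypothetical helper: drop rows whose values are identical in every column.
# Assumes there are no NA values in the data frame.
keep_varied_rows <- function(df) {
  # With fewer than two columns there is nothing to compare, so return as-is.
  if (ncol(df) < 2) {
    return(df)
  }
  first <- as.character(df[[1]])
  # One logical vector per remaining column: TRUE where it matches column 1.
  matches <- lapply(df[-1], function(col) as.character(col) == first)
  # A row is "all the same" only if every remaining column matches column 1.
  all_same <- Reduce(`&`, matches)
  df[!all_same, , drop = FALSE]
}

df1 <- data.frame(x1 = c("a", "b", "c", "d", "e"),
                  x2 = c("c", "b", "c", "x", "e"),
                  x3 = c("a", "b", "t", "s", "e"))

keep_varied_rows(df1)
```

The early return covers the single-column case from the question, so the same call is safe on final_table.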

Answer 1: (score: 0)

Try this:

if(ncol(final_table) > 1) {final_table <- final_table %>% filter(apply(., 1, function(x) length(unique(x)) > 1))}
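
If you would rather keep everything in one pipe, the following is a minimal sketch. The error message in the question ("if (.) ncol(final_table) > 1 else") suggests that %>% passed the data frame to `if` as its condition. Wrapping the `if` in braces should avoid that, since inside a braced block the piped value is only used where `.` appears. The one-column final_table here uses a column name A that is assumed for illustration.

```r
library(dplyr)  # provides %>% and filter()

# One-column data frame; the column name A is an assumption for this sketch.
final_table <- data.frame(A = c("a", "b", "c", "d", "e"))

result <- final_table %>%
  {
    if (ncol(.) > 1) {
      # Keep only rows that contain more than one distinct value.
      filter(., apply(., 1, function(x) length(unique(x)) > 1))
    } else {
      # Single column: return the data unchanged.
      .
    }
  }

result
```

The else branch matters: without it, the braced block would return NULL for a one-column data frame.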