Question

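I have a tibble with more than 1,000 columns and hundreds of thousands of rows. I want to drop duplicated rows while keeping the unique ID value for each row. Here is a simplified version of what I tried, using mtcars.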

library(tidyverse)

mtcars %>% 
  as_tibble() %>% 
  rownames_to_column() %>% 
  distinct(mpg:carb, .keep_all = TRUE)

#Error in mutate_impl(.data, dots) : 
#  Column `mpg:carb` must be length 32 (the number of rows) or one, not 18
#In addition: Warning messages:
#1: In mpg:carb : numerical expression has 32 elements: only the first used
#2: In mpg:carb : numerical expression has 32 elements: only the first used

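Any ideas on how to remove the non-unique rows while keeping the ID variable? In the mtcars example, the ID variable is the row names. In my real data there are too many columns to type out individually.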

Answer 1

df_filtered <- df[!duplicated(df[, -1]), ]

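(This assumes the ID column is the first one.) It gives you a subset of the data frame (df) containing only those rows where the whole row, excluding the first column, is not a duplicate of an earlier row.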
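
To make the behaviour concrete, here is a minimal sketch on a toy data frame. The column names (id, x, y) and the values are made up for illustration and are not from the question. It uses base R only.

```r
# Toy data: the ID column comes first, as the answer assumes.
# Rows "a" and "b" are identical once the ID is ignored.
df <- data.frame(
  id = c("a", "b", "c", "d"),
  x  = c(1, 1, 2, 1),
  y  = c("u", "u", "v", "w"),
  stringsAsFactors = FALSE
)

# df[, -1] drops the ID column, so duplicated() compares only the data columns.
# duplicated() marks a row TRUE when it repeats an earlier row.
is_dup <- duplicated(df[, -1])

# Keep the first occurrence of each distinct row, with its ID intact.
df_filtered <- df[!is_dup, ]
print(df_filtered)
```

Row "b" is dropped because it repeats row "a"; rows "a", "c" and "d" are kept together with their IDs. If the ID is not in the first column, the same idea should work by excluding it by name instead, for example df[!duplicated(df[, setdiff(names(df), "id")]), ].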
