Question

我有一个包含500k行和大约130列的数据帧。我想过滤掉除一个列之外的所有列的重复行（第128列）。我试过了：

df <- unique(df[,-128])

df <- df[!duplicated(df[, -128])]

df <- distinct(df, -column128)

我一遍又一遍地得到同样的错误：

Error in paste(...............,  : formal argument "sep" matched by multiple actual arguments

我也尝试输入每一列，但得到了同样的错误。如果我对前9列尝试上述操作，则不会出现错误。但是，如果我为10列尝试相同，我会收到错误。删除重复行的列数是否有限制？或者有人有解决方案吗？

df如下所示（第128列=标签）：

    data.frame':    571262 obs. of  139 variables:
 $ x                      : num  1 1 1 1 0 0 0 7 7 7 ...
 $ jan                    : num  0 0 0 0 0 0 0 0 0 0 ...
 $ feb                    : num  0 0 0 0 0 0 0 0 0 0 ...
 $ mrt                    : num  0 0 0 0 0 0 0 0 0 0 ...
 $ apr                    : num  0 0 0 0 0 0 0 0 0 0 ...
 $ mei                    : num  0 0 0 0 0 0 0 0 0 0 ...
 $ jun                    : num  0 0 0 0 0 0 0 0 0 0 ...
 $ jul                    : num  0 0 0 0 0 0 0 0 0 0 ...
 $ aug                    : num  1 1 0 0 0 0 0 0 0 0 ...
 $ sep                    : num  0 0 1 1 0 0 0 0 0 0 ...
 $ okt                    : num  0 0 0 0 1 1 1 0 0 0 ...
 $ nov                    : num  0 0 0 0 0 0 0 1 1 1 ...
 $ dec                    : num  0 0 0 0 0 0 0 0 0 0 ...
 $ - 1                    : num  0 0 1 1 1 ...
 $ - 2                    : num  0 0 0 0 1 ...
 $ - 3                    : num  0 0 0 0 0 ...
 $ - 4                    : num  0 0 0 0 0 0 0 0 0 0 ...
......
 $ - 114                  : num  0 0 0 0 0 0 0 0 0 0 ...
 $ label                  : int  8 12 8 12 8 10 12 8 10 12 ...
 $ 2008                   : num  0 0 0 0 0 0 0 0 0 0 ...
 $ 2009                   : num  0 0 0 0 0 0 0 0 0 0 ...
 $ 2010                   : num  0 0 0 0 0 0 0 0 0 0 ...
 $ 2011                   : num  0 0 0 0 0 0 0 0 0 0 ...
 $ 2012                   : num  0 0 0 0 0 0 0 0 0 0 ...
 $ 2013                   : num  0 0 0 0 0 0 0 0 0 0 ...
 $ 2014                   : num  0 0 0 0 0 0 0 0 0 0 ...
 $ 2015                   : num  0 0 0 0 0 0 0 0 0 0 ...
 $ 2016                   : num  1 1 1 1 1 1 1 1 1 1 ...
 $ 2017                   : num  0 0 0 0 0 0 0 0 0 0 ...
 $ 2018                   : num  0 0 0 0 0 0 0 0 0 0 ...

Answer 1

看起来像是您的一个月的专栏＆＃39; sep＆＃39;正在与参数paste(..., sep) 发生碰撞。错误告诉你＆＃39;正式论证＆＃34; sep＆＃34;由多个实际参数匹配＆＃39; 。

不太可能有2个以上的列叫做＆＃39; sep＆＃39; ：检查which(names(df)=='sep')

解决方法是重命名您的列＆＃39; sep＆＃39;其他东西，例如＆＃39; SPT＆＃39;

Answer 2

您可以使用tidyverse功能尝试filter_at解决方案。

library(tidyverse)
set.seed(14)
df <- data.frame(a=sample(1:4, 5, T),b=sample(1:4, 5, T), d=1:5) 
df
  a b d
1 2 3 1
2 3 4 2
3 4 2 3
4 3 2 4
5 4 2 5
df %>% filter_at(vars(-3), all_vars(!duplicated(.)))
  a b d
1 2 3 1
2 3 4 2
3 4 2 3

R - 过滤大数据框中的重复行

2 个答案: