How to get the duplicate rows from a table in R

Time: 2015-11-30 10:20:57

Tags: r

Name Address Account    a   b      Amount   Phone
John CA     4879759  qwqe   rerter  203     807789747
Nil  FD     1234455  iuyui  jhgjhg  4321    98797897
Was  FR     8979696  yikjh  kkjhk   45989   9899999
Nil  FD     1234455  iuyui  jhgjhg  4321    98797897
John CA     4879759  qwqe   rerter  203     807789747
Saw  PO     9873279  kjljl  bjhjh   765     3543656
Nil  FD     1234455  iuyui  jhgjhg  4321    98797897
Aws  IL     707009   dfdsf  sasd    2344    242545
John CA     4879759  qwqe   rerter  203     807789747

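I want to extract the duplicate rows from this table using R code. The table is named "loan". I have 17 billion line items. The key columns are "Name, Address, Account, Amount, Phone". Guys, I'm hoping for some good solutions.

One more thing: after separating them, I want to save that table of duplicates in .csv format. I'm new to R, so please help me.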

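2 Answers:

Answer 0: (score: 1)

We can use duplicated to get all the duplicate rows based on the key columns ('nm1'). Combining a forward scan with a fromLast = TRUE scan keeps the first occurrence of each duplicated key as well as the later ones.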

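# Key columns that define a duplicate
nm1 <- c("Name", "Address", "Account", "Amount", "Phone")
# Flag a row if its key appears earlier OR later in the table
df1[duplicated(df1[nm1]) | duplicated(df1[nm1], fromLast = TRUE), ]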
# Name Address Account     a      b Amount     Phone
#1 John      CA 4879759  qwqe rerter    203 807789747
#2  Nil      FD 1234455 iuyui jhgjhg   4321  98797897
#4  Nil      FD 1234455 iuyui jhgjhg   4321  98797897
#5 John      CA 4879759  qwqe rerter    203 807789747
#7  Nil      FD 1234455 iuyui jhgjhg   4321  98797897
#9 John      CA 4879759  qwqe rerter    203 807789747
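The answer does not show how df1 is built, so here is a minimal reproducible sketch under the assumption that df1 holds exactly the nine rows posted in the question. The intermediate names fwd, bwd, and all_dups are illustrative and not part of the original answer.

```r
# Rebuild the sample table from the question (values copied from the post).
df1 <- data.frame(
  Name    = c("John", "Nil", "Was", "Nil", "John", "Saw", "Nil", "Aws", "John"),
  Address = c("CA", "FD", "FR", "FD", "CA", "PO", "FD", "IL", "CA"),
  Account = c(4879759, 1234455, 8979696, 1234455, 4879759,
              9873279, 1234455, 707009, 4879759),
  a       = c("qwqe", "iuyui", "yikjh", "iuyui", "qwqe",
              "kjljl", "iuyui", "dfdsf", "qwqe"),
  b       = c("rerter", "jhgjhg", "kkjhk", "jhgjhg", "rerter",
              "bjhjh", "jhgjhg", "sasd", "rerter"),
  Amount  = c(203, 4321, 45989, 4321, 203, 765, 4321, 2344, 203),
  Phone   = c(807789747, 98797897, 9899999, 98797897, 807789747,
              3543656, 98797897, 242545, 807789747),
  stringsAsFactors = FALSE
)

nm1 <- c("Name", "Address", "Account", "Amount", "Phone")

# TRUE for rows whose key already appeared earlier in the table
fwd <- duplicated(df1[nm1])
# TRUE for rows whose key appears again later in the table
bwd <- duplicated(df1[nm1], fromLast = TRUE)

# Every member of a duplicated group, including the first occurrence
all_dups <- df1[fwd | bwd, ]
print(all_dups)
```

Using fwd alone would drop the first occurrence of each group; the union of the two scans is what brings rows 1 and 2 back into the result.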

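Answer 1: (score: 1)

An extension of Akrun's answer that includes only the key columns in the duplicate check. Note that a single duplicated call flags only the second and later occurrences of each key, so the first row of each duplicated group is not returned: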

mainCols <- c("Name", "Address", "Account", "Amount", "Phone")
duplicatedRows <- duplicated(loan[, mainCols])
duplicatedData <- loan[duplicatedRows, ]

# Name Address Account     a      b Amount     Phone
# 4  Nil      FD 1234455 iuyui jhgjhg   4321  98797897
# 5 John      CA 4879759  qwqe rerter    203 807789747
# 7  Nil      FD 1234455 iuyui jhgjhg   4321  98797897
# 9 John      CA 4879759  qwqe rerter    203 807789747
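Neither answer covers the second part of the question, saving the duplicates as a .csv file. A minimal sketch with base R's write.csv follows, assuming a small stand-in for the loan table that fits in memory. The five sample rows, the output file name loan_duplicates.csv, and the use of tempdir() are illustrative choices, not something from the original thread.

```r
# Small stand-in for the 'loan' table (first five rows of the question's data,
# key columns only).
loan <- data.frame(
  Name    = c("John", "Nil", "Was", "Nil", "John"),
  Address = c("CA", "FD", "FR", "FD", "CA"),
  Account = c(4879759, 1234455, 8979696, 1234455, 4879759),
  Amount  = c(203, 4321, 45989, 4321, 203),
  Phone   = c(807789747, 98797897, 9899999, 98797897, 807789747),
  stringsAsFactors = FALSE
)

mainCols <- c("Name", "Address", "Account", "Amount", "Phone")

# Keep every row whose key occurs more than once (first occurrences included)
isDup <- duplicated(loan[, mainCols]) |
  duplicated(loan[, mainCols], fromLast = TRUE)
duplicatedData <- loan[isDup, ]

# Hypothetical output location; replace with the path you actually want
out_file <- file.path(tempdir(), "loan_duplicates.csv")

# row.names = FALSE avoids writing the original row numbers as an extra column
write.csv(duplicatedData, out_file, row.names = FALSE)
```

For a table on the scale mentioned in the question, loading everything into one data frame may not be feasible, so this should be read as an illustration of the calls rather than a solution sized for that volume.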