Name Address Account a b Amount Phone
John CA 4879759 qwqe rerter 203 807789747
Nil FD 1234455 iuyui jhgjhg 4321 98797897
Was FR 8979696 yikjh kkjhk 45989 9899999
Nil FD 1234455 iuyui jhgjhg 4321 98797897
John CA 4879759 qwqe rerter 203 807789747
Saw PO 9873279 kjljl bjhjh 765 3543656
Nil FD 1234455 iuyui jhgjhg 4321 98797897
Aws IL 707009 dfdsf sasd 2344 242545
John CA 4879759 qwqe rerter 203 807789747
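I want to extract the duplicate rows from this table using R. The table is named "loan" and has 17 billion line items. The key columns are "Name, Address, Account, Amount, Phone". I'm hoping for some workable solutions.
One more thing: after separating them, I want to save the table of duplicates in .csv format. I'm new to R, so please help.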
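Answer 0 (score: 1)
We can use duplicated to get all the duplicated rows based on the key columns ('nm1'). Here df1 stands for the loan data frame. Combining the forward check with fromLast=TRUE keeps every copy of a repeated key, including its first occurrence.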
nm1 <- c("Name", "Address", "Account", "Amount", "Phone")
df1[duplicated(df1[nm1])|duplicated(df1[nm1], fromLast=TRUE),]
# Name Address Account a b Amount Phone
#1 John CA 4879759 qwqe rerter 203 807789747
#2 Nil FD 1234455 iuyui jhgjhg 4321 98797897
#4 Nil FD 1234455 iuyui jhgjhg 4321 98797897
#5 John CA 4879759 qwqe rerter 203 807789747
#7 Nil FD 1234455 iuyui jhgjhg 4321 98797897
#9 John CA 4879759 qwqe rerter 203 807789747
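To cover the .csv part of the question as well, here is a minimal sketch that rebuilds the sample table, applies the same filter, and writes the result with write.csv. The names dups and out_file, and the file name loan_duplicates.csv, are assumptions for illustration and do not come from the original post.

```r
# Rebuild the sample table from the question (df1 stands in for the "loan" table)
df1 <- read.table(text = "Name Address Account a b Amount Phone
John CA 4879759 qwqe rerter 203 807789747
Nil FD 1234455 iuyui jhgjhg 4321 98797897
Was FR 8979696 yikjh kkjhk 45989 9899999
Nil FD 1234455 iuyui jhgjhg 4321 98797897
John CA 4879759 qwqe rerter 203 807789747
Saw PO 9873279 kjljl bjhjh 765 3543656
Nil FD 1234455 iuyui jhgjhg 4321 98797897
Aws IL 707009 dfdsf sasd 2344 242545
John CA 4879759 qwqe rerter 203 807789747",
                  header = TRUE, stringsAsFactors = FALSE)

nm1 <- c("Name", "Address", "Account", "Amount", "Phone")

# Keep every row whose key combination appears more than once,
# including the first occurrence of each repeated key
dups <- df1[duplicated(df1[nm1]) | duplicated(df1[nm1], fromLast = TRUE), ]

# Write the duplicates to a CSV file (hypothetical file name);
# row.names = FALSE leaves the original row numbers out of the file
out_file <- "loan_duplicates.csv"
write.csv(dups, out_file, row.names = FALSE)
```

Drop row.names = FALSE if the original row positions should be kept in the file as an extra first column.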
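Answer 1 (score: 1)
An extension of Akrun's answer that includes only the key columns in the duplicate check. Unlike the first answer, duplicated is called once here, so only the second and later occurrences of each key are returned, not the first: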
mainCols = c("Name", "Address", "Account", "Amount", "Phone")
duplicatedRows = duplicated(loan[,mainCols])
duplicatedData = loan[duplicatedRows,]
# Name Address Account a b Amount Phone
# 4 Nil FD 1234455 iuyui jhgjhg 4321 98797897
# 5 John CA 4879759 qwqe rerter 203 807789747
# 7 Nil FD 1234455 iuyui jhgjhg 4321 98797897
# 9 John CA 4879759 qwqe rerter 203 807789747
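The difference between the two answers comes down to which copies duplicated flags. A small sketch on a made-up vector (key, later_only and all_copies are hypothetical names used only for this illustration):

```r
# A toy key column with two repeated values
key <- c("John", "Nil", "Was", "Nil", "John")

# duplicated() alone marks only the second and later occurrences
later_only <- duplicated(key)

# OR-ing with the fromLast = TRUE pass also marks the first occurrences
all_copies <- duplicated(key) | duplicated(key, fromLast = TRUE)

which(later_only)  # positions 4 and 5
which(all_copies)  # positions 1, 2, 4 and 5
```

Use the single call when the goal is to list the extra copies only, and the combined form when every row that belongs to a duplicated group is needed.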