How to keep the last occurrence of duplicated strings in a data frame in R

Asked: 2015-04-14 12:40:35

Tags: r

My updated dataset is as follows

tmp_number_1  tmp_name_1          ID
7990918840    Yvette              33098
7958376552    Mum                 33098
7951755055    Dad                 33098
7951755055    Dad mob             33098
7581498864    Wynne Lewis         33098
7581498864    Wynne Lewis mob     33098
87128486      James Braithewaite  33098
1869353690    Fleetclaims         33098
447915381850  Kath                33098
919446540717  Sujata Egbert       33098
87124812      Chris  Riley        33098
7958376552    Mum Mob             33098
7958376552    Mum Mob new         33098

For each duplicated "tmp_number_1", I want to keep only the last row in which it occurs.

The result I am looking for is

tmp_number_1    tmp_name_1          ID
7990918840      Yvette              33098
**7951755055    Dad mob**           33098
**7581498864    Wynne Lewis mob**   33098
87128486        James Braithewaite  33098
1869353690      Fleetclaims         33098
447915381850    Kath                33098
919446540717    Sujata Egbert       33098
87124812        Chris  Riley        33098
**7958376552    Mum Mob new**       33098

Rows marked with ** are the last occurrence of a duplicated "tmp_number_1".

3 Answers:

Answer 0 (score: 4)

You could try

library(dplyr)
df1 %>% 
    group_by(tmp_number_1) %>% 
    slice(n())

Or

library(data.table)
setDT(df1)[, .SD[.N], tmp_number_1]

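Or

# .I[.N] is the row index of the last row in each group, so this
# also works when a group's rows are not adjacent in the data
setDT(df1)[df1[, .I[.N], by = tmp_number_1]$V1]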

Data

df1 <- structure(list(tmp_number_1 = c(7990918840, 7958376552, 
7951755055, 
7951755055, 7581498864, 7581498864, 87128486, 1869353690, 
447915381850, 
919446540717, 87124812, 7958376552, 7958376552), 
tmp_name_1 = c("Yvette", 
"Mum", "Dad", "Dad mob", "Wynne Lewis", "Wynne Lewis mob",
"James Braithewaite", 
"Fleetclaims", "Kath", "Sujata Egbert", "Chris  Riley", "Mum Mob", 
"Mum Mob new")), .Names = c("tmp_number_1", "tmp_name_1"), 
class = "data.frame", row.names = c(NA, -13L))

Answer 1 (score: 3)

If df is your data.frame, you could try:

 library(data.table)

 setDT(df)[,tail(tmp_name_1,1),by=tmp_number_1]
#   tmp_number_1                 V1
#1:   7990918840             Yvette
#2:   7958376552        Mum Mob new
#3:   7951755055            Dad mob
#4:   7581498864    Wynne Lewis mob
#5:     87128486 James Braithewaite
#6:   1869353690        Fleetclaims
#7: 447915381850               Kath
#8: 919446540717      Sujata Egbert
#9:     87124812        Chris Riley

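Answer 2 (score: 1)

Wouldn't this be a good place to use setkey and unique? Borrowing df1 from @akrun, with the updated ID column added.

Edit - following @Arun's suggestion, setkey is not needed here.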

library(data.table)
unique(setDT(df1), by="tmp_number_1", fromLast=TRUE)

   tmp_number_1         tmp_name_1    ID
1:   7990918840             Yvette 33098
2:   7951755055            Dad mob 33098
3:   7581498864    Wynne Lewis mob 33098
4:     87128486 James Braithewaite 33098
5:   1869353690        Fleetclaims 33098
6: 447915381850               Kath 33098
7: 919446540717      Sujata Egbert 33098
8:     87124812       Chris  Riley 33098
9:   7958376552        Mum Mob new 33098
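
For comparison, the same idea can be sketched in base R without data.table, using duplicated() with fromLast = TRUE. This is only an illustration: df1 below is rebuilt from the question's table (the ID column is filled in from that table, not taken from the dput above), and last_rows is a name chosen here.

```r
# Rebuild the question's data, including the ID column (assumed constant,
# as in the question's table)
df1 <- data.frame(
  tmp_number_1 = c(7990918840, 7958376552, 7951755055, 7951755055,
                   7581498864, 7581498864, 87128486, 1869353690,
                   447915381850, 919446540717, 87124812,
                   7958376552, 7958376552),
  tmp_name_1 = c("Yvette", "Mum", "Dad", "Dad mob", "Wynne Lewis",
                 "Wynne Lewis mob", "James Braithewaite", "Fleetclaims",
                 "Kath", "Sujata Egbert", "Chris  Riley", "Mum Mob",
                 "Mum Mob new"),
  ID = 33098,
  stringsAsFactors = FALSE
)

# duplicated(..., fromLast = TRUE) flags every row that has a later row
# with the same value, so negating it keeps the last occurrence of each
last_rows <- df1[!duplicated(df1$tmp_number_1, fromLast = TRUE), ]
last_rows
```

As with unique(..., fromLast=TRUE), the surviving rows stay in their original order, so "Mum Mob new" comes last.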