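I've run into the following problem.
I have a data frame (dates) holding document IDs, with the dates for each document stored in a character vector: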
Doc Dates
1 12345 c("06/01/2000","08/09/2002")
2 23456 c("07/01/2000", "09/08/2003", "07/01/2000")
3 34567 c("09/06/2004", "09/06/2004", "12/30/2006")
4 45678 c("06/01/2000","08/09/2002")
I am trying to remove the duplicate elements within Dates to get this result:
Doc Dates
1 12345 c("06/01/2000","08/09/2002")
2 23456 c("07/01/2000", "09/08/2003")
3 34567 c("09/06/2004", "12/30/2006")
4 45678 c("06/01/2000","08/09/2002")
I tried:
R>unique(dates$dates)
But that removes rows whose Dates value duplicates another row's:
Doc Dates
1 12345 c("06/01/2000","08/09/2002")
2 23456 c("07/01/2000", "09/08/2003")
3 34567 c("09/06/2004", "12/30/2006")
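Any help on how to remove only the duplicate elements within Dates, rather than removing rows that are duplicated by Dates?
** Updated with data **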
# Match some text string (dates) from some text:
df1$dates <- as.character(strapply(df1[[2]], "((\\D\\d{1,2}(/|-)\\d{1,2}(/|-)\\d{2,4})| ([^/]\\d{1,2}(/|-)\\d{2,4})|((JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV){1}[\\s|-]{0,2}\\d{1,4}(\\D[\\s|-]{0,}\\d{2,4}){0,}))"))
# Drop first 2 columns from dataframe
df2<-df1[ -c(1,2)]
# List data
>df2
872 7/23/2007
873 c(" 11/4/2007", " 11/4/2007")
874 c(" 4/2/2008", " 8/2/2007")
880 11/14/2006
> class(df2)
[1] "data.frame"
> class(df2$dates)
[1] "character"
> dput(df2)
structure(list(dates = c("NULL", "NULL", " 7/23/2007", "c(\" 11/4/2007\", \" 11/4/2007\")",
"c(\" 4/2/2008\", \" 8/2/2007\")", "NULL", "NULL", "NULL", "NULL",
"NULL", " 11/14/2006")), .Names = "dates", class = "data.frame", row.names = 870:880)
So my question is: how do I get rid of the duplicate date in row 873?
Answer 0: (score: 1)
Try this:
within(dates, Dates <- lapply(Dates, unique))
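As a minimal sketch of the same idea, assuming Dates is a genuine list column (each cell holding a character vector) rather than a deparsed string: the toy data frame below is rebuilt from the question and is not the asker's original object, and it uses plain assignment instead of within().

```r
# Toy data rebuilt from the question; Dates is stored as a list column.
dates <- data.frame(Doc = c(12345, 23456, 34567, 45678))
dates$Dates <- list(
  c("06/01/2000", "08/09/2002"),
  c("07/01/2000", "09/08/2003", "07/01/2000"),
  c("09/06/2004", "09/06/2004", "12/30/2006"),
  c("06/01/2000", "08/09/2002")
)

# unique() runs on each cell separately, so duplicates are removed
# inside a cell and no rows are dropped.
deduped <- dates
deduped$Dates <- lapply(dates$Dates, unique)
deduped$Dates[[2]]
```

Rows 1 and 4 hold identical dates and both survive, which is the difference from calling unique() on the whole column.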
Answer 1: (score: 1)
I solved the problem of removing duplicate values from the character vector by wrapping the call as lapply(strapply(), unique):
df1$date <- as.character(lapply((strapply(df1[[2]], "((\\D\\d{1,2}(/|-)\\d{1,2}(/|- )\\d{2,4})|(\\s\\d{1,2}(/|-)\\d{2,4})|((JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV){1}[\\s|-]{0,2}\\d{1,4}(\\D[\\s|-]{0,}\\d{2,4}){0,}))")),unique))
Thanks for your help.
Answer 2: (score: 0)
I would gsub the c( and ) out of the dates, then call strsplit on "," and use unique on the result.
UNTESTED, but maybe something like this:
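sapply(dates$dates, function(x){
  new.x <- gsub("c\\(|\\)|\"", "", x)   # strip the literal c(, ) and quote characters
  new.x <- strsplit(new.x, ",")[[1]]    # split into the individual dates
  unique(trimws(new.x))                 # trim spaces, then drop duplicates
})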
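An alternative to stripping c( and ) is to extract the date tokens directly. The sketch below is only an illustration: dedupe_dates is a hypothetical helper name, the sample strings are copied from the dput() output in the question, and the pattern covers only numeric m/d/y dates, not the month-name formats the question's longer regex also matches.

```r
# Sample strings copied from the dput() output in the question.
x <- c("NULL", " 7/23/2007",
       "c(\" 11/4/2007\", \" 11/4/2007\")",
       "c(\" 4/2/2008\", \" 8/2/2007\")")

# Hypothetical helper: pull out every m/d/y token, drop repeats, rejoin.
dedupe_dates <- function(s) {
  found <- regmatches(s, gregexpr("[0-9]{1,2}/[0-9]{1,2}/[0-9]{2,4}", s))[[1]]
  paste(unique(found), collapse = ", ")
}

result <- vapply(x, dedupe_dates, character(1), USE.NAMES = FALSE)
result
```

Entries with no date, such as "NULL", come back as empty strings, so the result keeps one element per row and can be assigned back to the column.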
Answer 3: (score: 0)
You may be looking for something like this.
df
Doc Dates
1 12345 c("06/01/2000","08/09/2002")
2 23456 c("07/01/2000", "09/08/2003", "07/01/2000")
3 34567 c("09/06/2004", "09/06/2004", "12/30/2006")
4 45678 c("06/01/2000","08/09/2002")
Eval and Parse
x <- t(sapply(df[,"Dates"],function(x){unique(eval(parse(text = x)))}))
df$Dates <- paste(x[,1],x[,2],sep=",")
df
Doc Dates
1 12345 06/01/2000,08/09/2002
2 23456 07/01/2000,09/08/2003
3 34567 09/06/2004,12/30/2006
4 45678 06/01/2000,08/09/2002
The same can be achieved using a regex:
paste(unique(unlist(strsplit(gsub("c\\(|\\)","",'c("24/07/2012","22/01/2012","24/07/2012")'),","))),sep = "")
[1] "\"24/07/2012\"" "\"22/01/2012\""
I haven't tried it on the actual data, but it works on this example.
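The eval/parse version above indexes x[,1] and x[,2], so it assumes every row ends up with exactly two unique dates. A sketch that allows any number per row could collapse each row's result separately, as below. The third row is a made-up case with a single unique date, added only for illustration. Note that eval(parse()) runs whatever the string contains, so it is only reasonable on text you trust.

```r
# Toy data: each Dates cell is the deparsed text of a character vector.
df <- data.frame(
  Doc = c(12345, 23456, 34567),
  Dates = c('c("06/01/2000","08/09/2002")',
            'c("07/01/2000", "09/08/2003", "07/01/2000")',
            'c("09/06/2004", "09/06/2004", "09/06/2004")'),  # hypothetical row
  stringsAsFactors = FALSE
)

# Parse each cell back into a vector, dedupe it, and rejoin with commas.
df$Dates <- vapply(df$Dates, function(s) {
  paste(unique(eval(parse(text = s))), collapse = ",")
}, character(1), USE.NAMES = FALSE)
df
```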