I have the following data frame:
id <- c(1,1,2,3,3)
date <- c("23-01-08","01-11-07","30-11-07","17-12-07","12-12-08")
df <- data.frame(id,date)
df$date2 <- as.Date(as.character(df$date), format = "%d-%m-%y")
id date date2
1 23-01-08 2008-01-23
1 01-11-07 2007-11-01
2 30-11-07 2007-11-30
3 17-12-07 2007-12-17
3 12-12-08 2008-12-12
Now I need to create a fourth column that holds the maximum transaction date for each id.
The final table should look like this:
id date date2 max
1 23-01-08 2008-01-23 2008-01-23
1 01-11-07 2007-11-01 0
2 30-11-07 2007-11-30 2007-11-30
3 17-12-07 2007-12-17 0
3 12-12-08 2008-12-12 2008-12-12
I would be very grateful for any help.
Answer 0 (score: 21)
id<-c(1,1,2,3,3)
date<-c("23-01-08","01-11-07","30-11-07","17-12-07","12-12-08")
df<-data.frame(id,date)
df$date2<-as.Date(as.character(df$date), format = "%d-%m-%y")
# aggregate can be used for this type of thing
d = aggregate(df$date2,by=list(df$id),max)
# And merge the result of aggregate
# with the original data frame
df2 = merge(df,d,by.x=1,by.y=1)
df2
id date date2 x
1 1 23-01-08 2008-01-23 2008-01-23
2 1 01-11-07 2007-11-01 2008-01-23
3 2 30-11-07 2007-11-30 2007-11-30
4 3 17-12-07 2007-12-17 2008-12-12
5 3 12-12-08 2008-12-12 2008-12-12
Edit: since you want the last column to be "empty" when the date does not match the maximum date, you can try the following line.
df2[df2[,3]!=df2[,4],4]=NA
df2
id date date2 x
1 1 23-01-08 2008-01-23 2008-01-23
2 1 01-11-07 2007-11-01 <NA>
3 2 30-11-07 2007-11-30 2007-11-30
4 3 17-12-07 2007-12-17 <NA>
5 3 12-12-08 2008-12-12 2008-12-12
Of course, cleaning up the colnames and so on is always a good idea, but I will leave that to you.
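A minimal sketch of that cleanup, assuming the same df as in the question: naming the arguments passed to aggregate and merging by column name gives a column called max directly, instead of the default x. The names max_by_id and df2 are illustrative only.

```r
# Rebuild the data frame from the question so the sketch is self-contained
id <- c(1, 1, 2, 3, 3)
date <- c("23-01-08", "01-11-07", "30-11-07", "17-12-07", "12-12-08")
df <- data.frame(id, date)
df$date2 <- as.Date(as.character(df$date), format = "%d-%m-%y")

# Name the grouping column and the aggregated column up front
max_by_id <- aggregate(list(max = df$date2), by = list(id = df$id), FUN = max)

# Merge on the shared "id" column, then blank out the non-maximum rows
df2 <- merge(df, max_by_id, by = "id")
df2$max[df2$date2 != df2$max] <- NA
df2
```

Merging by name rather than by column position also keeps the code working if the column order of df changes later.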
Answer 1 (score: 9)
Another approach is to use the plyr package:
library(plyr)
ddply(df, "id", summarize, max = max(date2))
# id max
#1 1 2008-01-23
#2 2 2007-11-30
#3 3 2008-12-12
Now, this is not the format you were after, because it shows each id only once. Never fear, we can use transform instead of summarize:
ddply(df, "id", transform, max = max(date2))
# id date date2 max
#1 1 01-11-07 2007-11-01 2008-01-23
#2 1 23-01-08 2008-01-23 2008-01-23
#3 2 30-11-07 2007-11-30 2007-11-30
#4 3 12-12-08 2008-12-12 2008-12-12
#5 3 17-12-07 2007-12-17 2008-12-12
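As with @seandavi's answer, this repeats the max date for each id. If you want to change the duplicates to NA, you can overwrite the rows where date2 differs from max.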
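A minimal sketch of one way to do that, not necessarily the code this answer originally had in mind. It assumes the plyr package is installed and reuses the df from the question; the name res is illustrative.

```r
library(plyr)

# Rebuild the data frame from the question so the sketch is self-contained
id <- c(1, 1, 2, 3, 3)
date <- c("23-01-08", "01-11-07", "30-11-07", "17-12-07", "12-12-08")
df <- data.frame(id, date)
df$date2 <- as.Date(as.character(df$date), format = "%d-%m-%y")

# Attach the per-id maximum to every row, as shown above
res <- ddply(df, "id", transform, max = max(date2))

# Keep the maximum only on the row it belongs to; all other rows become NA
res$max[res$date2 != res$max] <- NA
res
```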
Answer 2 (score: 6)
Adding a dplyr solution in case anyone is looking for one:
library(dplyr)
df %>%
group_by(id) %>%
mutate(max = if_else(date2 == max(date2), date2, as.Date(NA)))
Result:
# A tibble: 5 x 4
# Groups: id [3]
id date date2 max
<dbl> <fctr> <date> <date>
1 1 23-01-08 2008-01-23 2008-01-23
2 1 01-11-07 2007-11-01 NA
3 2 30-11-07 2007-11-30 2007-11-30
4 3 17-12-07 2007-12-17 NA
5 3 12-12-08 2008-12-12 2008-12-12
Answer 3 (score: 2)
library(sqldf)
tables<- '(SELECT * FROM df
)
AS t1,
(SELECT id,max(date2) date2 FROM df GROUP BY id
)
AS t2'
out<-fn$sqldf("SELECT t1.*,t2.date2 mdate FROM $tables WHERE t1.id=t2.id")
out$mdate<-as.Date(out$mdate)
out$mdate[out$date2!=out$mdate]<-NA
# id date date2 mdate
#1 1 01-11-07 2007-11-01 <NA>
#2 1 23-01-08 2008-01-23 2008-01-23
#3 2 30-11-07 2007-11-30 2007-11-30
#4 3 12-12-08 2008-12-12 2008-12-12
#5 3 17-12-07 2007-12-17 <NA>
Answer 4 (score: 2)
You cannot use 0 as a Date value, so you need to either give up keeping the column as a Date or accept NA values:
# Date values:
df$maxdt <- ave(df$date2, df$id,
FUN=function(x) ifelse( x == max(x), as.character(x), NA) )
str(ave(df$date2, df$id, FUN=function(x) ifelse( x == max(x), as.character(x), NA) ) )
# Date[1:5], format: "2008-01-23" NA "2007-11-30" NA "2008-12-12"
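The ifelse mechanism does some odd type checking that prevents simply using x as the second argument above, yet the result is still a Date-class vector. Go figure! The character vector option follows.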
# Character values:
df$maxdt <- ave(as.character(df$date2), df$id,
FUN=function(x) ifelse( x == max(x), x, "0") )
ave(as.character(df$date2), df$id, FUN=function(x) ifelse( x == max(x), x, "0") )
[1] "2008-01-23" "0" "2007-11-30" "0" "2008-12-12"
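The ifelse behaviour mentioned above can be seen in isolation. The sketch below uses a made-up two-element vector d to show that ifelse drops the Date class, and that plain subassignment is one way to keep it. The names d, raw and kept are illustrative.

```r
d <- as.Date(c("2008-01-23", "2007-11-01"))

# ifelse() builds its result from the logical test vector,
# so the Date class of the "yes" argument is lost
raw <- ifelse(d == max(d), d, NA)
class(raw)   # "numeric": days since 1970-01-01

# Subassignment keeps the class: start from the dates and blank out the rest
kept <- d
kept[d != max(d)] <- NA
class(kept)  # "Date"
```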
Answer 5 (score: 0)
I found this helpful when I wanted to see the minimum/maximum date of a column.
Max: head(df %>% distinct(date) %>% arrange(desc(date)))
Min: head(df %>% distinct(date) %>% arrange(date))
The max version sorts the date column in descending order so that the maximum appears first. The min version sorts in ascending order so that the minimum appears first.
You need the dplyr package for this.
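If only the two extremes are needed rather than a sorted view, base R can return them directly. This is a minimal sketch that assumes the column is stored as Date class; sorting the raw dd-mm-yy strings would compare them as text, not chronologically. The names dates and rng are illustrative.

```r
# Parse the dd-mm-yy strings from the question into Date values
dates <- as.Date(c("23-01-08", "01-11-07", "30-11-07", "17-12-07", "12-12-08"),
                 format = "%d-%m-%y")

# range() returns the earliest date first and the latest date second
rng <- range(dates)
rng
```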