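I want to extract certain rows from a data frame in which one column (column C) holds dates. Here is a small example (Before), followed by the output I would like to get (After):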
Before <- data.frame(A=c("0010","0011","0012","0015","0024","0032","0032","0033","0039","0039","0039","0041","0054"),
B=c(11,12,11,11,12,12,12,11,"NA","NA",11,11,11),
C=c("2014-01-07","2013-06-03","2013-07-29","2014-07-14","2012-12-17","2013-08-21","2013-08-21","2014-07-11","2012-10-06","2012-10-06","2013-10-22","2014-05-28","2014-03-26"))
After <- data.frame(A=c("0010","0011","0012","0015","0024","0032","0033","0039","0041","0054"),
B=c(11,12,11,11,12,12,11,11,11,11),
C=c("2014-01-07","2013-06-03","2013-07-29","2014-07-14","2012-12-17","2013-08-21","2014-07-11","2013-10-22","2014-05-28","2014-03-26"))
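My goal, as the example shows, is to keep a single row for each value of A: the one with the most recent date in C.
I could not find a solution using subset, unique, etc. Any help is appreciated!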
Answer 0 (score: 1)
require(dplyr)
Before %>%
mutate(C=as.Date(C)) %>%
group_by(A) %>%
arrange(A,desc(C)) %>%
filter(row_number()==1)
#Source: local data frame [10 x 3]
#Groups: A
# A B C
#1 0010 11 2014-01-07
#2 0011 12 2013-06-03
#3 0012 11 2013-07-29
#4 0015 11 2014-07-14
#5 0024 12 2012-12-17
#6 0032 12 2013-08-21
#7 0033 11 2014-07-11
#8 0039 11 2013-10-22
#9 0041 11 2014-05-28
#10 0054 11 2014-03-26
Answer 1 (score: 1)
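Here are two data.table variants, depending on what you can assume about your data:
Assuming your data already has the most recent date of each group of A as its last element: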
require(data.table)
setDT(Before)[, .SD[.N], by=A]
.SD holds the Subset of Data for each group in A, and .N holds the number of observations in that group. So .SD[.N] gives the last observation of each group.
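To make the roles of .SD and .N concrete, here is a minimal sketch on a made-up table. The name toy and its values are assumptions for illustration only, and the rows are deliberately already sorted by date within each group, which is what this variant relies on.

```r
library(data.table)

# Hypothetical toy data: two groups, rows already in date order within each group
toy <- data.table(
  A = c("x", "x", "y", "y", "y"),
  C = as.Date(c("2013-01-01", "2013-06-01",
                "2012-03-01", "2012-04-01", "2012-05-01"))
)

# .N evaluated per group is simply the group's row count
counts <- toy[, .N, by = A]

# .SD is the group's sub-table; indexing it with .N picks its last row
last_rows <- toy[, .SD[.N], by = A]

print(counts)
print(last_rows)
```

If the rows were not in date order, .SD[.N] would still return the last row of each group, which would then not necessarily be the most recent one.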
Without any assumptions:
require(data.table)
setDT(Before)[, C := as.Date(C)][, .SD[which.max(C)], by=A]
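Here, we first replace C with as.Date(C) using data.table's := operator, which modifies the table by reference (no copy is made, so it is fast and memory efficient). Then, for each subset of the data grouped by A, we take the row corresponding to the maximum value of C.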
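The two steps can be seen separately on a small made-up table. This is only a sketch: toy and its values are assumptions, and the rows are intentionally not in date order to show that no ordering is needed.

```r
library(data.table)

# Hypothetical toy data: dates stored as character and NOT sorted within groups
toy <- data.table(
  A = c("x", "x", "y", "y"),
  C = c("2013-06-01", "2013-01-01", "2012-03-01", "2012-05-01")
)

# Step 1: := updates column C in place; no `toy <- ...` assignment is needed
toy[, C := as.Date(C)]
print(class(toy$C))

# Step 2: within each group, which.max(C) is the position of the latest date
latest <- toy[, .SD[which.max(C)], by = A]
print(latest)
```

Note that which.max returns only the first position of the maximum, so a group with several rows sharing the latest date contributes a single row.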
HTH
Answer 2 (score: 0)
Using the fact that dates behave like numbers, something like the following may do the trick:
Before$C <- as.Date(Before$C) # Convert to dates
ans <- aggregate(C ~ A + B, max, data = Before) # Aggregate date, choose the last date
ans <- ans[ans$B != "NA", ] # Remove NA in col B
print(ans)
# A B C
#1 0010 11 2014-01-07
#2 0012 11 2013-07-29
#3 0015 11 2014-07-14
#4 0033 11 2014-07-11
#5 0039 11 2013-10-22
#6 0041 11 2014-05-28
#7 0054 11 2014-03-26
#8 0011 12 2013-06-03
#9 0024 12 2012-12-17
#10 0032 12 2013-08-21
Calling max on values of type Date returns the most recent date.
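A quick base R check of that claim, using a few dates taken from the example data (the variable names d and latest are just for illustration):

```r
# Dates are stored as the number of days since 1970-01-01
d <- as.Date(c("2012-10-06", "2013-10-22", "2012-10-06"))
print(as.numeric(d))

# max compares the underlying day counts but keeps the Date class
latest <- max(d)
print(latest)
```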
Answer 3 (score: 0)
Split-apply-combine:
Before$C <- as.Date(Before$C)
library(plyr)
ddply(Before, .(A), function(df) {
df <- df[df$C==max(df$C),]
df[!duplicated(df),]
})
# A B C
#1 0010 11 2014-01-07
#2 0011 12 2013-06-03
#3 0012 11 2013-07-29
#4 0015 11 2014-07-14
#5 0024 12 2012-12-17
#6 0032 12 2013-08-21
#7 0033 11 2014-07-11
#8 0039 11 2013-10-22
#9 0041 11 2014-05-28
#10 0054 11 2014-03-26
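For readers without plyr, the same split-apply-combine idea can be sketched in base R with split, lapply and do.call(rbind, ...). This is an illustrative alternative, not what the answer above uses, and the toy data frame is made up.

```r
# Hypothetical toy data: group "y" contains an exact duplicate row
toy <- data.frame(
  A = c("x", "x", "y", "y"),
  B = c(11, 12, 11, 11),
  C = as.Date(c("2013-06-01", "2013-01-01", "2012-05-01", "2012-05-01")),
  stringsAsFactors = FALSE
)

# Split: one data frame per value of A
pieces <- split(toy, toy$A)

# Apply: keep the rows with the latest date, then drop exact duplicates
picked <- lapply(pieces, function(df) {
  df <- df[df$C == max(df$C), ]
  df[!duplicated(df), ]
})

# Combine: stack the per-group results back into one data frame
result <- do.call(rbind, picked)
rownames(result) <- NULL
print(result)
```

As in the plyr version, rows that share the latest date but differ in B would all be kept, since duplicated only removes rows that are identical in every column.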