Extract rows with the latest date from a data frame

Date: 2014-07-29 14:11:55

Tags: r

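I want to extract certain rows from a data frame that contains a date column (column C). Here is a small example with the input (Before) and the desired output (After):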

Before <- data.frame(A=c("0010","0011","0012","0015","0024","0032","0032","0033","0039","0039","0039","0041","0054"),
                     B=c(11,12,11,11,12,12,12,11,"NA","NA",11,11,11),
                     C=c("2014-01-07","2013-06-03","2013-07-29","2014-07-14","2012-12-17","2013-08-21","2013-08-21","2014-07-11","2012-10-06","2012-10-06","2013-10-22","2014-05-28","2014-03-26"))

After <- data.frame(A=c("0010","0011","0012","0015","0024","0032","0033","0039","0041","0054"),
                    B=c(11,12,11,11,12,12,11,11,11,11),
                    C=c("2014-01-07","2013-06-03","2013-07-29","2014-07-14","2012-12-17","2013-08-21","2014-07-11","2013-10-22","2014-05-28","2014-03-26"))

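My goals are:

  1. Keep only the entry with the latest date (rows 9, 10 and 11 in Before) -> only row 8 in After
  2. Keep identical entries only once (rows 6 and 7 in Before) -> only row 6 in After

I couldn't find a solution using subset, unique, etc. Any help is appreciated!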

4 Answers:

Answer 0 (score: 1)

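require(dplyr)
Before %>%
  mutate(C=as.Date(C)) %>%   # convert C from character to Date
  group_by(A) %>%            # one group per value of A
  arrange(A,desc(C)) %>%     # latest date first within each A
  filter(row_number()==1)    # keep only the first (latest) row of each group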

#Source: local data frame [10 x 3]
#Groups: A

#      A  B          C
#1  0010 11 2014-01-07
#2  0011 12 2013-06-03
#3  0012 11 2013-07-29
#4  0015 11 2014-07-14
#5  0024 12 2012-12-17
#6  0032 12 2013-08-21
#7  0033 11 2014-07-11
#8  0039 11 2013-10-22
#9  0041 11 2014-05-28
#10 0054 11 2014-03-26
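For comparison, here is a rough base-R counterpart of the same "sort, then keep the first row per group" idea (an editorial sketch, not part of the original answer, assuming no packages are available). It sorts by A and by descending C, then uses duplicated() to keep the first row of each A. The data frame is rebuilt with stringsAsFactors = FALSE so the snippet runs on its own.

```r
# Base-R sketch of the dplyr pipeline above (no packages needed)
Before <- data.frame(
  A = c("0010", "0011", "0012", "0015", "0024", "0032", "0032",
        "0033", "0039", "0039", "0039", "0041", "0054"),
  B = c(11, 12, 11, 11, 12, 12, 12, 11, "NA", "NA", 11, 11, 11),
  C = c("2014-01-07", "2013-06-03", "2013-07-29", "2014-07-14", "2012-12-17",
        "2013-08-21", "2013-08-21", "2014-07-11", "2012-10-06", "2012-10-06",
        "2013-10-22", "2014-05-28", "2014-03-26"),
  stringsAsFactors = FALSE
)
Before$C <- as.Date(Before$C)

# Sort by A, then by C descending, so the latest date of each A comes first
sorted <- Before[order(Before$A, -as.numeric(Before$C)), ]

# Keep only the first row of each A group (like filter(row_number() == 1))
latest <- sorted[!duplicated(sorted$A), ]
rownames(latest) <- NULL
print(latest)
```

Because duplicated() keeps the first occurrence, exact duplicates such as rows 6 and 7 collapse to a single row as a side effect.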

Answer 1 (score: 1)

Here are two data.table variants, depending on what you can assume about the data:

  • Assuming your data already has the latest date of each group in A as the last row of that group:

    require(data.table)
    setDT(Before)[, .SD[.N], by=A]
    

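.SD holds the Subset of Data for each group in A, and .N holds the number of observations in that group. Therefore .SD[.N] gives the last observation of each group.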

  • Without any assumptions:

    require(data.table)
    setDT(Before)[, C := as.Date(C)][, .SD[which.max(C)], by=A]
    

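Here we first replace C with as.Date(C) using data.table's := operator, which modifies by reference (no copy is made, so it is fast and memory efficient). Then, for each subset of data defined by A, we select the row corresponding to the maximum value of C.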

HTH
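For readers without data.table, both variants can be approximated in base R. This is a hedged sketch, not the answer's code: variant 1 assumes, like .SD[.N], that the latest date of each A is already the last row of its group; variant 2 makes no ordering assumption and mimics which.max(C) per group with tapply().

```r
# Base-R sketch of the two data.table variants (no packages needed)
Before <- data.frame(
  A = c("0010", "0011", "0012", "0015", "0024", "0032", "0032",
        "0033", "0039", "0039", "0039", "0041", "0054"),
  B = c(11, 12, 11, 11, 12, 12, 12, 11, "NA", "NA", 11, 11, 11),
  C = c("2014-01-07", "2013-06-03", "2013-07-29", "2014-07-14", "2012-12-17",
        "2013-08-21", "2013-08-21", "2014-07-11", "2012-10-06", "2012-10-06",
        "2013-10-22", "2014-05-28", "2014-03-26"),
  stringsAsFactors = FALSE
)
Before$C <- as.Date(Before$C)

# Variant 1: assumes the latest date of each A is the LAST row of its group.
# duplicated(fromLast = TRUE) flags every row except the last occurrence of each A,
# so negating it keeps one (the last) row per group, like .SD[.N]
last_rows <- Before[!duplicated(Before$A, fromLast = TRUE), ]

# Variant 2: no ordering assumption. For each A, pick the row index with the
# largest date; which.max() returns the first maximum, so exact duplicates collapse
idx <- tapply(seq_len(nrow(Before)), Before$A,
              function(i) i[which.max(Before$C[i])])
max_rows <- Before[as.integer(idx), ]

print(last_rows)
print(max_rows)
```

Variant 1 is cheaper but silently returns wrong rows if the ordering assumption does not hold; variant 2 is the safer default.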

Answer 2 (score: 0)

Using the fact that dates behave like numbers, something like the following might do the trick:

Before$C <- as.Date(Before$C)  # Convert to dates
ans <- aggregate(C ~ A + B, max, data = Before)  # For each A/B combination, keep the latest date
ans <- ans[ans$B != "NA", ]  # Drop rows where B is the string "NA" (this also drops the older 0039 dates)
print(ans)
#      A  B          C
#1  0010 11 2014-01-07
#2  0012 11 2013-07-29
#3  0015 11 2014-07-14
#4  0033 11 2014-07-11
#5  0039 11 2013-10-22
#6  0041 11 2014-05-28
#7  0054 11 2014-03-26
#8  0011 12 2013-06-03
#9  0024 12 2012-12-17
#10 0032 12 2013-08-21

max applied to an object of class Date returns the latest date.
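To make the "dates behave like numbers" point concrete, here is a small illustrative sketch (not from the original answer): R stores a Date as the number of days since 1970-01-01, so comparisons and max() work on it and the result keeps the Date class. The dates used are the three A == "0039" values from the question.

```r
# A Date is stored as the number of days since 1970-01-01
day_one <- as.Date("1970-01-02")
print(as.numeric(day_one))

# The dates for A == "0039" in the question (rows 9, 10 and 11)
d <- as.Date(c("2012-10-06", "2012-10-06", "2013-10-22"))
latest <- max(d)      # numeric comparison under the hood
print(latest)
print(class(latest))  # still a Date, not a bare number
```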

Answer 3 (score: 0)

Split-apply-combine:

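Before$C <- as.Date(Before$C)  # convert C from character to Date
library(plyr)
ddply(Before, .(A), function(df) {  # split by A, apply the function, combine the results
  df <- df[df$C==max(df$C),]        # keep only the rows with the latest date in this group
  df[!duplicated(df),]              # drop exact duplicate rows
  })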

#      A  B          C
#1  0010 11 2014-01-07
#2  0011 12 2013-06-03
#3  0012 11 2013-07-29
#4  0015 11 2014-07-14
#5  0024 12 2012-12-17
#6  0032 12 2013-08-21
#7  0033 11 2014-07-11
#8  0039 11 2013-10-22
#9  0041 11 2014-05-28
#10 0054 11 2014-03-26
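The same split-apply-combine logic can be written without plyr. The sketch below is an assumption-labelled base-R equivalent, not the answer's code: split() does the "split" step, lapply() applies the same function as the ddply() call, and do.call(rbind, ...) combines the pieces.

```r
# Base-R sketch of the ddply() call above (no packages needed)
Before <- data.frame(
  A = c("0010", "0011", "0012", "0015", "0024", "0032", "0032",
        "0033", "0039", "0039", "0039", "0041", "0054"),
  B = c(11, 12, 11, 11, 12, 12, 12, 11, "NA", "NA", 11, 11, 11),
  C = c("2014-01-07", "2013-06-03", "2013-07-29", "2014-07-14", "2012-12-17",
        "2013-08-21", "2013-08-21", "2014-07-11", "2012-10-06", "2012-10-06",
        "2013-10-22", "2014-05-28", "2014-03-26"),
  stringsAsFactors = FALSE
)
Before$C <- as.Date(Before$C)

# Split: one data frame per value of A
pieces <- split(Before, Before$A)

# Apply: keep the rows with the group's latest date, then drop exact duplicates
pieces <- lapply(pieces, function(df) {
  df <- df[df$C == max(df$C), ]
  df[!duplicated(df), ]
})

# Combine: stack the pieces back into a single data frame
result <- do.call(rbind, pieces)
rownames(result) <- NULL
print(result)
```

As with the ddply() version, two different rows in the same group that share the latest date would both be kept; only exact duplicates are removed.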