Sort an old data frame by two columns and store the first row of each group in a new data frame

Time: 2016-07-18 16:46:39

Tags: r

I have a data frame with 3 columns. I want to use the columns date and location to get the latest observation for each location and store it in a new data frame.

> old.data
date location amount
2014 NY       1
2015 NJ       2
2016 NY       3
2015 NM       4
2013 NY       5
2014 NJ       6
2016 NM       7
2016 NJ       8
2015 NY       9

> new.data
date location amount
2016 NJ       8
2016 NM       7
2016 NY       3

3 Answers:

Answer 0 (score: 4)

Using dplyr:

library(dplyr)
new.data <- old.data %>%
  arrange(desc(date), location) %>%
  group_by(location) %>%
  slice(1)
new.data

Source: local data frame [3 x 2]
Groups: location [3]

   date location
  <int>   <fctr>
1  2016       NJ
2  2016       NM
3  2016       NY
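In dplyr 1.0.0 and later, slice_max() picks the row with the largest value of a column within each group, so the arrange-then-slice(1) step can be written more directly. A minimal sketch, assuming old.data holds the three columns from the question (rebuilt here by hand so the snippet runs on its own):

```r
library(dplyr)

# the question's data, rebuilt by hand
old.data <- data.frame(
  date     = c(2014L, 2015L, 2016L, 2015L, 2013L, 2014L, 2016L, 2016L, 2015L),
  location = c("NY", "NJ", "NY", "NM", "NY", "NJ", "NM", "NJ", "NY"),
  amount   = 1:9,
  stringsAsFactors = FALSE
)

# keep exactly one latest row per location (with_ties = FALSE drops extra tied rows)
new.data <- old.data %>%
  group_by(location) %>%
  slice_max(date, n = 1, with_ties = FALSE) %>%
  ungroup()

new.data
```

Unlike slice(1) after arrange(), slice_max() does not depend on the rows having been sorted first.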

Using data.table:

library(data.table)
# Code updated by Arun
setDT(old.data)[order(-date, location), .(date = date[1L]), by = location]

   location date
1:       NJ 2016
2:       NM 2016
3:       NY 2016
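The same sort-then-take-first idea also works in base R without either package. A minimal sketch, assuming the three-column old.data from the question: duplicated() marks every occurrence of a location after its first, so once the rows are sorted newest first, !duplicated() keeps each location's latest row.

```r
# the question's data, rebuilt by hand
old.data <- data.frame(
  date     = c(2014L, 2015L, 2016L, 2015L, 2013L, 2014L, 2016L, 2016L, 2015L),
  location = c("NY", "NJ", "NY", "NM", "NY", "NJ", "NM", "NJ", "NY"),
  amount   = 1:9,
  stringsAsFactors = FALSE
)

# sort newest first, breaking ties on date by location
sorted <- old.data[order(-old.data$date, old.data$location), ]

# keep only the first (latest) row seen for each location
new.data <- sorted[!duplicated(sorted$location), ]
rownames(new.data) <- NULL

new.data
```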

Update (because the OP changed the original data frame): the solutions above still work. The data used above:

old.data <- structure(list(
    date = c(2014L, 2015L, 2016L, 2015L, 2013L, 2014L, 2016L, 2016L, 2015L),
    location = structure(c(3L, 1L, 3L, 2L, 3L, 1L, 2L, 1L, 3L),
                         .Label = c("NJ", "NM", "NY"), class = "factor")),
  .Names = c("date", "location"), class = "data.frame", row.names = c(NA, -9L))

For dplyr, this is the only way I could think of:
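One dplyr approach that keeps every column, including amount, is to filter each location's rows down to that location's maximum date. This is a sketch of a common idiom, not necessarily the answerer's own code; it assumes old.data is the three-column frame from the question, and it returns more than one row for a location if several rows share that location's latest date:

```r
library(dplyr)

# the question's data, rebuilt by hand
old.data <- data.frame(
  date     = c(2014L, 2015L, 2016L, 2015L, 2013L, 2014L, 2016L, 2016L, 2015L),
  location = c("NY", "NJ", "NY", "NM", "NY", "NJ", "NM", "NJ", "NY"),
  amount   = 1:9,
  stringsAsFactors = FALSE
)

new.data <- old.data %>%
  group_by(location) %>%
  filter(date == max(date)) %>%  # max(date) is evaluated within each location
  ungroup() %>%
  arrange(location)

new.data
```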

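data.table (Arun's suggestion, using .SD / .SDcols):

# sorts all columns newest first, then keeps rows equal to the overall latest date;
# this matches "latest per location" here only because every location has a 2016 row
setDT(old.data)[order(-date, location), colnames(old.data), with = FALSE][date == max(date)]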

   date location amount  
1: 2016       NJ      8 
2: 2016       NM      7
3: 2016       NY      3
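A per-location variant, so that a location whose newest date is earlier than the overall maximum is still kept, can use .I (the row numbers of each group) together with which.max(). A sketch, assuming old.data is the three-column frame from the question:

```r
library(data.table)

# the question's data, rebuilt by hand
old.data <- data.table(
  date     = c(2014L, 2015L, 2016L, 2015L, 2013L, 2014L, 2016L, 2016L, 2015L),
  location = c("NY", "NJ", "NY", "NM", "NY", "NJ", "NM", "NJ", "NY"),
  amount   = 1:9
)

# for each location, .I[which.max(date)] is the row number of its latest date;
# the unnamed result column is called V1
latest.rows <- old.data[, .I[which.max(date)], by = location]$V1

# subsetting by those rows keeps every column in its original order
new.data <- old.data[latest.rows][order(location)]

new.data
```

which.max() returns only the first maximum, so ties on a location's latest date yield a single row.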

Answer 1 (score: 2)

How about this:

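library(dplyr)
# rebuild the date and location columns from the question
date <- c(2014, 2015, 2016, 2015, 2013, 2014, 2016, 2016, 2015)
location <- c("NY", "NJ", "NY", "NM", "NY", "NJ", "NM", "NJ", "NY")

old.data <- data.frame(date, location)

# group the rows by location
new.data <- group_by(old.data, location)

# one row per location, with its latest date stored in a column named year
new.data <- summarise(new.data, year = max(date))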
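For comparison, base R's aggregate() computes the same per-location maximum through a formula. A minimal sketch, assuming the date and location vectors defined above (rebuilt here so the snippet runs on its own); unlike the summarise() call, the result keeps the column name date:

```r
date <- c(2014, 2015, 2016, 2015, 2013, 2014, 2016, 2016, 2015)
location <- c("NY", "NJ", "NY", "NM", "NY", "NJ", "NM", "NJ", "NY")
old.data <- data.frame(date, location, stringsAsFactors = FALSE)

# date ~ location: apply max() to date within each value of location
new.data <- aggregate(date ~ location, data = old.data, FUN = max)

new.data
```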

Answer 2 (score: 2)

Using the data.table package:

library(data.table)
setDT(old.data)[order(-date), .SD[1L], by = location]
#    location date
# 1:       NY 2016
# 2:       NM 2016
# 3:       NJ 2016
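Because order(-date) puts each location's newest row first, unique() with its by argument can stand in for .SD[1L] and keeps the original column order. A sketch, assuming the three-column old.data from the question:

```r
library(data.table)

# the question's data, rebuilt by hand
old.data <- data.table(
  date     = c(2014L, 2015L, 2016L, 2015L, 2013L, 2014L, 2016L, 2016L, 2015L),
  location = c("NY", "NJ", "NY", "NM", "NY", "NJ", "NM", "NJ", "NY"),
  amount   = 1:9
)

# sort newest first, then keep the first row seen for each location
new.data <- unique(old.data[order(-date)], by = "location")

new.data
```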