My data table has 5 columns: id, created_time, amount, balance and created_month. I want the last row for each combination of id and created_month, so I tried grouping all 5 columns by created_month and id.
Input data table test:
id created_time amount balance created_month
1 1/15/14 10:17 2 1 1/1/14
1 1/15/14 11:17 2 1 1/1/14
1 1/15/14 20:17 2 1 1/1/14
2 1/15/14 11:17 2 1 1/1/14
2 1/16/14 12:17 2 1 1/1/14
2 2/16/14 23:17 2 1 2/1/14
I sorted it by id and created_time with:
setkeyv(test, c("id","created_time"))
The call below returns only balance, because my tail call is given just one field:
test[, tail(balance, 1L), by = c("id", "created_month")]
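I am not sure how to pass multiple fields to tail so that the result shows all columns of the original table.
My goal is to get this data table: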
id created_month created_time amount balance
1 2014-01-01 2014-01-15 20:17:00 2 1
2 2014-01-01 2014-01-16 12:17:00 2 1
2 2014-02-01 2014-02-16 23:17:00 2 1
Answer 0: (score: 1)
One approach could be:
library(data.table)
library(lubridate)
setDT(df)[, created_time := as.POSIXct(created_time, format = "%m/%d/%y %H:%M", tz = "GMT") # parse the strings as timestamps
][, created_month := floor_date(created_time, "month") # add a column holding the first day of created_time's month
][order(id, created_time) # sort so the latest row comes last within each group
][, .SD[.N], .(id, created_month)] # keep the last row of each (id, created_month) group
which gives
id created_month created_time amount balance
1: 1 2014-01-01 2014-01-15 20:17:00 2 1
2: 2 2014-01-01 2014-01-16 12:17:00 2 1
3: 2 2014-02-01 2014-02-16 23:17:00 2 1
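The question asked how to make tail return every column rather than just balance. As a rough sketch, the snippet below compares three ways of taking the last row per group: `.SD[.N]` as used above, `tail(.SD, 1L)`, and indexing by `.I`. The table `test` is rebuilt here from the question's sample values, and the result names `last_sd`, `last_tail` and `last_idx` are made up for illustration. The month column is derived with base R instead of lubridate, which is an assumption of this sketch and yields a Date rather than a timestamp.

```r
library(data.table)

# Hypothetical rebuild of the question's table (values copied from the sample).
test <- data.table(
  id           = c(1L, 1L, 1L, 2L, 2L, 2L),
  created_time = as.POSIXct(
    c("1/15/14 10:17", "1/15/14 11:17", "1/15/14 20:17",
      "1/15/14 11:17", "1/16/14 12:17", "2/16/14 23:17"),
    format = "%m/%d/%y %H:%M", tz = "GMT"),
  amount       = 2L,
  balance      = 1L
)

# First day of the month, computed with base R (assumption: a Date is acceptable).
test[, created_month := as.Date(format(created_time, "%Y-%m-01", tz = "GMT"))]

# Sort so that the latest timestamp is the last row of each group.
setkey(test, id, created_time)

# 1) .SD holds all non-grouping columns; .N is the group's row count.
last_sd <- test[, .SD[.N], by = .(id, created_month)]

# 2) tail() applied to .SD instead of a single column.
last_tail <- test[, tail(.SD, 1L), by = .(id, created_month)]

# 3) Collect the row number of each group's last row, then subset once.
last_idx <- test[test[, .I[.N], by = .(id, created_month)]$V1]
```

The first two forms put the grouping columns first in the result, while the `.I` form keeps the original column order. On large tables with many groups the `.I` form is often reported to be faster, since it avoids building `.SD` for every group, though that is worth measuring on the actual data.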
Sample data
df <- structure(list(id = c(1L, 1L, 1L, 2L, 2L, 2L), created_time = c("1/15/14 10:17",
"1/15/14 11:17", "1/15/14 20:17", "1/15/14 11:17", "1/16/14 12:17",
"2/16/14 23:17"), amount = c(2L, 2L, 2L, 2L, 2L, 2L), balance = c(1L,
1L, 1L, 1L, 1L, 1L)), .Names = c("id", "created_time", "amount",
"balance"), class = "data.frame", row.names = c(NA, -6L))
# id created_time amount balance
#1 1 1/15/14 10:17 2 1
#2 1 1/15/14 11:17 2 1
#3 1 1/15/14 20:17 2 1
#4 2 1/15/14 11:17 2 1
#5 2 1/16/14 12:17 2 1
#6 2 2/16/14 23:17 2 1
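Starting from the same sample values, another possible route is to sort the rows and then deduplicate on the grouping columns from the bottom up with `unique(..., fromLast = TRUE)`. This is only a sketch: the names `dt` and `res` are placeholders, and the month is again computed with base R rather than lubridate.

```r
library(data.table)

# Sample values from above, rebuilt as a plain data.frame.
df <- data.frame(
  id           = c(1L, 1L, 1L, 2L, 2L, 2L),
  created_time = c("1/15/14 10:17", "1/15/14 11:17", "1/15/14 20:17",
                   "1/15/14 11:17", "1/16/14 12:17", "2/16/14 23:17"),
  amount       = 2L,
  balance      = 1L,
  stringsAsFactors = FALSE
)

dt <- as.data.table(df)  # work on a copy and leave df untouched

# Parse the timestamps and derive the first day of each month.
dt[, created_time := as.POSIXct(created_time, format = "%m/%d/%y %H:%M", tz = "GMT")]
dt[, created_month := as.Date(format(created_time, "%Y-%m-01", tz = "GMT"))]

# Sort ascending, then keep the last occurrence of each (id, created_month).
setorder(dt, id, created_time)
res <- unique(dt, by = c("id", "created_month"), fromLast = TRUE)
```

Here `res` keeps the original column order and should contain the 20:17, 12:17 and 23:17 rows from the sample.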