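I am working with several large data frames and need to reduce the data to the first and last entry for each boat and net. My data frame looks like this: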
Boat Net DateTime
Dawn 71 2014-07-10 10:10
Dawn 71 2014-07-15 11:10
Whip 71 2014-07-17 08:10
Whip 71 2014-07-29 12:36
Dawn 71 2014-08-24 14:53
Whip 71 2014-09-02 11:17
Whip 73 2014-09-14 16:24
Whip 71 2014-09-15 18:16
Whip 73 2014-09-17 20:25
I need the data frame to contain only the first and last entry for each net on each boat. The data should look like this:
Boat Net DateTime
Dawn 71 2014-07-10 10:10
Whip 71 2014-07-17 08:10
Dawn 71 2014-08-24 14:53
Whip 73 2014-09-14 16:24
Whip 71 2014-09-15 18:16
Whip 73 2014-09-17 20:25
I have tried a few different things and have come close, but I'm not quite there.
Head <- aggregate(df, by = list(df$Net), FUN = head, n = 1)
Tail <- aggregate(df, by = list(df$Net), FUN = tail, n = 1)
Final <- rbind(Head, Tail)
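This works, but it does not account for the same net number appearing on different boats. I then tried grouping by boat, but got the same result: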
Head <- df %>% group_by(Boat) %>% aggregate(df, by = list(df$Net), FUN = head, n = 1) %>% ungroup
Both approaches returned the following data (only the first and last entry per net number):
Boat Net DateTime
Dawn 71 2014-07-10 10:10
Whip 73 2014-09-14 16:24
Whip 71 2014-09-15 18:16
Whip 73 2014-09-17 20:25
I think I am close but can't quite get there. Any help would be greatly appreciated.
Answer 0: (score: 3)
For the aggregate approach, you can get what you want by passing both df$Boat and df$Net to aggregate:
Head <- aggregate(df, by = list(df$Boat, df$Net), FUN = head, n = 1)
Tail <- aggregate(df, by = list(df$Boat, df$Net), FUN = tail, n = 1)
Final <- rbind(Head, Tail)
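One detail worth illustrating: aggregate prepends grouping columns (Group.1, Group.2) to its result, and the combined rows come back ordered by group rather than by time. The following is a minimal sketch of one way to tidy that up, assuming the rows are already sorted by DateTime within each boat and net and that DateTime is stored as "YYYY-MM-DD HH:MM" text, which sorts correctly as a string.

```r
# Sample data from the question (assumed to be in chronological order).
df <- data.frame(
  Boat = c("Dawn", "Dawn", "Whip", "Whip", "Dawn", "Whip", "Whip", "Whip", "Whip"),
  Net = c(71L, 71L, 71L, 71L, 71L, 71L, 73L, 71L, 73L),
  DateTime = c("2014-07-10 10:10", "2014-07-15 11:10", "2014-07-17 08:10",
               "2014-07-29 12:36", "2014-08-24 14:53", "2014-09-02 11:17",
               "2014-09-14 16:24", "2014-09-15 18:16", "2014-09-17 20:25"),
  stringsAsFactors = FALSE
)

# First and last row of every Boat/Net combination.
Head <- aggregate(df, by = list(df$Boat, df$Net), FUN = head, n = 1)
Tail <- aggregate(df, by = list(df$Boat, df$Net), FUN = tail, n = 1)
Final <- rbind(Head, Tail)

# Drop the Group.1/Group.2 columns added by aggregate() and restore time order.
Final <- Final[order(Final$DateTime), names(df)]
rownames(Final) <- NULL
Final
```

Note that a Boat/Net combination with a single row would appear twice in this result, once from Head and once from Tail; wrapping the result in unique() is one way to handle that case.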
Since you tried to use dplyr's group_by, here is a dplyr alternative that uses slice by group:
library(dplyr)
Final <- df %>%
  group_by(Boat, Net) %>%
  slice(c(1, n())) %>%
  ungroup()
(Note that group_by and aggregate do nothing special in combination: group_by only affects other dplyr functions, such as slice or summarize.)
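A grouped slice takes rows by position, so "first" and "last" depend on the current row order, and a group with only one row would be returned twice by c(1, n()). The sketch below is one possible way to guard against both, assuming the dplyr package is installed and that DateTime is text in "YYYY-MM-DD HH:MM" form; the sorting and the unique() call are additions for illustration, not part of the answer above.

```r
library(dplyr)

# Sample data from the question.
df <- data.frame(
  Boat = c("Dawn", "Dawn", "Whip", "Whip", "Dawn", "Whip", "Whip", "Whip", "Whip"),
  Net = c(71L, 71L, 71L, 71L, 71L, 71L, 73L, 71L, 73L),
  DateTime = c("2014-07-10 10:10", "2014-07-15 11:10", "2014-07-17 08:10",
               "2014-07-29 12:36", "2014-08-24 14:53", "2014-09-02 11:17",
               "2014-09-14 16:24", "2014-09-15 18:16", "2014-09-17 20:25"),
  stringsAsFactors = FALSE
)

Final <- df %>%
  arrange(DateTime) %>%            # make "first"/"last" mean earliest/latest
  group_by(Boat, Net) %>%
  slice(unique(c(1, n()))) %>%     # unique() avoids repeating a lone row
  ungroup() %>%
  arrange(DateTime)                # return the result in time order

Final
```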
Answer 1: (score: 1)
do.call(rbind, lapply(split(df, paste(df$Boat, df$Net, sep = "-")),
function(a) a[c(1, NROW(a)),]))
# Boat Net DateTime
#Dawn-71.1 Dawn 71 2014-07-10 10:10
#Dawn-71.5 Dawn 71 2014-08-24 14:53
#Whip-71.3 Whip 71 2014-07-17 08:10
#Whip-71.8 Whip 71 2014-09-15 18:16
#Whip-73.7 Whip 73 2014-09-14 16:24
#Whip-73.9 Whip 73 2014-09-17 20:25
Data
df = structure(list(Boat = c("Dawn", "Dawn", "Whip", "Whip", "Dawn",
"Whip", "Whip", "Whip", "Whip"), Net = c(71L, 71L, 71L, 71L,
71L, 71L, 73L, 71L, 73L), DateTime = c("2014-07-10 10:10", "2014-07-15 11:10",
"2014-07-17 08:10", "2014-07-29 12:36", "2014-08-24 14:53", "2014-09-02 11:17",
"2014-09-14 16:24", "2014-09-15 18:16", "2014-09-17 20:25")), .Names = c("Boat",
"Net", "DateTime"), class = "data.frame", row.names = c(NA, -9L
))
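The split-and-recombine approach returns rows ordered by group. If the original row order should be preserved, as in the desired output in the question, one base R option is to flag the first and last occurrence of each Boat/Net pair with duplicated(). This is a sketch under the assumption that the data frame is already in chronological order; it is an alternative technique, not the method used above.

```r
# Sample data from the question (assumed to be in chronological order).
df <- data.frame(
  Boat = c("Dawn", "Dawn", "Whip", "Whip", "Dawn", "Whip", "Whip", "Whip", "Whip"),
  Net = c(71L, 71L, 71L, 71L, 71L, 71L, 73L, 71L, 73L),
  DateTime = c("2014-07-10 10:10", "2014-07-15 11:10", "2014-07-17 08:10",
               "2014-07-29 12:36", "2014-08-24 14:53", "2014-09-02 11:17",
               "2014-09-14 16:24", "2014-09-15 18:16", "2014-09-17 20:25"),
  stringsAsFactors = FALSE
)

keys <- df[c("Boat", "Net")]

# A row is the first of its group if no earlier row has the same Boat/Net,
# and the last of its group if no later row has the same Boat/Net.
is_first <- !duplicated(keys)
is_last  <- !duplicated(keys, fromLast = TRUE)

# Keep rows that are first or last; a one-row group is kept only once.
Final <- df[is_first | is_last, ]
Final
```

Because this only builds a logical index, it needs no grouping step and leaves the rows in their original order.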
Answer 2: (score: 0)
Here is a data.table approach:
library(data.table)
setDT(df)[, .SD[c(1, .N)], .(Boat, Net)]
# Boat Net DateTime
#1: Dawn 71 2014-07-10 10:10
#2: Dawn 71 2014-08-24 14:53
#3: Whip 71 2014-07-17 08:10
#4: Whip 71 2014-09-15 18:16
#5: Whip 73 2014-09-14 16:24
#6: Whip 73 2014-09-17 20:25
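Two things may be worth knowing about the data.table call: setDT converts df in place, and c(1, .N) repeats the row of any group that has only one entry. The variant below is a sketch, assuming the data.table package is installed, that works on a copy, sorts by DateTime before grouping, and de-duplicates the row indices; these changes are illustrative choices rather than part of the answer above.

```r
library(data.table)

# Sample data from the question.
df <- data.frame(
  Boat = c("Dawn", "Dawn", "Whip", "Whip", "Dawn", "Whip", "Whip", "Whip", "Whip"),
  Net = c(71L, 71L, 71L, 71L, 71L, 71L, 73L, 71L, 73L),
  DateTime = c("2014-07-10 10:10", "2014-07-15 11:10", "2014-07-17 08:10",
               "2014-07-29 12:36", "2014-08-24 14:53", "2014-09-02 11:17",
               "2014-09-14 16:24", "2014-09-15 18:16", "2014-09-17 20:25"),
  stringsAsFactors = FALSE
)

dt <- as.data.table(df)  # a copy, so df itself stays a plain data.frame

# Sort by time, then take the first and last row of each Boat/Net group;
# unique() keeps a single-row group from appearing twice.
Final <- dt[order(DateTime), .SD[unique(c(1L, .N))], by = .(Boat, Net)]

# Put the result back in time order.
Final <- Final[order(DateTime)]
Final
```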