I have this sample:
> a
Ship duration.minutes event Location
1 a NA enter Skagen
2 a 1616 trip <NA>
3 a 4308 stop Copenhagen
4 b 1646 trip <NA>
5 b 5751 stop Gdynia
6 b 75 trip <NA>
7 b 45666 stop Gdansk
8 c 2531 trip <NA>
9 c 5360 stop Szczecin
10 d 287 trip <NA>
I want to add a new column called "Destination" and fill each trip row with the name of the destination.
The expected output is:
> output
Ship duration.minutes event Location Destination
1 a NA enter Skagen NA
2 a 1616 trip <NA> Copenhagen
3 a 4308 stop Copenhagen <NA>
4 b 1646 trip <NA> Gdynia
5 b 5751 stop Gdynia <NA>
6 b 75 trip <NA> Gdansk
7 b 45666 stop Gdansk <NA>
8 c 2531 trip <NA> Szczecin
9 c 5360 stop Szczecin <NA>
10 d 287 trip <NA> <NA>
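This has to work per ship: a trip row should only get a destination that belongs to the same ship. The destination is the next location that ship reaches after the trip.
I tried moves <- setDT(a)[, .(from = Location[-.N], to = Location[-1L]), Ship], but it does not keep the duration.minutes column: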
> dput(moves)
structure(list(Ship = c("a", "a", "b", "b", "b", "c"), from = structure(c(4L,
NA, NA, 3L, NA, NA), .Label = c("Copenhagen", "Gdansk", "Gdynia",
"Skagen", "Szczecin"), class = "factor"), to = structure(c(NA,
1L, 3L, NA, 2L, 5L), .Label = c("Copenhagen", "Gdansk", "Gdynia",
"Skagen", "Szczecin"), class = "factor")), row.names = c(NA,
-6L), class = c("data.table", "data.frame"), .Names = c("Ship",
"from", "to"), .internal.selfref = <pointer: 0x00000000003e0788>)
It looks like this:
> moves
Ship from to
1: a Skagen <NA>
2: a <NA> Copenhagen
3: b <NA> Gdynia
4: b Gdynia <NA>
5: b <NA> Gdansk
6: c <NA> Szczecin
The sample data, named a, is:
> dput(data)
structure(list(Ship = c("a", "a", "a", "b", "b", "b", "b", "c",
"c", "d"), duration.minutes = c(NA, 1616L, 4308L, 1646L, 5751L,
75L, 45666L, 2531L, 5360L, 287L), event = structure(c(1L, 3L,
2L, 3L, 2L, 3L, 2L, 3L, 2L, 3L), .Label = c("enter", "stop",
"trip"), class = "factor"), Location = structure(c(4L, NA, 1L,
NA, 3L, NA, 2L, NA, 5L, NA), .Label = c("Copenhagen", "Gdansk",
"Gdynia", "Skagen", "Szczecin"), class = "factor")), .Names = c("Ship",
"duration.minutes", "event", "Location"), row.names = c(NA, -10L
), class = c("data.table", "data.frame"))
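I'm afraid this is hard to do with setDT. Is there a way to keep the duration.minutes column?
Answer 0 (score: 0)
I'm not sure this covers all of your use cases, but you can use the lead function to capture the next value within each Ship. It also seems to make more sense to keep all the values in a single column rather than in separate Location and Destination columns.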
library(tidyverse)
a %>%
group_by(Ship) %>%
mutate(Destination = lead(Location),
Location = coalesce(Location, Destination)) %>%
select(-Destination)
Ship duration.minutes event Location
<chr> <int> <fct> <fct>
1 a NA enter Skagen
2 a 1616 trip Copenhagen
3 a 4308 stop Copenhagen
4 b 1646 trip Gdynia
5 b 5751 stop Gdynia
6 b 75 trip Gdansk
7 b 45666 stop Gdansk
8 c 2531 trip Szczecin
9 c 5360 stop Szczecin
10 d 287 trip <NA>
If you want to keep the separate columns, you can shorten the code to:
a %>%
group_by(Ship) %>%
mutate(Destination = lead(Location))
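On the sample, lead alone already matches the expected output, because every stop row is followed by a row with a missing Location. If the real data could contain two located rows in a row, one possible guard is to fill Destination only on trip rows. The following is a minimal sketch under assumptions: the sample is rebuilt with character columns instead of factors, and the name out is only illustrative.

```r
library(dplyr)

# Sample from the question, rebuilt with character columns (an assumption;
# the original stores event and Location as factors)
a <- data.frame(
  Ship = c("a", "a", "a", "b", "b", "b", "b", "c", "c", "d"),
  duration.minutes = c(NA, 1616L, 4308L, 1646L, 5751L, 75L, 45666L, 2531L, 5360L, 287L),
  event = c("enter", "trip", "stop", "trip", "stop", "trip", "stop", "trip", "stop", "trip"),
  Location = c("Skagen", NA, "Copenhagen", NA, "Gdynia", NA, "Gdansk", NA, "Szczecin", NA),
  stringsAsFactors = FALSE
)

out <- a %>%
  group_by(Ship) %>%
  # Take the next Location within the ship, but only on trip rows
  mutate(Destination = if_else(event == "trip", lead(Location), NA_character_)) %>%
  ungroup()
```

All original columns, including duration.minutes, are still present in out, since mutate only adds a column.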
For the data sample you provided, fill can also create the single column in one step:
a %>%
group_by(Ship) %>%
fill(Location, .direction="up")
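Since the question started from setDT, here is a hedged data.table sketch of the same idea. The attempt in the question builds a new table in j with .(), which returns only the columns listed there; adding the column with := instead modifies the table by reference and leaves duration.minutes in place. The sample is rebuilt with character columns here, which is an assumption made to keep the example simple.

```r
library(data.table)

# Sample from the question, rebuilt with character columns (an assumption;
# the original stores event and Location as factors)
a <- data.table(
  Ship = c("a", "a", "a", "b", "b", "b", "b", "c", "c", "d"),
  duration.minutes = c(NA, 1616L, 4308L, 1646L, 5751L, 75L, 45666L, 2531L, 5360L, 287L),
  event = c("enter", "trip", "stop", "trip", "stop", "trip", "stop", "trip", "stop", "trip"),
  Location = c("Skagen", NA, "Copenhagen", NA, "Gdynia", NA, "Gdansk", NA, "Szczecin", NA)
)

# shift(type = "lead") is data.table's counterpart to dplyr::lead;
# := adds Destination by reference, so the existing columns are kept
a[, Destination := shift(Location, type = "lead"), by = Ship]
```

With the real data, where Location is a factor, converting it first with as.character(Location) avoids relying on how shift treats factor levels.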