I have a table listing the places where each Id stopped (sorted by time).
df <- structure(list(Location = c("enter_Skagen", "Nordjyllands-Vaerket",
"exit_Skagen", "enter_Skagen", "Nordjyllands-Vaerket", "exit_Skagen",
"enter_Skagen", "Nordjyllands-Vaerket", "exit_Skagen", "enter_Skagen",
"Nordjyllands-Vaerket", "exit_Skagen", "enter_Skagen", "Nordjyllands-Vaerket",
"exit_Skagen", "enter_Skagen", "Aarhus", "Fredericia", "Copenhagen",
"exit_Skagen"), Ship = c(8131180L, 8131180L, 8131180L, 8131180L,
8131180L, 8131180L, 8131180L, 8131180L, 8131180L, 8131180L, 8131180L,
8131180L, 8131180L, 8131180L, 8131180L, 8201674L, 8201674L, 8201674L,
8201674L, 8201674L)), .Names = c("Location", "Id"), class = "data.frame", row.names = c(61702L,
61698L, 61699L, 61703L, 61704L, 61705L, 61700L, 61707L, 61711L,
61697L, 61701L, 61710L, 61708L, 61709L, 61706L, 63055L, 63053L,
63045L, 63103L, 63159L))
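I would like a matrix that counts, for each Id, the number of moves between the different locations. As a first step, I tried to reshape the table into two columns, from and to.
I tried splitting by Id and then converting with the following lines: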
library(data.table)  # needed for rbindlist() and, later, dcast()
spl <- split(df, df$Id)
move.spl <- lapply(spl, function(x) {
  ret <- data.frame(from=head(df$Location, -1), to=tail(df$Location, -1),
                    #year=ceiling((head(x$year, -1)+tail(x$year, -1))/2),
                    #id=head(x$id, -1),
                    stringsAsFactors=FALSE)
})
moves <- rbindlist(move.spl)
This gives the following output:
> moves
from to
1: enter_Skagen Nordjyllands-Vaerket
2: Nordjyllands-Vaerket exit_Skagen
3: exit_Skagen enter_Skagen
4: enter_Skagen Nordjyllands-Vaerket
5: Nordjyllands-Vaerket exit_Skagen
6: exit_Skagen enter_Skagen
7: enter_Skagen Nordjyllands-Vaerket
8: Nordjyllands-Vaerket exit_Skagen
9: exit_Skagen enter_Skagen
10: enter_Skagen Nordjyllands-Vaerket
11: Nordjyllands-Vaerket exit_Skagen
12: exit_Skagen enter_Skagen
13: enter_Skagen Nordjyllands-Vaerket
14: Nordjyllands-Vaerket exit_Skagen
15: exit_Skagen enter_Skagen
16: enter_Skagen Aarhus
17: Aarhus Fredericia
18: Fredericia Copenhagen
19: Copenhagen exit_Skagen
20: enter_Skagen Nordjyllands-Vaerket
21: Nordjyllands-Vaerket exit_Skagen
22: exit_Skagen enter_Skagen
23: enter_Skagen Nordjyllands-Vaerket
24: Nordjyllands-Vaerket exit_Skagen
25: exit_Skagen enter_Skagen
26: enter_Skagen Nordjyllands-Vaerket
27: Nordjyllands-Vaerket exit_Skagen
28: exit_Skagen enter_Skagen
29: enter_Skagen Nordjyllands-Vaerket
30: Nordjyllands-Vaerket exit_Skagen
31: exit_Skagen enter_Skagen
32: enter_Skagen Nordjyllands-Vaerket
33: Nordjyllands-Vaerket exit_Skagen
34: exit_Skagen enter_Skagen
35: enter_Skagen Aarhus
36: Aarhus Fredericia
37: Fredericia Copenhagen
38: Copenhagen exit_Skagen
from to
Row 15 should not look like this, because the Id changes there. After row 15 it behaves well for the next Id, but after that it goes completely bananas.
(Then I create the origin/destination matrix with
a <- table(moves$from, moves$to)
a <- data.table(a)
colnames(a) <- c("from","to", "N")
# create matrix
matrix <- dcast(a, from ~to, value.var = "N")
but this is the last step.)
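The result of moves is a bit strange. I am not sure the split by Id is working; I think it just processes the table as a whole without looking at each Id.
Is it possible to take the Id into account to get a better result?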
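One possible reading of the symptom, offered as a sketch rather than a definitive answer: inside the lapply() function the code refers to df$Location (the whole table) instead of x$Location (the piece for one Id), so every piece returns the same 19 rows and the result is the full table stacked twice. The version below uses x inside the function and base R only. The compact data.frame() call is my own rebuild of the same 20 Location/Id values as the dput above (the original row names are dropped), and the Id column in moves and the names od and od.by.id are introduced here for illustration.

```r
# Compact rebuild of the question's data: same Location/Id values, same order.
df <- data.frame(
  Location = c(rep(c("enter_Skagen", "Nordjyllands-Vaerket", "exit_Skagen"), 5),
               "enter_Skagen", "Aarhus", "Fredericia", "Copenhagen", "exit_Skagen"),
  Id = c(rep(8131180L, 15), rep(8201674L, 5)),
  stringsAsFactors = FALSE
)

# Split by Id, then pair each stop with the next stop within that piece.
# The function uses x (one Id's rows), not df (all rows).
spl <- split(df, df$Id)
move.spl <- lapply(spl, function(x) {
  data.frame(Id   = head(x$Id, -1),
             from = head(x$Location, -1),
             to   = tail(x$Location, -1),
             stringsAsFactors = FALSE)
})
moves <- do.call(rbind, move.spl)
rownames(moves) <- NULL

# Origin/destination counts over all Ids, and one table per Id.
od <- table(from = moves$from, to = moves$to)
od.by.id <- lapply(split(moves, moves$Id),
                   function(m) table(from = m$from, to = m$to))
```

With this change, the last stop of one Id is never paired with the first stop of the next, and od.by.id[["8201674"]] holds the counts for that Id alone. If data.table is already loaded, rbindlist(move.spl) could replace the do.call(rbind, ...) line.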