Suppose my data frame looks like this:
row id event actor time
1 1 push dude 1
2 1 comment guy 2
3 1 comment guy 3
4 2 request person 1
5 2 comment person 2
6 2 merge dude 2
7 3 comment guy 3
8 3 comment dude 4
9 3 reject person 5
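Now, suppose I want to convert this into a graph (an edge list) using the following rule: create a directed edge from the actor on row n to the actor on row n + 1 if the two rows share the same id. E.g. dude -> guy (for id 1), but not guy -> person (guy is on row 3 and person is on row 4, but they have different ids). So I would end up with an edge list that looks like this: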
from to time
dude guy 1-2
guy guy 2-3
person person 1-2
person dude 2
guy dude 3-4
dude person 4-5
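How can I get started on this in R? I'm lost even on where to begin. It would be useful because it would help build social networks from event workflow data.
In pseudocode, I think it would be something like this: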
for each rows n and n+1
if row n "id" = row n+1 "id"
store "actor" from row n in column "from"
store "actor" from row n+1 in column "to"
store "time" from row n in column "time"
unless "time" row n = "time" row n+1
append "time" from row n+1 in column "time"
else
move to next row
end
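The pseudocode above could be sketched in base R roughly as follows. This is only an illustration: the data frame name `dat` and the result name `edges` are assumptions, and the rows are assumed to be already ordered by id and then by time, as in the example table.

```r
# Example data copied from the table above; the name `dat` is an assumption.
dat <- data.frame(
  id    = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  event = c("push", "comment", "comment", "request", "comment", "merge",
            "comment", "comment", "reject"),
  actor = c("dude", "guy", "guy", "person", "person", "dude",
            "guy", "dude", "person"),
  time  = c(1, 2, 3, 1, 2, 2, 3, 4, 5),
  stringsAsFactors = FALSE
)

n    <- nrow(dat)
cur  <- seq_len(n - 1)               # index of each "row n"
nxt  <- cur + 1                      # index of the matching "row n + 1"
same <- dat$id[cur] == dat$id[nxt]   # keep only pairs that share an id

t1 <- dat$time[cur][same]
t2 <- dat$time[nxt][same]

edges <- data.frame(
  from = dat$actor[cur][same],
  to   = dat$actor[nxt][same],
  # write a single time when both rows have the same time, else "t1-t2"
  time = ifelse(t1 == t2, as.character(t1), paste(t1, t2, sep = "-")),
  stringsAsFactors = FALSE
)
print(edges)
```

Comparing each row with its successor in one vectorized step avoids an explicit loop; the `same` mask plays the role of the `if` in the pseudocode.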
Answer 0 (score: 2)
Here is a quick way to do it. I'm not sure how robust it is.
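library(plyr)
# dat is the data frame from the question, assumed to be sorted by id and time
dat2 <- ddply(dat, .(id), function(d){
  # within one id, pair every row with the row that follows it
  data.frame(
    event = d$event[-1],        # event on row n + 1
    from = d$actor[-NROW(d)],   # actor on row n
    to = d$actor[-1],           # actor on row n + 1
    # always "t1-t2", so equal times come out as "2-2" rather than "2"
    time = paste(d$time[-NROW(d)], d$time[-1], sep = "-")
  )
})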
Answer 1 (score: 1)
Here is a data.table way:
# make an edge list (pairs of nodes) with attributes
require(data.table)
DT <- data.table(DF)
gdt <- DT[,{
nodes <- actor # not unique(actor), strangely
list(
n1=head(nodes,-1),
n2=tail(nodes,-1),
t1=head(time,-1),
t2=tail(time,-1)
)},by=id]
# collapse the two time stamps into one label, row by row:
# "1-2" when they differ, just "2" when they are equal
gdt[, time := paste(unique(c(t1, t2)), collapse = '-'), by = 1:nrow(gdt)][,
  c('id','t1','t2') := NULL
]
which gives
n1 n2 time
1: dude guy 1-2
2: guy guy 2-3
3: person person 1-2
4: person dude 2
5: guy dude 3-4
6: dude person 4-5
Then build the graph:
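library(igraph)
# the first two columns are read as the edge list, the rest as edge attributes;
# graph_from_data_frame() is the current name for graph.data.frame()
g <- graph_from_data_frame(gdt)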
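As a quick sanity check, the resulting graph can be inspected along these lines. This is a minimal sketch: the edge list is rebuilt by hand from the printed output above so that the snippet runs on its own, and it assumes a version of igraph that provides `graph_from_data_frame()`.

```r
library(igraph)

# Edge list as printed above, rebuilt by hand so the snippet is self-contained
gdt <- data.frame(
  n1   = c("dude", "guy", "person", "person", "guy", "dude"),
  n2   = c("guy", "guy", "person", "dude", "dude", "person"),
  time = c("1-2", "2-3", "1-2", "2", "3-4", "4-5"),
  stringsAsFactors = FALSE
)

g <- graph_from_data_frame(gdt, directed = TRUE)

vcount(g)       # number of distinct actors
ecount(g)       # number of edges, self-loops included
is_directed(g)  # whether the edges are directed
E(g)$time       # the time labels, carried over as an edge attribute
```

Note that guy -> guy and person -> person become self-loops in the graph; whether to keep them depends on what the network is meant to show.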