I have the following data frame:
current_session next_session
1 811 841
2 1771 2071
3 3181 3241
4 3241 3271
5 3271 3361
6 3361 3391
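I need to build a list for every chain of sessions that has more than 2 elements, where a chain is defined by the link between "next_session" in one row and "current_session" in the following row. For example, one chain that can be extracted from the data above is (3181, 3241, 3271, 3361, 3391). I need to extract all such lists that are at least 3 elements long and store them in one wrapper list. At the moment I have this code, but I am not sure it is efficient (2 loops):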
list_of_chains <- list()
n <- nrow(identical_sessions_df)
t <- 1
while (t <= n) {
  # init: start a chain with the two sessions of the current row
  chain <- c(identical_sessions_df[t, 1], identical_sessions_df[t, 2])
  # extend the chain while the next row's current_session continues it
  while (t < n && identical_sessions_df[t, 2] == identical_sessions_df[t + 1, 1]) {
    chain <- c(chain, identical_sessions_df[t + 1, 2])
    t <- t + 1
  }
  # keep only chains with at least 3 elements
  if (length(chain) > 2) {
    list_of_chains <- c(list_of_chains, list(chain))
  }
  t <- t + 1
}
I am very new to R, so apologies if this question is trivial, and thanks for any ideas.
Answer 0 (score: 0)
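If your data.frame is small, you can stick with loops; otherwise you can use the igraph package, build a graph of the sessions, and take the connected components of that graph, i.e.: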
library(igraph)
# create a graph, edges are current_session --> next_session
g <- graph_from_data_frame(identical_sessions_df, directed = TRUE)
# plot the graph if you want to visualize it...
#plot(g,vertex.size=25)
# decompose the graph in the connected components
sg <- decompose(g, mode = "weak")
# if you want to plot the sub-graphs...
#for(i in 1:length(sg)){
# plot(sg [[i]],vertex.size=25)
#}
# create the chains list
list_of_chains <-
lapply(sg, function(subgr) {
  # vertex names of the component, ordered along the chain
  as_ids(topo_sort(subgr))
})
# remove the sub-chains having <= 2 elements
list_of_chains <- list_of_chains[lengths(list_of_chains) > 2]
# Result:
#> list_of_chains
#[[1]]
#[1] "3181" "3241" "3271" "3361" "3391"
If you plot the first graph, this is what you get:
[image: plot of the graph]
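For the row-adjacent definition used in the question, a base R version without explicit loops or igraph may also be worth considering. The following is only a minimal sketch: it assumes the data frame is non-empty and that linked rows are already next to each other, as in the sample data. The names df, starts, chain_id and chains are illustrative and do not come from the original code.

```r
# Sample data from the question (df is a stand-in for identical_sessions_df)
df <- data.frame(
  current_session = c(811, 1771, 3181, 3241, 3271, 3361),
  next_session    = c(841, 2071, 3241, 3271, 3361, 3391)
)

# A row starts a new chain when its current_session differs from the
# previous row's next_session (the first row always starts a chain)
starts <- c(TRUE, df$current_session[-1] != df$next_session[-nrow(df)])

# Running count of chain starts gives one id per chain
chain_id <- cumsum(starts)

# Each chain is the first current_session followed by all next_session values
chains <- lapply(split(df, chain_id), function(d) {
  c(d$current_session[1], d$next_session)
})

# Keep only chains with at least 3 elements
chains <- unname(chains[lengths(chains) > 2])
print(chains)
```

Unlike the igraph approach, this sketch does not find links between rows that are not adjacent, so it only fits data that is already sorted the way the sample is.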