R: building lists of links between data frame rows

Time: 2015-02-18 11:00:18

Tags: r list dataframe

I have the following data frame:

  current_session next_session
1             811          841
2            1771         2071
3            3181         3241
4            3241         3271
5            3271         3361
6            3361         3391
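
For reproducibility, the table above can be rebuilt as follows. This is a minimal sketch: the name identical_sessions_df is the one used in the code further down, and storing both columns as numeric is an assumption.

```r
# Rebuild the example data shown in the table above.
# Assumption: both columns are numeric session ids.
identical_sessions_df <- data.frame(
  current_session = c(811, 1771, 3181, 3241, 3271, 3361),
  next_session    = c(841, 2071, 3241, 3271, 3361, 3391)
)

# Rows 3 to 6 are linked: each next_session equals the
# current_session of the row that follows it.
linked <- identical_sessions_df$next_session[3:5] ==
  identical_sessions_df$current_session[4:6]
```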

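I need to build a list for every chain of sessions that has more than 2 elements, where a chain is defined by a link between the "next_session" of one row and the "current_session" of the next row. For example, one chain that can be extracted from the set above is (3181, 3241, 3271, 3361, 3391). I need to extract all such lists that are at least 3 elements long and store them in one wrapper list. Currently I have this code, but I am not sure it works well (it uses two loops):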

list_of_chains <- list()
n <- nrow(identical_sessions_df)
t <- 1
while (t <= n) {
  # init: a chain starts with both sessions of row t
  chain <- c(identical_sessions_df[t, 1], identical_sessions_df[t, 2])

  # extend the chain while the next row's current_session matches
  while (t < n && identical_sessions_df[t, 2] == identical_sessions_df[t + 1, 1]) {
    chain <- c(chain, identical_sessions_df[t + 1, 2])
    t <- t + 1
  }

  # keep only chains with at least 3 elements
  if (length(chain) > 2) {
    list_of_chains <- c(list_of_chains, list(chain))
  }
  t <- t + 1
}

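I am quite new to R, so sorry if this question is trivial, and thanks for any ideas.

1 Answer:

Answer 0: (score: 0)

If your data.frame is small, you can stick with loops. Otherwise, you can use the igraph package: build a graph of the sessions and take its connected components, i.e.: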

library(igraph)

# create a graph, edges are current_session --> next_session
g <- graph_from_data_frame(identical_sessions_df, directed = TRUE)
# plot the graph if you want to visualize it...
#plot(g,vertex.size=25)

# decompose the graph in the connected components
sg <- decompose(g, mode = "weak")

# if you want to plot the sub-graphs...
#for(i in 1:length(sg)){
#  plot(sg [[i]],vertex.size=25)
#}

# create the chains list
list_of_chains <- 
lapply(sg,function(subgr){ 
  return(V(subgr)$name[topo_sort(subgr)])
})

# remove the sub-chains having <= 2 elements
list_of_chains <- list_of_chains[sapply(list_of_chains,function(x){length(x) > 2})]

# Result:
#> list_of_chains
#[[1]]
#[1] "3181" "3241" "3271" "3361" "3391"

If you plot the first graph, this is what you get:

[Image: plot of the session graph g]
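
If you would rather avoid both explicit loops and the igraph dependency, the same grouping can be sketched in base R. This is only an illustration under a stated assumption: as in the example, linked rows are adjacent and already in order, so a chain is a run of consecutive rows. The names starts_new, chain_id and list_of_chains_base are made up for this sketch.

```r
# Example data from the question
identical_sessions_df <- data.frame(
  current_session = c(811, 1771, 3181, 3241, 3271, 3361),
  next_session    = c(841, 2071, 3241, 3271, 3361, 3391)
)

df <- identical_sessions_df
n <- nrow(df)

# TRUE where a row does NOT continue the chain of the previous row
# (the first row always starts a chain)
starts_new <- c(TRUE, df$current_session[-1] != df$next_session[-n])

# Running chain id: increases by one at every chain start
chain_id <- cumsum(starts_new)

# Each chain is the first current_session of its group,
# followed by every next_session in that group
chains <- lapply(split(df, chain_id), function(rows) {
  c(rows$current_session[1], rows$next_session)
})

# Keep only chains with at least 3 elements
list_of_chains_base <- unname(chains[lengths(chains) > 2])
```

Unlike the igraph version, this keeps the session ids numeric, but it would miss links between rows that are not next to each other; in that case the graph approach above is the safer choice.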