How to split a list of lists in a data frame into columns?

Time: 2018-10-22 17:38:25

Tags: r list purrr

I am analysing a data frame of trips. The data is formatted as follows:

 tripnumber stop       
<int> <list>     
1 <list [34]>
2 <list [34]>
3 <list [33]>
4 <list [20]>
5 <list [17]>
6 <list [17]>

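Each trip number is linked to a certain number of stops; for example, trip 1 has 34 stops.

An important note: the stop list is not simply a list of stations. Each entry is itself a list containing the station plus information about it (let's call these station lists), structured as follows: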

list(station = "ams", Arival_time = "0135", Departure_time = "0138", index = "1")
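
For illustration, a toy version of this structure could be built as below. This is only a sketch: the values are invented, `make_stop` is a hypothetical helper, and the field names simply follow the example above; the real traintrips data is not reproduced here.

```r
library(tibble)

# Hypothetical helper: builds one station list in the shape described above.
make_stop <- function(station, arrival, departure, index) {
  list(station = station, Arival_time = arrival,
       Departure_time = departure, index = as.character(index))
}

# Toy data (invented values): trip 1 has three stops, trip 2 has two.
traintrips <- tibble(
  tripnumber = 1:2,
  stop = list(
    list(make_stop("ams", "0135", "0138", 1),
         make_stop("utr", "0205", "0207", 2),
         make_stop("arn", "0240", "0242", 3)),
    list(make_stop("ams", "0635", "0638", 1),
         make_stop("utr", "0705", "0707", 2))
  )
)

stops_per_trip <- lengths(traintrips$stop)          # number of stops in each trip
first_station  <- traintrips$stop[[1]][[1]]$station # station of the first stop of trip 1
```

Each element of the stop column is a list of station lists, which is why the column prints as <list [n]> with n equal to the number of stops.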

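I would like to unlist the list of station lists so that the first column after tripnumber holds the first station list, the second column holds the second station list, and so on, like this: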

 tripnumber stop1 stop2 stop3 stop4 stop5 .... 
<int> <list>     
1 <list [4]> <list [4]> <list [4]> <list [4]> <list [4]> ....
2 <list [4]> <list [4]> <list [4]> <list [4]> <list [4]> ....
3 <list [4]> <list [4]> <list [4]> <list [4]> <list [4]> ....
4 <list [4]> <list [4]> <list [4]> <list [4]> <list [4]> ....
5 <list [4]> <list [4]> <list [4]> <list [4]> <list [4]> ....
6 <list [4]> <list [4]> <list [4]> <list [4]> <list [4]> ....

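I tried to do this with the purrr library. However, I am not very familiar with the package, and the difficulty is that I cannot get it to work without losing either the tripnumber structure or the "stationlist" structure.

Any tips on how to solve this?

Edit: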

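  • The following dput(head(traintrips)) output can be copy-pasted into R as a test file: .txt file
  • If there are more stop columns than actual stops, the cell should remain empty ("")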

1 Answer:

Answer 0: (score: 0)

Got it working by unnesting the data and then reshaping the result with the following code:

library(tidyr)  # provides unnest() and spread()

DFnew <- unnest(traintrips, traintrips$stop)  # one row per stop
DFnew$time <- with(DFnew, ave(tripnumber, tripnumber, FUN = seq_along))  # add time column: position of the stop within its trip
names(DFnew)[2] <- "stop"  # remove the dollar sign from the column name of the unnested data
DFnew <- spread(DFnew, time, stop)  # one column per stop position

Result:

> dim(DFnew)
[1]  6 35

> head(DFnew[,1:6])
# A tibble: 6 x 6
  tripnumber `1`        `2`        `3`        `4`        `5`       
       <int> <list>     <list>     <list>     <list>     <list>    
1          1 <list [4]> <list [4]> <list [4]> <list [4]> <list [4]>
2          2 <list [4]> <list [4]> <list [4]> <list [4]> <list [4]>
3          3 <list [4]> <list [4]> <list [4]> <list [4]> <list [4]>
4          4 <list [4]> <list [4]> <list [4]> <list [4]> <list [4]>
5          5 <list [4]> <list [4]> <list [4]> <list [4]> <list [4]>
6          6 <list [4]> <list [4]> <list [4]> <list [4]> <list [4]>
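
With tidyr 1.0 or later, the same unnest-then-reshape idea might be sketched with unnest_longer() and pivot_wider(). This is a variant under stated assumptions, not the code that produced the result above: the toy data and the names long, wide and pos are invented for illustration, and it has not been run against the original traintrips data.

```r
library(tibble)
library(tidyr)

# Toy stand-in for traintrips (invented values): trip 1 has 3 stops, trip 2 has 2.
traintrips <- tibble(
  tripnumber = 1:2,
  stop = list(
    list(list(station = "ams", Arival_time = "0135", Departure_time = "0138", index = "1"),
         list(station = "utr", Arival_time = "0205", Departure_time = "0207", index = "2"),
         list(station = "arn", Arival_time = "0240", Departure_time = "0242", index = "3")),
    list(list(station = "ams", Arival_time = "0635", Departure_time = "0638", index = "1"),
         list(station = "utr", Arival_time = "0705", Departure_time = "0707", index = "2"))
  )
)

# One row per stop; `stop` stays a list column holding one station list per row.
long <- unnest_longer(traintrips, stop)

# Position of each stop within its trip (same ave() idiom as in the answer above).
long$pos <- ave(long$tripnumber, long$tripnumber, FUN = seq_along)

# One column per stop position, named stop1, stop2, ...
wide <- pivot_wider(long, names_from = pos, values_from = stop,
                    names_prefix = "stop")
```

Trips with fewer stops than the longest trip get NULL in the trailing list columns rather than "", since a list column has no empty-string missing value. If "" is required, those cells would need to be filled in a separate step.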