I'm analyzing a data frame of trips. The data has the following format:
tripnumber stop
<int> <list>
1 <list [34]>
2 <list [34]>
3 <list [33]>
4 <list [20]>
5 <list [17]>
6 <list [17]>
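Each trip number is linked to a certain number of stops; for example, trip 1 has 34 stops.
An important caveat: the stop list is not just a list of stations. Each element is itself a list holding the station plus extra information (let's call these station lists), structured like this:
list(station = "ams", Arival_time = "0135", Departure_time = "0138", index = "1")
I would like to unnest the list of station lists so that, after the trip number, the first column holds the first station list, the second column holds the second station list, and so on, like this: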
tripnumber stop1 stop2 stop3 stop4 stop5 ....
<int> <list>
1 <list [4]> <list [4]> <list [4]> <list [4]> <list [4]> ....
2 <list [4]> <list [4]> <list [4]> <list [4]> <list [4]> ....
3 <list [4]> <list [4]> <list [4]> <list [4]> <list [4]> ....
4 <list [4]> <list [4]> <list [4]> <list [4]> <list [4]> ....
5 <list [4]> <list [4]> <list [4]> <list [4]> <list [4]> ....
6 <list [4]> <list [4]> <list [4]> <list [4]> <list [4]> ....
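I tried to reshape this with the purrr library. However, I'm not very familiar with the package, and the difficulty is that I can't get it to work without losing either the tripnumber structure or the "stationlist" structure.
Any tips on how to solve this?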
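Since purrr came up: one way to reach the wide layout without losing tripnumber or the station lists is to build the stop columns position by position. This is a minimal sketch, not the code from the answer below. The helper make_station and the two-trip traintrips are hypothetical stand-ins, and every station name and time except "ams" is made up.

```r
library(tibble)
library(purrr)

# Hypothetical helper and made-up toy data mimicking the structure described above
make_station <- function(station, arr, dep, idx) {
  list(station = station, Arival_time = arr, Departure_time = dep, index = idx)
}
traintrips <- tibble(
  tripnumber = 1:2,
  stop = list(
    list(make_station("ams", "0135", "0138", "1"),
         make_station("ut",  "0200", "0203", "2"),
         make_station("asd", "0230", "0233", "3")),
    list(make_station("rtd", "0300", "0303", "1"),
         make_station("gvc", "0320", "0322", "2"))
  )
)

# The longest trip decides how many stop columns are needed
n_max <- max(lengths(traintrips$stop))

# For stop position i, take the i-th station list of every trip (NULL if the trip is shorter)
stop_cols <- map(seq_len(n_max), function(i) {
  map(traintrips$stop, function(s) if (i <= length(s)) s[[i]] else NULL)
})
names(stop_cols) <- paste0("stop", seq_len(n_max))

# Reassemble: tripnumber first, then stop1, stop2, ... as list-columns
DFwide <- as_tibble(c(list(tripnumber = traintrips$tripnumber), stop_cols))
```

Each stopN cell keeps the four-field station list intact, and trips with fewer stops simply get NULL in the trailing columns.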
Edit:
Output of dput(head(traintrips)), to copy-paste into R as test data: .txt file
I got it working by unnesting the data and then reshaping the result with the following code:
library(tidyr)
DFnew <- unnest(traintrips, stop)  # one row per stop; `stop` now holds a single station list (bare column name, so no "$" in the new colname)
DFnew$time <- with(DFnew, ave(tripnumber, tripnumber, FUN = seq_along)) # add time column: position of each stop within its trip
DFnew <- spread(DFnew, time, stop)  # one list-column per stop position
Result:
> dim(DFnew)
[1] 6 35
> head(DFnew[,1:6])
# A tibble: 6 x 6
tripnumber `1` `2` `3` `4` `5`
<int> <list> <list> <list> <list> <list>
1 1 <list [4]> <list [4]> <list [4]> <list [4]> <list [4]>
2 2 <list [4]> <list [4]> <list [4]> <list [4]> <list [4]>
3 3 <list [4]> <list [4]> <list [4]> <list [4]> <list [4]>
4 4 <list [4]> <list [4]> <list [4]> <list [4]> <list [4]>
5 5 <list [4]> <list [4]> <list [4]> <list [4]> <list [4]>
6 6 <list [4]> <list [4]> <list [4]> <list [4]> <list [4]>
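In current tidyr, spread() is superseded by pivot_wider(), and unnest_longer() can record each stop's position itself, which replaces the ave() step. A minimal sketch, assuming tidyr 1.2 or later (for indices_to) and R 4.1 or later (native |> pipe); make_station and the two-trip traintrips are hypothetical stand-ins, and the station names and times other than "ams" are made up.

```r
library(tibble)
library(tidyr)

# Hypothetical helper and made-up two-trip stand-in for traintrips
make_station <- function(station, arr, dep, idx) {
  list(station = station, Arival_time = arr, Departure_time = dep, index = idx)
}
traintrips <- tibble(
  tripnumber = 1:2,
  stop = list(
    list(make_station("ams", "0135", "0138", "1"),
         make_station("ut",  "0200", "0203", "2"),
         make_station("asd", "0230", "0233", "3")),
    list(make_station("rtd", "0300", "0303", "1"),
         make_station("gvc", "0320", "0322", "2"))
  )
)

DFnew <- traintrips |>
  # one row per stop; `time` holds the stop's position within its trip
  unnest_longer(stop, indices_to = "time") |>
  # one list-column per position (stop1, stop2, ...); shorter trips get empty cells
  pivot_wider(names_from = time, values_from = stop, names_prefix = "stop")
```

names_prefix gives the stop1, stop2, ... names from the question directly, instead of the backticked `1`, `2`, ... columns that spread() produces.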