Question

I'm analyzing a data frame of train trips. The data has the following format:

 tripnumber stop       
<int> <list>     
1 <list [34]>
2 <list [34]>
3 <list [33]>
4 <list [20]>
5 <list [17]>
6 <list [17]>

Each trip number is linked to a certain number of stops; for example, trip 1 has 34 stops.

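One important caveat: the stop list is not just a list of stations. Each element is itself a list holding the station plus extra information (let's call these "station lists"), with the following structure:

list(station = "ams", Arival_time = "0135", Departure_time = "0138", index = "1")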
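To make the nesting concrete, here is a small toy version of `traintrips` built in base R. It is a sketch under assumptions: the station codes, times and the helper `station()` are invented for illustration; only the column names `tripnumber`/`stop` and the four field names (including the spelling `Arival_time`) come from the question.

```r
# Hypothetical toy version of `traintrips` (station codes and times are made up):
# trip 1 has 3 stops, trip 2 has 2 stops.
station <- function(code, arr, dep, idx) {
  # One "station list": 4 named fields, spelled as in the question
  list(station = code, Arival_time = arr, Departure_time = dep, index = idx)
}

traintrips <- data.frame(tripnumber = 1:2)
# Assigning a list whose length equals nrow() creates a list-column
traintrips$stop <- list(
  list(station("ams", "0135", "0138", "1"),
       station("ut",  "0200", "0203", "2"),
       station("ah",  "0230", "0232", "3")),
  list(station("ams", "0500", "0505", "1"),
       station("ehv", "0610", "0612", "2"))
)

lengths(traintrips$stop)           # stops per trip: 3 2
names(traintrips$stop[[1]][[1]])   # the 4 fields of one station list
```

So `stop` is a list-column with two levels of nesting: a list of stops per trip, and a named list of 4 fields per stop.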

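I'd like to unnest the list of station lists so that, after the trip number, the first column holds the first station list, the second column the second station list, and so on, like this: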

 tripnumber stop1 stop2 stop3 stop4 stop5 .... 
<int> <list>     
1 <list [4]> <list [4]> <list [4]> <list [4]> <list [4]> ....
2 <list [4]> <list [4]> <list [4]> <list [4]> <list [4]> ....
3 <list [4]> <list [4]> <list [4]> <list [4]> <list [4]> ....
4 <list [4]> <list [4]> <list [4]> <list [4]> <list [4]> ....
5 <list [4]> <list [4]> <list [4]> <list [4]> <list [4]> ....
6 <list [4]> <list [4]> <list [4]> <list [4]> <list [4]> ....

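I tried to do this with the purrr package. However, I'm not very familiar with it, and the difficulty is that I can't get it to work without losing either the tripnumber structure or the "station list" structure.

Any tips on how to solve this?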

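Edit:

The output of dput(head(traintrips)) can be copy-pasted into R as test data: .txt file
If there are more stop columns than a trip has actual stops, the extra cells should stay empty ("").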
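One way to get this layout without any packages is to loop over stop positions and pick the i-th station list from each trip. This is a minimal base-R sketch, not the purrr approach the question mentions; `widen_stops()` is a hypothetical helper, the toy trips are made up, and it assumes the columns are named `tripnumber` and `stop` as in the question.

```r
# Hypothetical helper: widen the `stop` list-column in base R.
# Column stop<i> holds the i-th station list of each trip; short trips get "".
widen_stops <- function(df) {
  n_max <- max(lengths(df$stop))   # the longest trip decides the column count
  out <- df["tripnumber"]
  for (i in seq_len(n_max)) {
    out[[paste0("stop", i)]] <- lapply(df$stop, function(s) {
      if (i <= length(s)) s[[i]] else ""   # "" when the trip has fewer stops
    })
  }
  out
}

# Toy data (made up, placeholder times): trip 1 has 3 stops, trip 2 has 2
st <- function(code) list(station = code, Arival_time = "0000", Departure_time = "0000", index = "1")
traintrips <- data.frame(tripnumber = 1:2)
traintrips$stop <- list(list(st("ams"), st("ut"), st("ah")),
                        list(st("ams"), st("ehv")))

wide <- widen_stops(traintrips)
names(wide)       # "tripnumber" "stop1" "stop2" "stop3"
wide$stop3[[2]]   # "" (trip 2 has no third stop)
```

Padding with "" mixes character values into list-columns of lists; `NULL` or `NA` may be easier to test for downstream, but "" matches the request above.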

Answer 1

I got it working by unnesting the data and then reshaping the result with the following code:

library(tidyr)

DFnew <- unnest(traintrips, cols = stop)  # one row per (trip, station list); column keeps the name `stop`
DFnew$time <- with(DFnew, ave(tripnumber, tripnumber, FUN = seq_along))  # station position within each trip
DFnew <- pivot_wider(DFnew, names_from = time, values_from = stop)  # one list-column per position (replaces superseded spread())

Result:

> dim(DFnew)
[1]  6 35

> head(DFnew[,1:6])
# A tibble: 6 x 6
  tripnumber `1`        `2`        `3`        `4`        `5`       
       <int> <list>     <list>     <list>     <list>     <list>    
1          1 <list [4]> <list [4]> <list [4]> <list [4]> <list [4]>
2          2 <list [4]> <list [4]> <list [4]> <list [4]> <list [4]>
3          3 <list [4]> <list [4]> <list [4]> <list [4]> <list [4]>
4          4 <list [4]> <list [4]> <list [4]> <list [4]> <list [4]>
5          5 <list [4]> <list [4]> <list [4]> <list [4]> <list [4]>
6          6 <list [4]> <list [4]> <list [4]> <list [4]> <list [4]>
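For trips shorter than the longest one, the trailing cells of the reshaped list-columns have no station list. To follow the question's request that such cells be "", a minimal base-R post-processing sketch is below. It assumes those cells come back as `NULL` or a single `NA` (check this on your own output); `fill_empty_stops()` is a hypothetical helper and the example data is made up.

```r
# Hypothetical helper: replace empty cells (NULL or a single NA) in every
# station column with "", as the question asks.
fill_empty_stops <- function(wide, id_col = "tripnumber") {
  is_empty <- function(x) is.null(x) || (is.atomic(x) && length(x) == 1 && is.na(x))
  for (col in setdiff(names(wide), id_col)) {
    wide[[col]] <- lapply(wide[[col]], function(x) if (is_empty(x)) "" else x)
  }
  wide
}

# Small made-up example: trip 2 has no second stop
wide <- data.frame(tripnumber = 1:2)
wide[["1"]] <- list(list(station = "ams"), list(station = "ams"))
wide[["2"]] <- list(list(station = "ut"), NULL)

res <- fill_empty_stops(wide)
res[["2"]][[2]]   # ""
```

The same call should work on `DFnew` from the answer, since assigning a list of the right length with `[[<-` also replaces a tibble column.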

How to separate a list of lists in a data frame?
