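Reformat data into multiple columns in R

Time: 2020-08-04 16:40:27

Tags: r formatting

I am working in R. My data currently sits in one long column, and I need to parse it into separate columns.

Current format (all data is in a single column named var1)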

var1
585                               00:40:01.530 --> 00:40:03.480
586                     Alex High School: Yeah. Again, Megan.
587                               00:40:05.970 --> 00:40:06.330
588                                  Alex High Five: Megan.
589                               00:40:08.190 --> 00:40:11.520
590 Charlie High School: Know how did with code Daniel go first.
591                               00:40:12.600 --> 00:40:12.810
592                                     Charlie High School: But

Desired format

585 00:40:01.530 --> 00:40:03.480 Alex High School: Yeah. Again, Megan.
587 00:40:05.970 --> 00:40:06.330 Alex High Five: Megan.
589 00:40:08.190 --> 00:40:11.520 Charlie High School: Know how did with code Daniel go first.
591 00:40:12.600 --> 00:40:12.810 Charlie High School: But

1 answer:

Answer 0: (score: 0)

If we can assume that every timestamp is followed by exactly one line of text, we can do this:

data.frame(matrix(df$var1, nrow(df)/2, byrow=TRUE))

                            X1                                                           X2
1 00:40:01.530 --> 00:40:03.480                        Alex High School: Yeah. Again, Megan.
2 00:40:05.970 --> 00:40:06.330                                       Alex High Five: Megan.
3 00:40:08.190 --> 00:40:11.520 Charlie High School: Know how did with code Daniel go first.
4 00:40:12.600 --> 00:40:12.810                                     Charlie High School: But
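
To try this on the sample from the question, the data frame can be rebuilt by hand. The sketch below assumes a data frame named df with a character column var1, as in the question. The alternation check and the column names timestamp and text are illustrative additions, not part of the original answer; the check simply stops early if timestamp and text lines do not strictly alternate.

```r
# Sample data rebuilt from the question (the row names 585..592 are not reproduced)
df <- data.frame(
  var1 = c(
    "00:40:01.530 --> 00:40:03.480",
    "Alex High School: Yeah. Again, Megan.",
    "00:40:05.970 --> 00:40:06.330",
    "Alex High Five: Megan.",
    "00:40:08.190 --> 00:40:11.520",
    "Charlie High School: Know how did with code Daniel go first.",
    "00:40:12.600 --> 00:40:12.810",
    "Charlie High School: But"
  ),
  stringsAsFactors = FALSE
)

# TRUE for timestamp lines, FALSE for text lines
is_stamp <- grepl("-->", df$var1, fixed = TRUE)

# The matrix approach is only valid if lines alternate: stamp, text, stamp, text, ...
alternates <- nrow(df) %% 2 == 0 &&
  all(is_stamp == rep(c(TRUE, FALSE), length.out = nrow(df)))
stopifnot(alternates)

# Fill a two-column matrix row by row; the column names are illustrative
out <- data.frame(
  matrix(df$var1, ncol = 2, byrow = TRUE,
         dimnames = list(NULL, c("timestamp", "text"))),
  stringsAsFactors = FALSE
)
out
```

If the check fails, the pairing by position would silently misalign timestamps and text, so the reshape-based approach is the safer choice in that case.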

If that is not the case, you will have to reshape the data:

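 # id numbers each block that starts at a timestamp line; time numbers the lines within a block
 reshape(transform(df, id = id <- cumsum(grepl("-->", df$var1)), time = ave(id, id, FUN = seq_along)), v.names = "var1", direction = "wide")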
  id                        var1.1                                                       var1.2
1  1 00:40:01.530 --> 00:40:03.480                        Alex High School: Yeah. Again, Megan.
3  2 00:40:05.970 --> 00:40:06.330                                       Alex High Five: Megan.
5  3 00:40:08.190 --> 00:40:11.520 Charlie High School: Know how did with code Daniel go first.
7  4 00:40:12.600 --> 00:40:12.810                                     Charlie High School: But
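
When a block can hold zero or several text lines, the wide result gains extra var1.N columns with NA gaps. As an alternative sketch (not the answer's method), the text lines of each block can be joined into one string instead. The input df2 below is hypothetical, and the code assumes the first line of the column is a timestamp.

```r
# Hypothetical irregular input: the second timestamp has no text,
# and the third has text spread over two lines
df2 <- data.frame(
  var1 = c(
    "00:40:01.530 --> 00:40:03.480",
    "Alex High School: Yeah. Again, Megan.",
    "00:40:05.970 --> 00:40:06.330",
    "00:40:08.190 --> 00:40:11.520",
    "Charlie High School: Know how did with code",
    "Daniel go first."
  ),
  stringsAsFactors = FALSE
)

is_stamp <- grepl("-->", df2$var1, fixed = TRUE)
id <- cumsum(is_stamp)  # block number; increases at every timestamp line

# Group the non-timestamp lines by block, keeping blocks that have no text
blocks <- split(df2$var1[!is_stamp],
                factor(id[!is_stamp], levels = unique(id)))

# Join the lines of each block into one string ("" when a block has no text)
text <- vapply(blocks, paste, character(1), collapse = " ")

out2 <- data.frame(
  timestamp = df2$var1[is_stamp],
  text = unname(text),
  stringsAsFactors = FALSE
)
out2
```

This always yields exactly two columns, at the cost of losing the original line breaks inside a block.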