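I am working in R. My data currently sits in one long column, and I need to parse it into separate columns.
Current format (all of the data is in a single column named var1)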
var1
585 00:40:01.530 --> 00:40:03.480
586 Alex High School: Yeah. Again, Megan.
587 00:40:05.970 --> 00:40:06.330
588 Alex High Five: Megan.
589 00:40:08.190 --> 00:40:11.520
590 Charlie High School: Know how did with code Daniel go first.
591 00:40:12.600 --> 00:40:12.810
592 Charlie High School: But
Desired format
585 00:40:01.530 --> 00:40:03.480 Alex High School: Yeah. Again, Megan.
587 00:40:05.970 --> 00:40:06.330 Alex High Five: Megan.
589 00:40:08.190 --> 00:40:11.520 Charlie High School: Know how did with code Daniel go first.
591 00:40:12.600 --> 00:40:12.810 Charlie High School: But
Answer 0 (score: 0)
If we assume that every timestamp is followed by exactly one line of text, we can do this:
data.frame(matrix(df$var1, nrow(df)/2, byrow=TRUE))
X1 X2
1 00:40:01.530 --> 00:40:03.480 Alex High School: Yeah. Again, Megan.
2 00:40:05.970 --> 00:40:06.330 Alex High Five: Megan.
3 00:40:08.190 --> 00:40:11.520 Charlie High School: Know how did with code Daniel go first.
4 00:40:12.600 --> 00:40:12.810 Charlie High School: But
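A minimal, self-contained sketch of the same pairing idea, in case it helps to test it on a small sample first. The sample rows are copied from the question; the column names `time` and `text` and the up-front check are my own additions, not part of the original one-liner.

```r
# Rebuild part of the sample from the question.
# stringsAsFactors = FALSE keeps var1 as character on older R versions.
df <- data.frame(
  var1 = c(
    "00:40:01.530 --> 00:40:03.480",
    "Alex High School: Yeah. Again, Megan.",
    "00:40:05.970 --> 00:40:06.330",
    "Alex High Five: Megan."
  ),
  stringsAsFactors = FALSE
)

# Check the assumption before pairing: rows must alternate
# timestamp, text, timestamp, text, ...
is_stamp <- grepl("-->", df$var1, fixed = TRUE)
stopifnot(
  nrow(df) %% 2 == 0,
  all(is_stamp[c(TRUE, FALSE)]),   # odd rows are timestamps
  !any(is_stamp[c(FALSE, TRUE)])   # even rows are text
)

# Fill a two-column matrix row by row, then convert to a data frame.
paired <- data.frame(
  matrix(df$var1, ncol = 2, byrow = TRUE),
  stringsAsFactors = FALSE
)
names(paired) <- c("time", "text")  # hypothetical column names
print(paired)
```

If the `stopifnot()` check fails, the rows do not strictly alternate, and the pairing would silently shift text onto the wrong timestamp.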
If that is not the case, you will have to reshape the data:
reshape(transform(df, id = id <- cumsum(grepl("-->", df$var1)), time = ave(id, id, FUN = seq_along)), v.names = "var1", idvar = "id", timevar = "time", direction = "wide")
id var1.1 var1.2
1 1 00:40:01.530 --> 00:40:03.480 Alex High School: Yeah. Again, Megan.
3 2 00:40:05.970 --> 00:40:06.330 Alex High Five: Megan.
5 3 00:40:08.190 --> 00:40:11.520 Charlie High School: Know how did with code Daniel go first.
7 4 00:40:12.600 --> 00:40:12.810 Charlie High School: But
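When the rows really are irregular, a different route from `reshape()` is to group the rows by caption and join each caption's text lines into a single string. The sketch below uses base R `split()` and `vapply()`; it is an alternative, not the method above. The sample rows are invented for illustration (one caption with two text lines, one with none), the column names `time` and `text` are hypothetical, and it assumes the first row is a timestamp.

```r
# Hypothetical irregular input: caption 1 has two text lines,
# caption 2 has none, caption 3 has one.
df <- data.frame(
  var1 = c(
    "00:40:01.530 --> 00:40:03.480",
    "Alex High School: Yeah.",
    "Again, Megan.",
    "00:40:05.970 --> 00:40:06.330",
    "00:40:08.190 --> 00:40:11.520",
    "Charlie High School: But"
  ),
  stringsAsFactors = FALSE
)

is_stamp <- grepl("-->", df$var1, fixed = TRUE)
stopifnot(is_stamp[1])        # assumption: the data starts with a timestamp

# Every timestamp starts a new caption; cumsum() numbers the captions.
id <- cumsum(is_stamp)
groups <- split(df$var1, id)  # one character vector per caption

result <- data.frame(
  # First element of each group is the timestamp line.
  time = vapply(groups, function(g) g[1], character(1)),
  # Remaining elements are text; a caption without text becomes "".
  text = vapply(groups, function(g) paste(g[-1], collapse = " "), character(1)),
  stringsAsFactors = FALSE,
  row.names = NULL
)
print(result)
```

Unlike the wide reshape, this always yields exactly two columns, at the cost of losing the boundaries between the individual text lines of a caption.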