Split a data frame based on specific group IDs (in R)

Time: 2019-11-25 15:30:24

Tags: r

I have a data frame that I want to split by group, so that each piece contains everything from Valley 1 to Valley 2, from Valley 2 to Valley 3, and so on.

   Time   Peaks   ID
1  0.00   Data    1
2  0.36   Data    2
3  0.75   Valley  1
4  1.14   Peak    1
5  1.54   Data    3
6  1.93   Data    4
7  2.32   Valley  2
8  2.72   Peak    2
9  3.12   Valley  3

Desired output:

df1
3  0.75   Valley  1
4  1.14   Peak    1
5  1.54   Data    3
6  1.93   Data    4
7  2.32   Valley  2

df2
7  2.32   Valley  2
8  2.72   Peak    2
9  3.12   Valley  3

The data in a form that can be pasted directly into R:

dat <- structure(list(Row = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 
13, 14, 15, 16), Time = c(0, 0.36, 0.75, 1.14, 1.54, 1.93, 2.32, 
2.72, 3.12, 3.51, 3.9, 4.3, 4.69, 5.08, 5.47, 5.87), Value = c(455, 
456, 456, 459, 456, 456, 455, 458, 455, 458, 455, 456, 461, 458, 
458, 459), Peaks = c("Data", "Data", "Valley", "Peak", "Data", 
"Data", "Valley", "Peak", "Valley", "Peak", "Valley", "Data", 
"Peak", "Data", "Valley", "Data"), Peak_id = c(1, 2, 1, 1, 3, 
4, 2, 2, 3, 3, 4, 5, 4, 6, 5, 7)), row.names = c(NA, -16L), class = c("tbl_df", 
"tbl", "data.frame"))
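Because each desired piece is bounded by two consecutive Valley rows, it helps to see where those rows are. A small sketch, using a copy of the Peaks column from `dat` above (the names `peaks`, `valley_rows` and `bounds` are illustrative, not from the question):

```r
# Peaks column copied from the question's `dat`
peaks <- c("Data", "Data", "Valley", "Peak", "Data", "Data", "Valley", "Peak",
           "Valley", "Peak", "Valley", "Data", "Peak", "Data", "Valley", "Data")

# row positions of every Valley; each consecutive pair delimits one segment
valley_rows <- which(peaks == "Valley")

# start/end row of each segment, both ends inclusive
bounds <- data.frame(start = head(valley_rows, -1),
                     end   = tail(valley_rows, -1))
print(bounds)
```

On this data the Valleys are in rows 3, 7, 9, 11 and 15, so there are four segments: rows 3 to 7, 7 to 9, 9 to 11 and 11 to 15. The rows before the first Valley and after the last one do not belong to any segment.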

2 answers:

Answer 0 (score: 2)

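Try the following code; I hope it helps:

df <- as.data.frame(dat[, -1])  # plain data.frame from the question's data, without the Row column
idx <- with(df, which(Peaks == "Valley"))  # row positions of every Valley
# pair each Valley with the next one and take all rows in between (both ends included)
mapply(function(k1, k2) df[k1:k2, ], idx[-length(idx)], idx[-1], SIMPLIFY = FALSE)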

Output:

[[1]]
  Time Value  Peaks Peak_id
3 0.75   456 Valley       1
4 1.14   459   Peak       1
5 1.54   456   Data       3
6 1.93   456   Data       4
7 2.32   455 Valley       2

[[2]]
  Time Value  Peaks Peak_id
7 2.32   455 Valley       2
8 2.72   458   Peak       2
9 3.12   455 Valley       3

[[3]]
   Time Value  Peaks Peak_id
9  3.12   455 Valley       3
10 3.51   458   Peak       3
11 3.90   455 Valley       4

[[4]]
   Time Value  Peaks Peak_id
11 3.90   455 Valley       4
12 4.30   456   Data       5
13 4.69   461   Peak       4
14 5.08   458   Data       6
15 5.47   458 Valley       5
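The same idea can be wrapped in a small reusable function that also copes with data containing fewer than two Valleys. This is a sketch, not part of the original answer; the function name `split_between_markers` and its `col`/`marker` parameters are hypothetical. It uses `Map()`, which is `mapply()` with `SIMPLIFY = FALSE`:

```r
# Hypothetical helper: returns one data frame per pair of consecutive marker rows,
# with both marker rows included in each piece
split_between_markers <- function(df, col = "Peaks", marker = "Valley") {
  idx <- which(df[[col]] == marker)
  if (length(idx) < 2) return(list())  # fewer than two markers: no complete segment
  # Map() is mapply() with SIMPLIFY = FALSE: pair each marker with the next one
  Map(function(from, to) df[from:to, , drop = FALSE], head(idx, -1), tail(idx, -1))
}

# First nine rows of the question's data (Time and Peaks only)
df <- data.frame(
  Time  = c(0, 0.36, 0.75, 1.14, 1.54, 1.93, 2.32, 2.72, 3.12),
  Peaks = c("Data", "Data", "Valley", "Peak", "Data", "Data", "Valley", "Peak", "Valley")
)
parts <- split_between_markers(df)
print(parts)
```

On these nine rows it returns two pieces, rows 3 to 7 and rows 7 to 9, matching df1 and df2 in the question.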

Answer 1 (score: 1)

This is quite simple with a combination of the cumsum and split functions:

bsp = as.data.frame(list(Time = c(0, 0.36, 0.75, 1.14, 1.54,  1.93, 2.32, 2.72, 3.12, 3.51, 3.9, 4.3, 4.69, 5.08, 5.47, 5.87), 
                     Value = c(455L, 456L, 456L, 459L, 456L, 456L, 455L, 458L,  455L, 458L, 455L, 456L, 461L, 458L, 458L, 459L), 
                     Peaks = c("Data",  "Data", "Valley", "Peak", "Data", "Data", "Valley", "Peak", "Valley", 
                                "Peak", "Valley", "Data", "Peak", "Data", "Valley", "Data"), 
                     Peak_id = c(1L, 2L, 1L, 1L, 3L, 4L, 2L, 2L, 3L, 3L, 4L, 5L, 4L, 6L, 5L, 7L)))

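bsp$group_id <- cumsum(bsp$Peaks == "Valley")  # running count of Valleys seen so far
# `by` and `keep.by` are arguments of data.table's split() method, so convert first
library(data.table)
split(as.data.table(bsp), by = "group_id", keep.by = FALSE)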
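With the cumsum grouping, each group starts at a Valley and stops just before the next Valley, so the closing Valley row is not repeated as in the question's df1/df2, and the rows before the first Valley form group 0. A minimal sketch (not part of the original answer) that keeps the same grouping but appends each group's closing Valley row, assuming the rows are in Time order as in the question; the names `grp`, `n_segments` and `parts` are illustrative:

```r
df <- data.frame(
  Time  = c(0, 0.36, 0.75, 1.14, 1.54, 1.93, 2.32, 2.72, 3.12, 3.51),
  Peaks = c("Data", "Data", "Valley", "Peak", "Data", "Data", "Valley", "Peak",
            "Valley", "Peak")
)

# cumsum() of a logical vector counts the Valleys seen so far: 0 0 1 1 1 1 2 2 3 3
grp <- cumsum(df$Peaks == "Valley")

# Group g (g >= 1) starts at the g-th Valley and ends just before the (g+1)-th one.
# Appending the row right after the group (that next Valley) closes the segment.
# Group 0 and the last group (which has no closing Valley) are skipped.
n_segments <- max(0L, max(grp) - 1L)
parts <- lapply(seq_len(n_segments), function(g) {
  rows <- which(grp == g)
  df[c(rows, max(rows) + 1L), , drop = FALSE]
})
print(parts)
```

On this sample it yields rows 3 to 7 and rows 7 to 9, the same overlapping segments as the mapply answer.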