R - Split on a single delimiter criterion to create multiple lists

Time: 2018-06-20 09:34:37

Tags: r

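Good morning,

I am fairly new to R, and the following problem has been bothering me since yesterday.

I am working with a list in the following format: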

 X
                      t          Ph       Ti        Te     Delta
2   2014-02-16 05:00:00    0.000000 19.83400  4.392333 15.441667
3   2014-02-16 10:00:00   24.997867 20.22083  7.637000 12.583833
4   2014-02-16 15:00:00 2349.799467 22.73400  7.735500 14.998500
5   2014-02-16 20:00:00    0.000000 23.66300  4.917167 18.745833
6   2014-02-17 01:00:00    0.000000 21.99467  3.810167 18.184500
7   2014-02-17 06:00:00    0.000000 20.35433  4.665167 15.689167
8   2014-02-17 11:00:00   15.907733 20.18267  8.206167 11.976500
9   2014-02-17 16:00:00 2542.964800 22.33983  8.385833 13.954000

...

What I want to do is split this list on the "Delta" value, so that consecutive rows with Delta above 15 are separated from the rest, like this:

 X2
$`1`
                    t       Ph       Ti       Te    Delta
2 2014-02-16 05:00:00  0.00000 19.83400 4.392333 15.44167

$`2`
                    t       Ph     Ti     Te   Delta
3 2014-02-16 10:00:00 24.99787 20.22083 7.637000 12.58383
4 2014-02-16 15:00:00 2349.799 22.734 7.7355 14.9985

$`3`
                    t       Ph       Ti       Te    Delta
5 2014-02-16 20:00:00  0.00000 23.66300 4.917167 18.74583
6 2014-02-17 01:00:00  0.00000 21.99467 3.810167 18.18450
7 2014-02-17 06:00:00  0.00000 20.35433 4.665167 15.68917

$`4`
                    t       Ph       Ti       Te  Delta
8 2014-02-17 11:00:00 15.90773 20.18267 8.206167 11.97650
9 2014-02-17 16:00:00 2542.965 22.33983 8.385833 13.954

$`5`
                     t       Ph      Ti       Te    Delta
10 2014-02-17 21:00:00 15.90773 23.0335 7.994833 15.03867

I found this code on the internet:

X2 <- split(X, cumsum(X$Delta < 15))

which gives the following result:

 X2
$`0`
                    t Ph     Ti       Te    Delta
2 2014-02-16 05:00:00  0 19.834 4.392333 15.44167

$`1`
                    t       Ph       Ti    Te    Delta
3 2014-02-16 10:00:00 24.99787 20.22083 7.637 12.58383

$`2`
                    t       Ph       Ti       Te    Delta
4 2014-02-16 15:00:00 2349.799 22.73400 7.735500 14.99850
5 2014-02-16 20:00:00    0.000 23.66300 4.917167 18.74583
6 2014-02-17 01:00:00    0.000 21.99467 3.810167 18.18450
7 2014-02-17 06:00:00    0.000 20.35433 4.665167 15.68917

$`3`
                    t       Ph       Ti       Te   Delta
8 2014-02-17 11:00:00 15.90773 20.18267 8.206167 11.9765

$`4`
                     t         Ph       Ti       Te    Delta
9  2014-02-17 16:00:00 2542.96480 22.33983 8.385833 13.95400
10 2014-02-17 21:00:00   15.90773 23.03350 7.994833 15.03867

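As you can see, with cumsum the last row of each run with Delta below 15 always ends up in the same block as the rows above 15 that follow it. Is there another way to get the result I want? Any help is welcome.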

2 answers:

Answer 0: (score: 1)

Try this. I am using the data.table::rleid function (run-length encoding id); here df is your data frame:

df$rle <- data.table::rleid(df$Delta < 15)
split(df, df$rle)

Output:

$`1`
  sn                t Ph     Ti       Te    Delta rle
1  2 16-02-2014 05:00  0 19.834 4.392333 15.44167   1

$`2`
  sn                t         Ph       Ti     Te    Delta rle
2  3 16-02-2014 10:00   24.99787 20.22083 7.6370 12.58383   2
3  4 16-02-2014 15:00 2349.79947 22.73400 7.7355 14.99850   2

$`3`
  sn                t Ph       Ti       Te    Delta rle
4  5 16-02-2014 20:00  0 23.66300 4.917167 18.74583   3
5  6 17-02-2014 01:00  0 21.99467 3.810167 18.18450   3
6  7 17-02-2014 06:00  0 20.35433 4.665167 15.68917   3

$`4`
  sn                t         Ph       Ti       Te   Delta rle
7  8 17-02-2014 11:00   15.90773 20.18267 8.206167 11.9765   4
8  9 17-02-2014 16:00 2542.96480 22.33983 8.385833 13.9540   4

$`5`
  sn                t       Ph      Ti       Te    Delta rle
9 10 17-02-2014 21:00 15.90773 23.0335 7.994833 15.03867   5
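For readers without data.table, here is a minimal base R sketch of the same idea. It also shows why the cumsum attempt in the question groups the rows differently. The vector names below, cumsum_id, and run_id are illustrative and not part of the original answer; the Delta values are copied from the question's sample data.

```r
# Delta values copied from the question's sample data
Delta <- c(15.441667, 12.583833, 14.9985, 18.745833, 18.1845,
           15.689167, 11.9765, 13.954, 15.03867)

below <- Delta < 15

# The question's attempt: the counter increases on every row below 15,
# so each such row opens a new group, and any rows above 15 that follow
# stay in that group.
cumsum_id <- cumsum(below)

# Run-based ids: the counter increases only where the flag differs from
# the previous row, so consecutive rows on the same side of 15 share an id.
run_id <- cumsum(c(TRUE, below[-1] != below[-length(below)]))

print(cumsum_id)  # 0 1 2 2 2 2 3 4 4
print(run_id)     # 1 2 2 3 3 3 4 4 5
```

Passing run_id as the second argument of split() should give the same five blocks as the rleid call above, assuming Delta contains no missing values.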

Answer 1: (score: 1)

In base R, we can use rle

split(X, with(rle(X$Delta > 15), rep(seq_along(values), lengths)))
#$`1`
#                    t Ph     Ti       Te    Delta
#2 2014-02-16 05:00:00  0 19.834 4.392333 15.44167

#$`2`
#                    t         Ph       Ti     Te    Delta
#3 2014-02-16 10:00:00   24.99787 20.22083 7.6370 12.58383
#4 2014-02-16 15:00:00 2349.79947 22.73400 7.7355 14.99850

#$`3`
#                    t Ph       Ti       Te    Delta
#5 2014-02-16 20:00:00  0 23.66300 4.917167 18.74583
#6 2014-02-17 01:00:00  0 21.99467 3.810167 18.18450
#7 2014-02-17 06:00:00  0 20.35433 4.665167 15.68917

#$`4`
#                    t         Ph       Ti       Te   Delta
#8 2014-02-17 11:00:00   15.90773 20.18267 8.206167 11.9765
#9 2014-02-17 16:00:00 2542.96480 22.33983 8.385833 13.9540

#$`5`
#                     t       Ph      Ti       Te    Delta
#10 2014-02-17 21:00:00 15.90773 23.0335 7.994833 15.03867
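To make the one-liner easier to follow, here is a small sketch that unpacks its intermediate values. The names r and grp are illustrative only, and the Delta vector is copied from the sample data rather than taken from X.

```r
# Delta values copied from the question's sample data
Delta <- c(15.441667, 12.583833, 14.9985, 18.745833, 18.1845,
           15.689167, 11.9765, 13.954, 15.03867)

# rle() compresses the logical vector into runs of identical values
r <- rle(Delta > 15)
print(r$lengths)  # 1 2 3 2 1
print(r$values)   # TRUE FALSE TRUE FALSE TRUE

# Number the runs 1..n and repeat each number once per row in that run,
# which yields one group id per row of the data frame
grp <- rep(seq_along(r$values), r$lengths)
print(grp)        # 1 2 2 3 3 3 4 4 5
```

The vector grp is what split() receives as its grouping argument in the answer's code. Note that this answer tests Delta > 15 while the other tests Delta < 15; the two give the same runs unless a Delta value is exactly 15.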

Data

X <- structure(list(t = c("2014-02-16 05:00:00", "2014-02-16 10:00:00", 
"2014-02-16 15:00:00", "2014-02-16 20:00:00", "2014-02-17 01:00:00", 
"2014-02-17 06:00:00", "2014-02-17 11:00:00", "2014-02-17 16:00:00", 
"2014-02-17 21:00:00"), Ph = c(0, 24.997867, 2349.799467, 0, 
0, 0, 15.907733, 2542.9648, 15.90773), Ti = c(19.834, 20.22083, 
22.734, 23.663, 21.99467, 20.35433, 20.18267, 22.33983, 23.0335
), Te = c(4.392333, 7.637, 7.7355, 4.917167, 3.810167, 4.665167, 
8.206167, 8.385833, 7.994833), Delta = c(15.441667, 12.583833, 
14.9985, 18.745833, 18.1845, 15.689167, 11.9765, 13.954, 15.03867
)), .Names = c("t", "Ph", "Ti", "Te", "Delta"),
 class = "data.frame", row.names = c("2", 
"3", "4", "5", "6", "7", "8", "9", "10"))