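Good morning,
I'm fairly new to R, and the following problem has been bothering me since yesterday.
I'm working with a data frame in the following format: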
X
t Ph Ti Te Delta
2 2014-02-16 05:00:00 0.000000 19.83400 4.392333 15.441667
3 2014-02-16 10:00:00 24.997867 20.22083 7.637000 12.583833
4 2014-02-16 15:00:00 2349.799467 22.73400 7.735500 14.998500
5 2014-02-16 20:00:00 0.000000 23.66300 4.917167 18.745833
6 2014-02-17 01:00:00 0.000000 21.99467 3.810167 18.184500
7 2014-02-17 06:00:00 0.000000 20.35433 4.665167 15.689167
8 2014-02-17 11:00:00 15.907733 20.18267 8.206167 11.976500
9 2014-02-17 16:00:00 2542.964800 22.33983 8.385833 13.954000
...
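What I want to do is split this data frame on the "Delta" value, so that runs of consecutive rows with Delta above 15 are separated from the remaining rows, like this: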
X2
$`1`
t Ph Ti Te Delta
2 2014-02-16 05:00:00 0.00000 19.83400 4.392333 15.44167
$`2`
t Ph Ti Te Delta
3 2014-02-16 10:00:00 24.99787 20.22083 7.637000 12.58383
4 2014-02-16 15:00:00 2349.799 22.734 7.7355 14.9985
$`3`
t Ph Ti Te Delta
5 2014-02-16 20:00:00 0.00000 23.66300 4.917167 18.74583
6 2014-02-17 01:00:00 0.00000 21.99467 3.810167 18.18450
7 2014-02-17 06:00:00 0.00000 20.35433 4.665167 15.68917
$`4`
t Ph Ti Te Delta
8 2014-02-17 11:00:00 15.90773 20.18267 8.206167 11.97650
9 2014-02-17 16:00:00 2542.965 22.33983 8.385833 13.954
$`5`
t Ph Ti Te Delta
10 2014-02-17 21:00:00 15.90773 23.0335 7.994833 15.03867
I found this code on the internet:
X2 <- split(X, cumsum(X$Delta < 15))
which gives the following result:
X2
$`0`
t Ph Ti Te Delta
2 2014-02-16 05:00:00 0 19.834 4.392333 15.44167
$`1`
t Ph Ti Te Delta
3 2014-02-16 10:00:00 24.99787 20.22083 7.637 12.58383
$`2`
t Ph Ti Te Delta
4 2014-02-16 15:00:00 2349.799 22.73400 7.735500 14.99850
5 2014-02-16 20:00:00 0.000 23.66300 4.917167 18.74583
6 2014-02-17 01:00:00 0.000 21.99467 3.810167 18.18450
7 2014-02-17 06:00:00 0.000 20.35433 4.665167 15.68917
$`3`
t Ph Ti Te Delta
8 2014-02-17 11:00:00 15.90773 20.18267 8.206167 11.9765
$`4`
t Ph Ti Te Delta
9 2014-02-17 16:00:00 2542.96480 22.33983 8.385833 13.95400
10 2014-02-17 21:00:00 15.90773 23.03350 7.994833 15.03867
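As you can see, using cumsum means that the last row with Delta below 15 always ends up in the same block as the rows above 15 that follow it. Is there another way to get this result? Any help is welcome.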
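For what it's worth, printing the grouping vector makes this behaviour visible. A minimal sketch, using only the Delta values from the rows shown above (the names Delta and grp are just for illustration):

```r
# Delta values copied from the rows shown in the question
Delta <- c(15.441667, 12.583833, 14.9985, 18.745833, 18.1845,
           15.689167, 11.9765, 13.954, 15.03867)

# cumsum() goes up by one at every row below 15, so each such row opens
# a new group, and the rows above 15 that follow it stay in that group
grp <- cumsum(Delta < 15)
print(grp)
# [1] 0 1 2 2 2 2 3 4 4
```

Rows 4 to 7 all get the label 2, which is why the row with Delta 14.9985 lands in the same block as the three rows above 15.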
Answer 0 (score: 1)
Try this. I'm using the data.table::rleid function (run-length encoding ids); here df is your data frame:
df$rle <- data.table::rleid(df$Delta < 15)
split(df, df$rle)
Output:
$`1`
sn t Ph Ti Te Delta rle
1 2 16-02-2014 05:00 0 19.834 4.392333 15.44167 1
$`2`
sn t Ph Ti Te Delta rle
2 3 16-02-2014 10:00 24.99787 20.22083 7.6370 12.58383 2
3 4 16-02-2014 15:00 2349.79947 22.73400 7.7355 14.99850 2
$`3`
sn t Ph Ti Te Delta rle
4 5 16-02-2014 20:00 0 23.66300 4.917167 18.74583 3
5 6 17-02-2014 01:00 0 21.99467 3.810167 18.18450 3
6 7 17-02-2014 06:00 0 20.35433 4.665167 15.68917 3
$`4`
sn t Ph Ti Te Delta rle
7 8 17-02-2014 11:00 15.90773 20.18267 8.206167 11.9765 4
8 9 17-02-2014 16:00 2542.96480 22.33983 8.385833 13.9540 4
$`5`
sn t Ph Ti Te Delta rle
9 10 17-02-2014 21:00 15.90773 23.0335 7.994833 15.03867 5
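If adding data.table is not an option, a run id of the same kind can probably be built in base R by counting the positions where the condition changes. A sketch under that assumption, with df rebuilt here from the Delta values in the question only (run_id is a hypothetical helper name, not a function from any package):

```r
# Only the Delta column is needed to illustrate the grouping
df <- data.frame(Delta = c(15.441667, 12.583833, 14.9985, 18.745833,
                           18.1845, 15.689167, 11.9765, 13.954, 15.03867))

# Hypothetical helper: number consecutive runs of equal values 1, 2, 3, ...
# Assumes x contains no NA values.
run_id <- function(x) {
  if (length(x) == 0) return(integer(0))
  # TRUE at the first element and wherever a value differs from the previous one
  cumsum(c(TRUE, x[-1] != x[-length(x)]))
}

df$run <- run_id(df$Delta < 15)
pieces <- split(df, df$run)

# Number of rows in each piece
sapply(pieces, nrow)
```

Unlike cumsum(Delta < 15), the counter here only advances when the condition flips, so consecutive rows on the same side of 15 stay together.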
Answer 1 (score: 1)
In base R, we can use rle:
split(X, with(rle(X$Delta > 15), rep(seq_along(values), lengths)))
#$`1`
# t Ph Ti Te Delta
#2 2014-02-16 05:00:00 0 19.834 4.392333 15.44167
#$`2`
# t Ph Ti Te Delta
#3 2014-02-16 10:00:00 24.99787 20.22083 7.6370 12.58383
#4 2014-02-16 15:00:00 2349.79947 22.73400 7.7355 14.99850
#$`3`
# t Ph Ti Te Delta
#5 2014-02-16 20:00:00 0 23.66300 4.917167 18.74583
#6 2014-02-17 01:00:00 0 21.99467 3.810167 18.18450
#7 2014-02-17 06:00:00 0 20.35433 4.665167 15.68917
#$`4`
# t Ph Ti Te Delta
#8 2014-02-17 11:00:00 15.90773 20.18267 8.206167 11.9765
#9 2014-02-17 16:00:00 2542.96480 22.33983 8.385833 13.9540
#$`5`
# t Ph Ti Te Delta
#10 2014-02-17 21:00:00 15.90773 23.0335 7.994833 15.03867
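To see what the rle call contributes, its pieces can be inspected one at a time. A small sketch on the Delta values alone (r and grp are illustrative names, not part of the answer's code):

```r
Delta <- c(15.441667, 12.583833, 14.9985, 18.745833, 18.1845,
           15.689167, 11.9765, 13.954, 15.03867)

# rle() compresses the logical vector into runs of identical values
r <- rle(Delta > 15)
r$values   # TRUE FALSE TRUE FALSE TRUE
r$lengths  # 1 2 3 2 1

# Repeat each run's index by the run's length to get one group id per row
grp <- rep(seq_along(r$values), r$lengths)
grp        # 1 2 2 3 3 3 4 4 5
```

Note that Delta > 15 and Delta < 15 produce the same runs here only because no value is exactly 15; a row with Delta equal to 15 would fall on different sides under the two conditions.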
X <- structure(list(t = c("2014-02-16 05:00:00", "2014-02-16 10:00:00",
"2014-02-16 15:00:00", "2014-02-16 20:00:00", "2014-02-17 01:00:00",
"2014-02-17 06:00:00", "2014-02-17 11:00:00", "2014-02-17 16:00:00",
"2014-02-17 21:00:00"), Ph = c(0, 24.997867, 2349.799467, 0,
0, 0, 15.907733, 2542.9648, 15.90773), Ti = c(19.834, 20.22083,
22.734, 23.663, 21.99467, 20.35433, 20.18267, 22.33983, 23.0335
), Te = c(4.392333, 7.637, 7.7355, 4.917167, 3.810167, 4.665167,
8.206167, 8.385833, 7.994833), Delta = c(15.441667, 12.583833,
14.9985, 18.745833, 18.1845, 15.689167, 11.9765, 13.954, 15.03867
)), .Names = c("t", "Ph", "Ti", "Te", "Delta"),
class = "data.frame", row.names = c("2",
"3", "4", "5", "6", "7", "8", "9", "10"))