I'm trying to write a loop function to subset the data below:
DEPTH A B C D E F
4700 8.75 9.313 0.12 0.138 89.164 72.613
4700.5 8.75 9.264 0.117 0.135 89.266 72.784
4701 8.75 9.376 0.112 0.132 89.52 68.443
4701.5 8.75 9.485 0.11 0.122 89.088 64.839
4702 8.75 9.564 0.116 0.108 89.377 64.388
4702.5 8.75 9.572 0.121 0.098 88.93 66.931
4703 8.75 9.524 0.122 0.093 89.651 70.906
4703.5 8.75 9.395 0.124 0.091 90.486 75.106
4704 8.75 9.245 0.123 0.089 90.598 77.443
4704.5 8.75 9.298 0.124 0.087 91.251 78.93
4705 8.75 9.361 0.125 0.088 90.319 77.159
4705.5 8.75 9.454 0.123 0.088 88.176 75.999
4706 8.75 9.448 0.124 0.088 86.129 78.843
4706.5 8.75 9.359 0.124 0.096 85.581 77.067
4707 8.75 9.305 0.119 0.12 85.082 73.191
4707.5 8.75 9.16 0.113 0.16 85.738 78.425
4708 8.75 9.036 0.097 0.208 86.114 91.491
4708.5 8.75 9.126 0.089 0.237 89.779 97.706
4709 8.75 9.111 0.094 0.224 92.429 91.557
4709.5 8.75 9.119 0.106 0.195 91.663 85.642
4710 8.75 9.234 0.143 0.185 91.881 83.705
4710.5 8.75 9.468 0.172 0.172 92.526 82.094
4711 8.75 9.59 0.187 0.139 94.544 85.973
4711.5 8.75 9.364 0.304 0.106 97.261 88.345
4712 8.75 9.145 0.458 0.089 98.726 78.622
4712.5 8.75 8.97 0.463 0.071 99.372 74.403
4713 8.75 8.985 0.384 0.064 99.343 82.743
4713.5 8.75 9.021 0.321 0.098 98.377 89.484
4714 8.75 9.148 0.247 0.133 95.209 93.148
4714.5 8.75 9.352 0.181 0.129 87.194 99.743
4715 8.75 9.427 0.147 0.104 83.613 109.798
into:
subset1
DEPTH A B C D E F
4700 8.75 9.313 0.12 0.138 89.164 72.613
4700.5 8.75 9.264 0.117 0.135 89.266 72.784
4701 8.75 9.376 0.112 0.132 89.52 68.443
4701.5 8.75 9.485 0.11 0.122 89.088 64.839
4702 8.75 9.564 0.116 0.108 89.377 64.388
4702.5 8.75 9.572 0.121 0.098 88.93 66.931
4703 8.75 9.524 0.122 0.093 89.651 70.906
subset2
DEPTH A B C D E F
4703 8.75 9.524 0.122 0.093 89.651 70.906
4703.5 8.75 9.395 0.124 0.091 90.486 75.106
4704 8.75 9.245 0.123 0.089 90.598 77.443
4704.5 8.75 9.298 0.124 0.087 91.251 78.93
4705 8.75 9.361 0.125 0.088 90.319 77.159
4705.5 8.75 9.454 0.123 0.088 88.176 75.999
4706 8.75 9.448 0.124 0.088 86.129 78.843
subset3
DEPTH A B C D E F
4706 8.75 9.448 0.124 0.088 86.129 78.843
4706.5 8.75 9.359 0.124 0.096 85.581 77.067
4707 8.75 9.305 0.119 0.12 85.082 73.191
4707.5 8.75 9.16 0.113 0.16 85.738 78.425
4708 8.75 9.036 0.097 0.208 86.114 91.491
4708.5 8.75 9.126 0.089 0.237 89.779 97.706
4709 8.75 9.111 0.094 0.224 92.429 91.557
subset4
DEPTH A B C D E F
4709 8.75 9.111 0.094 0.224 92.429 91.557
4709.5 8.75 9.119 0.106 0.195 91.663 85.642
4710 8.75 9.234 0.143 0.185 91.881 83.705
4710.5 8.75 9.468 0.172 0.172 92.526 82.094
4711 8.75 9.59 0.187 0.139 94.544 85.973
4711.5 8.75 9.364 0.304 0.106 97.261 88.345
4712 8.75 9.145 0.458 0.089 98.726 78.622
subset5
DEPTH A B C D E F
4712 8.75 9.145 0.458 0.089 98.726 78.622
4712.5 8.75 8.97 0.463 0.071 99.372 74.403
4713 8.75 8.985 0.384 0.064 99.343 82.743
4713.5 8.75 9.021 0.321 0.098 98.377 89.484
4714 8.75 9.148 0.247 0.133 95.209 93.148
4714.5 8.75 9.352 0.181 0.129 87.194 99.743
4715 8.75 9.427 0.147 0.104 83.613 109.798
Can anyone help me? So far I've had no luck finding a way. I need to subset the data at every 3-foot interval.
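For reference, here is a minimal base-R sketch of the explicit loop described above. It is not taken from either answer. It assumes the data is in a data frame called df with a numeric DEPTH column; the name df and the reduced stand-in data (only DEPTH and A) are assumptions. Like the desired output, it uses closed intervals, so each boundary depth (4703, 4706, ...) lands in both neighbouring subsets.

```r
# Reduced stand-in for the question's data: DEPTH from 4700 to 4715 in 0.5 ft steps
df <- data.frame(DEPTH = seq(4700, 4715, by = 0.5))
df$A <- 8.75

width <- 3  # interval width in feet
n_sub <- ceiling((max(df$DEPTH) - min(df$DEPTH)) / width)
starts <- min(df$DEPTH) + width * (seq_len(n_sub) - 1)

subsets <- vector("list", n_sub)
for (i in seq_len(n_sub)) {
  # Closed interval [start, start + width], so boundary rows appear in two subsets
  in_window <- df$DEPTH >= starts[i] & df$DEPTH <= starts[i] + width
  subsets[[i]] <- df[in_window, ]
}
names(subsets) <- paste0("subset", seq_len(n_sub))
```

With this data the loop gives subset1 through subset5, each with 7 rows. Using ceiling() means a depth range that is not a whole multiple of 3 ft still gets a final, shorter subset.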
Answer 0 (score: 0)
The cut function cuts a continuous vector into levels determined by the breaks you pass it. In this case, you just need a sequence (seq) from min(df$DEPTH) to max(df$DEPTH) in steps of 3, which you then pass to cut:
df$breaks <- cut(df$DEPTH, seq(min(df$DEPTH), max(df$DEPTH), by = 3), include.lowest = TRUE)
head(df)
# DEPTH A B C D E F breaks
# 1 4700.0 8.75 9.313 0.120 0.138 89.164 72.613 [4700,4703]
# 2 4700.5 8.75 9.264 0.117 0.135 89.266 72.784 [4700,4703]
# 3 4701.0 8.75 9.376 0.112 0.132 89.520 68.443 [4700,4703]
# 4 4701.5 8.75 9.485 0.110 0.122 89.088 64.839 [4700,4703]
# 5 4702.0 8.75 9.564 0.116 0.108 89.377 64.388 [4700,4703]
# 6 4702.5 8.75 9.572 0.121 0.098 88.930 66.931 [4700,4703]
Now that we have a column to split the data on, we can break it into separate data.frames by subsetting inside lapply. The subset drops the added column with -ncol(df), since it was the last column added:
lapply(levels(df$breaks), function(x){df[df$breaks == x, -ncol(df)]})
# [[1]]
# DEPTH A B C D E F
# 1 4700.0 8.75 9.313 0.120 0.138 89.164 72.613
# 2 4700.5 8.75 9.264 0.117 0.135 89.266 72.784
# 3 4701.0 8.75 9.376 0.112 0.132 89.520 68.443
# 4 4701.5 8.75 9.485 0.110 0.122 89.088 64.839
# 5 4702.0 8.75 9.564 0.116 0.108 89.377 64.388
# 6 4702.5 8.75 9.572 0.121 0.098 88.930 66.931
# 7 4703.0 8.75 9.524 0.122 0.093 89.651 70.906
#
# [[2]]
# DEPTH A B C D E F
# 8 4703.5 8.75 9.395 0.124 0.091 90.486 75.106
# 9 4704.0 8.75 9.245 0.123 0.089 90.598 77.443
# 10 4704.5 8.75 9.298 0.124 0.087 91.251 78.930
# 11 4705.0 8.75 9.361 0.125 0.088 90.319 77.159
# 12 4705.5 8.75 9.454 0.123 0.088 88.176 75.999
# 13 4706.0 8.75 9.448 0.124 0.088 86.129 78.843
# .....
Alternatively, you can wrap the whole process into one step by calling cut inside split. However, then you can't simply drop the new column.
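Below is a self-contained sketch of the cut approach. It uses a reduced stand-in for the data (only DEPTH and A, an assumption for brevity) and calls cut inside split, so no helper column is added. cut's intervals are right-closed by default: [4700,4703], (4703,4706], and so on. As a result, each boundary depth ends up in exactly one group. This differs from the desired output in the question, where 4703, 4706, ... appear in two subsets.

```r
df <- data.frame(DEPTH = seq(4700, 4715, by = 0.5))
df$A <- 8.75

breaks <- seq(min(df$DEPTH), max(df$DEPTH), by = 3)
# include.lowest = TRUE keeps the first depth (4700) in the first interval
pieces <- split(df, cut(df$DEPTH, breaks, include.lowest = TRUE))

# The first interval is closed on both ends, the others only on the right
rows_per_piece <- sapply(pieces, nrow)
print(rows_per_piece)  # counts: 7 6 6 6 6
```

One caveat is that seq(min, max, by = 3) stops at the last step that does not exceed max(DEPTH). Any deeper rows would get NA from cut and be silently dropped by split. Here 4715 happens to fall on the grid.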
Answer 1 (score: 0)
Some data.table solutions:
foverlaps
We can do this with foverlaps:
library(data.table)
Convert your data.frame to a data.table:
setDT(dt) ## put your df in here
Create a lookup table of 'depths' that gives us the intervals:
dt_depths <- data.table(depths_min = seq(min(dt$DEPTH), max(dt$DEPTH), by = 3),
                        depths_max = seq(min(dt$DEPTH) + 3, max(dt$DEPTH) + 3, by = 3))
# depths_min depths_max
# 1: 4700 4703
# 2: 4703 4706
# 3: 4706 4709
# 4: 4709 4712
# 5: 4712 4715
# 6: 4715 4718
Set up the foverlaps join:
dt <- dt[, .(A, B, C, D, E, F, DEPTH)] ## re-order columns for foverlaps
dt[, DEPTH_copy := DEPTH]
setkey(dt_depths, depths_min, depths_max)
setkey(dt, DEPTH, DEPTH_copy)
## do the join
dt_join <- foverlaps(dt,
                     dt_depths,
                     type = "any") ## any - to allow those on the 'borders' into both groups
We can add a 'group/subset' column if you like:
dt_join[, subset := rleid(depths_min, depths_max)]
# depths_min depths_max A B C D E F DEPTH DEPTH_copy subset
# 1: 4700 4703 8.75 9.313 0.120 0.138 89.164 72.613 4700.0 4700.0 1
# 2: 4700 4703 8.75 9.264 0.117 0.135 89.266 72.784 4700.5 4700.5 1
# 3: 4700 4703 8.75 9.376 0.112 0.132 89.520 68.443 4701.0 4701.0 1
# 4: 4700 4703 8.75 9.485 0.110 0.122 89.088 64.839 4701.5 4701.5 1
# 5: 4700 4703 8.75 9.564 0.116 0.108 89.377 64.388 4702.0 4702.0 1
# 6: 4700 4703 8.75 9.572 0.121 0.098 88.930 66.931 4702.5 4702.5 1
# 7: 4700 4703 8.75 9.524 0.122 0.093 89.651 70.906 4703.0 4703.0 1
# 8: 4703 4706 8.75 9.524 0.122 0.093 89.651 70.906 4703.0 4703.0 2
# 9: 4703 4706 8.75 9.395 0.124 0.091 90.486 75.106 4703.5 4703.5 2
# 10: 4703 4706 8.75 9.245 0.123 0.089 90.598 77.443 4704.0 4704.0 2
...
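rleid assigns the same integer ID to each run of consecutive identical values, which is why all rows for one interval share a subset number. The sketch below is a rough base-R illustration of that idea for a single column, not data.table's implementation: it takes a cumulative sum over the positions where the value changes. The example values are a hypothetical excerpt of the depths_min column.

```r
# depths_min values as they come out of the join: consecutive runs per interval
depths_min <- c(4700, 4700, 4700, 4703, 4703, 4706)

# TRUE at the first element and wherever the value differs from the previous one
changed <- c(TRUE, depths_min[-1] != depths_min[-length(depths_min)])
run_id <- cumsum(changed)
print(run_id)  # 1 1 1 2 2 3
```

With two columns, as in rleid(depths_min, depths_max), a change in either column starts a new run.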
split
If you want it as a list:
split(dt_join, dt_join$subset)
EACHI
## create a 'key_col' column to join on
dt_depths[, key_col := 1][dt[, key_col := 1],
  {
    idx = depths_min <= i.DEPTH & i.DEPTH <= depths_max
    .(DEPTH = i.DEPTH,
      depths_min = depths_min[idx],
      depths_max = depths_max[idx],
      A = i.A,
      B = i.B,
      C = i.C,
      D = i.D,
      E = i.E,
      F = i.F)
  },
  on = "key_col",
  by = .EACHI]
The subsets can then be identified using the same method as in the foverlaps solution.
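Both the foverlaps join (type = "any") and the EACHI join keep a row whenever depths_min <= DEPTH <= depths_max. The sketch below reproduces that condition in base R with a cross join (merge with by = NULL) followed by a filter. This is a swapped-in base-R technique, not part of either data.table answer, and it runs on a reduced stand-in for the data. It also exposes a side effect of the lookup table above: its last interval, [4715, 4718], matches only DEPTH = 4715, so that row forms a sixth, one-row subset.

```r
df <- data.frame(DEPTH = seq(4700, 4715, by = 0.5))

# Same intervals as dt_depths in the answer: [4700,4703], ..., [4715,4718]
lookup <- data.frame(depths_min = seq(min(df$DEPTH), max(df$DEPTH), by = 3))
lookup$depths_max <- lookup$depths_min + 3

# by = NULL gives the Cartesian product of the two data frames
joined <- merge(lookup, df, by = NULL)

# Keep pairs where the depth lies in the closed interval; boundary depths match twice
joined <- joined[joined$depths_min <= joined$DEPTH & joined$DEPTH <= joined$depths_max, ]
joined <- joined[order(joined$depths_min, joined$DEPTH), ]

subsets <- split(joined, joined$depths_min)
sapply(subsets, nrow)  # 7 7 7 7 7 1
```

For this data, building the lookup from seq(min(DEPTH), max(DEPTH) - 3, by = 3) instead would give exactly the five subsets shown in the question.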