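I have a data.frame() of segments and values.
If the values of neighbouring segments are close enough, I want to merge those segments.
Reproducible example: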
    set.seed(4)
    df <- data.frame(start = seq(from = 1, to = 91, by = 10),
                     end = seq(from = 10, to = 100, by = 10),
                     value = rnorm(10))
    print(df)
       start end      value
    1      1  10  0.2167549
    2     11  20 -0.5424926
    3     21  30  0.8911446
    4     31  40  0.5959806
    5     41  50  1.6356180
    6     51  60  0.6892754
    7     61  70 -1.2812466
    8     71  80 -0.2131445
    9     81  90  1.8965399
    10    91 100  1.7768632
The absolute differences between consecutive segments are:
    for (i in 1:9) print(abs(df$value[i] - df$value[i + 1]))
    [1] 0.7592474
    [1] 1.433637
    [1] 0.2951641
    [1] 1.039637
    [1] 0.9463426
    [1] 1.970522
    [1] 1.068102
    [1] 2.109684
    [1] 0.1196767
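The same differences can be obtained without a loop. A minimal sketch using base R's `diff()`, which returns the lagged differences of a vector (the name `gaps` is introduced here only for illustration):

```r
set.seed(4)
df <- data.frame(start = seq(from = 1, to = 91, by = 10),
                 end = seq(from = 10, to = 100, by = 10),
                 value = rnorm(10))

# abs(diff(x)) gives |x[i + 1] - x[i]| for every consecutive pair,
# so a vector of 10 values yields 9 gaps.
gaps <- abs(diff(df$value))
print(gaps)
```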
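Suppose I want to merge segments whose difference is less than 1, and the merged value should be the mean of the segment values. The result should look like this: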
      start end      value
    1     1  20 -0.1628689
    2    21  40  0.7435626
    3    41  60  1.1624467
    4    61  70 -1.2812466
    5    71  80 -0.2131445
    6    81 100  1.8367015
If three qualifying segments follow one another, I want all three merged into one.
Is there a simple way to do this?
Answer 0 (score: 1)
Here is one solution.
The variable weight records how many original segments have been merged into each row.
    set.seed(4)
    df <- data.frame(start = seq(from = 1, to = 91, by = 10),
                     end = seq(from = 10, to = 100, by = 10),
                     value = rnorm(10))
    df$weight <- 1  # number of original segments merged into each row
    for (i in 1:(nrow(df) - 1)) {
      # The second condition caps a merged segment at 3 original segments
      # (change the 3 to allow a different maximum).
      if (abs(df$value[i] - df$value[i + 1]) < 1 && df$weight[i] < 3) {
        df$end[i] <- df$end[i + 1]
        df$value[i] <- weighted.mean(df$value[c(i, i + 1)], df$weight[c(i, i + 1)])
        df$weight[i] <- df$weight[i] + 1
        df[i + 1, ] <- df[i, ]  # carry the merged row forward to position i + 1
        df$weight[i] <- 0       # mark the old row for removal
      }
    }
    df <- df[df$weight > 0, ]  # drop rows that were absorbed into a later row
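For comparison, the grouping in the question's expected table can also be sketched without a loop, by starting a new group wherever the gap between two original neighbouring values reaches the threshold. This is only a sketch under that assumption: it compares the original values (not a running mean) and places no cap on how many segments a run may contain. The names `threshold`, `group`, and `merged` are introduced here for illustration.

```r
set.seed(4)
df <- data.frame(start = seq(from = 1, to = 91, by = 10),
                 end = seq(from = 10, to = 100, by = 10),
                 value = rnorm(10))

threshold <- 1  # assumed cut-off, taken from the question

# A new group starts at the first row and wherever the gap to the
# previous segment is at least the threshold; cumsum() turns those
# "start" flags into group ids.
group <- cumsum(c(TRUE, abs(diff(df$value)) >= threshold))

# Collapse each group: smallest start, largest end, mean of the values.
merged <- data.frame(
  start = as.vector(tapply(df$start, group, min)),
  end   = as.vector(tapply(df$end, group, max)),
  value = as.vector(tapply(df$value, group, mean))
)
print(merged)
```

With the differences printed in the question, this groups rows (1, 2), (3, 4), (5, 6), 7, 8 and (9, 10). The loop above instead compares the running mean of an already merged row with the next value, so its grouping can differ from this one.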