Merge segments by value

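Date: 2017-01-09 11:26:16

Tags: r

I have a data.frame() of segments and values. If the values of neighbouring segments are close enough, I want to merge those segments.

Reproducible example -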

set.seed(4)
df <- data.frame(start = seq(from = 1, to = 91, by = 10),
                 end = seq(from = 10, to = 100, by = 10),
                 value = rnorm(10))

print(df)
   start end      value
1      1  10  0.2167549
2     11  20 -0.5424926
3     21  30  0.8911446
4     31  40  0.5959806
5     41  50  1.6356180
6     51  60  0.6892754
7     61  70 -1.2812466
8     71  80 -0.2131445
9     81  90  1.8965399
10    91 100  1.7768632

The differences between consecutive segments are

for(i in 1:9) print(abs(df$value[i] - df$value[i+1]))
[1] 0.7592474
[1] 1.433637
[1] 0.2951641
[1] 1.039637
[1] 0.9463426
[1] 1.970522
[1] 1.068102
[1] 2.109684
[1] 0.1196767
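
The same differences can also be computed without a loop. A minimal sketch, assuming the `df` from the example above; the names `gaps` and `close_enough` are only illustrative:

```r
set.seed(4)
df <- data.frame(start = seq(from = 1, to = 91, by = 10),
                 end = seq(from = 10, to = 100, by = 10),
                 value = rnorm(10))

# absolute difference between each segment's value and the next one
gaps <- abs(diff(df$value))

# TRUE where two neighbouring segments are closer than the threshold of 1
close_enough <- gaps < 1
print(which(close_enough))
```

`diff()` returns one element fewer than its input, so element `i` of `gaps` describes the pair of rows `i` and `i + 1`.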

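Suppose I want to merge segments whose difference is less than 1, and the merged value should be the mean of the segment values. The result should look like this -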

  start end      value
1     1  20 -0.1628689
2    21  40  0.7435626
3    41  60  1.1624467
4    61  70 -1.2812466
5    71  80 -0.2131445
6    81 100  1.8367015

If 3 such segments follow one another, I want to merge all three into one.
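
One way to read this rule is that a run of neighbours, each closer than the threshold to the next, forms a single merged segment. A small sketch with made-up values (the vector `v` and the names `threshold` and `run_id` are hypothetical, not part of the data above):

```r
# hypothetical values: the first three are each within 1 of their neighbour
v <- c(0.0, 0.8, 1.6, 4.0)
threshold <- 1

# a new run starts at the first element and wherever the gap reaches the threshold
run_id <- cumsum(c(TRUE, abs(diff(v)) >= threshold))
print(run_id)

# mean value per run
run_means <- tapply(v, run_id, mean)
print(run_means)
```

Under this reading the first and third values land in the same run even though they differ by 1.6, so it is worth deciding whether that kind of chaining is what is wanted.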

Is there a simple way to do this?

1 Answer:

Answer 0: (score: 1)

Here is a solution. The variable weight holds the number of segments merged into each row.

set.seed(4)
df <- data.frame(start = seq(from = 1, to = 91, by = 10),
                 end = seq(from = 10, to = 100, by = 10),
                 value = rnorm(10))


df$weight <- 1  # number of original segments merged into each row

for (i in seq_len(nrow(df) - 1)) {

  # merge only if the values are closer than 1 and the current row holds fewer
  # than 3 segments (the cap of 3 can be changed to any number of segments)
  if (abs(df$value[i] - df$value[i + 1]) < 1 && df$weight[i] < 3) {

    df$end[i] <- df$end[i + 1]
    df$value[i] <- weighted.mean(df$value[c(i, i + 1)], df$weight[c(i, i + 1)])
    df$weight[i] <- df$weight[i] + 1
    df[i + 1, ] <- df[i, ]   # carry the merged segment forward to row i + 1
    df$weight[i] <- 0        # mark row i for removal

  }

}
df <- df[df$weight > 0, ]  # drop the rows that were merged into a later row
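
For comparison, here is a loop-free sketch of the same idea. It groups on the differences between the original neighbouring values rather than on the running mean, and it does not cap the number of merged segments, so its result can differ from the loop above. On this data the loop appears to fold segment 41-50 into 21-40 as well, because the merged mean 0.7435626 is within 1 of 1.6356180. The names `run_id` and `merged` are illustrative.

```r
set.seed(4)
df <- data.frame(start = seq(from = 1, to = 91, by = 10),
                 end = seq(from = 10, to = 100, by = 10),
                 value = rnorm(10))

# start a new run at row 1 and wherever neighbouring values differ by 1 or more
run_id <- cumsum(c(TRUE, abs(diff(df$value)) >= 1))

# one output row per run: smallest start, largest end, mean value
merged <- data.frame(start = as.vector(tapply(df$start, run_id, min)),
                     end   = as.vector(tapply(df$end, run_id, max)),
                     value = as.vector(tapply(df$value, run_id, mean)))
print(merged)
```

With the seed above, this should give the six rows listed in the question, with the last segment running from 81 to 100.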