Split rows containing a range into multiple rows at arbitrary numbers

Posted: 2014-01-13 09:34:50

Tags: r plyr

Given a data.frame in which start and end define a range:

id   start   end
 1       3    51
 2      20    28

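I am trying to split a row into several rows whenever its range contains another number or sequence of numbers (for example 25), and to group the resulting pieces: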

id   start   end  splitGroup
 1       3    25           0
 1      25    51          25
 2      20    25           0
 2      25    28          25

This is similar to splitting at a regular sequence, which I can do with the plyr package:

library(plyr)

df <- data.frame(
  id    = c(1:2),
  start = c(3,20),
  end   = c(51,28)
)

splitBy <- 20

rowSplit <- function(df, splitBy){

  newDf <- ddply(df, .(id), function(x){
    data.frame(
      id = x$id,
      start = x$start,
      end = x$end,
      splitGroup = seq(
        floor(x$start/splitBy)*splitBy, 
        floor(x$end/splitBy)*splitBy, 
        by=splitBy
      )
    )
  })

  newDf <- within(newDf, {
    start <- ifelse(
      floor(start/splitBy)*splitBy == splitGroup,
      start, 
      splitGroup 
    )
    end <- ifelse( 
      end < (splitGroup + splitBy), 
      end,  
      (splitGroup + splitBy)
    )
  })  

  return(newDf)
}

rowSplit(df, splitBy)

id  start   end   splitGroup
 1      3    20            0
 1     20    40           20
 1     40    51           40
 2     20    28           20

How can this be done for any single number, or for irregularly spaced numbers?
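One possible approach, sketched here in base R and not taken from the original thread: for each row, keep only the breakpoints that fall strictly inside (start, end), then pair them up so each breakpoint ends one piece and starts the next. `splitAtBreaks` is a hypothetical name. The label of the first piece is an assumption chosen to reproduce both outputs above: it is the last breakpoint at or below start, or 0 if there is none.

```r
# Hypothetical helper (not from the original thread): split each
# [start, end] range at every breakpoint strictly inside it.
splitAtBreaks <- function(df, breaks) {
  breaks <- sort(unique(breaks))
  pieces <- lapply(seq_len(nrow(df)), function(i) {
    s <- df$start[i]
    e <- df$end[i]
    inside <- breaks[breaks > s & breaks < e]   # breakpoints strictly inside the range
    below  <- breaks[breaks <= s]               # breakpoints at or below start
    # Assumption: label the first piece with the last breakpoint <= start, else 0
    firstGroup <- if (length(below) > 0) max(below) else 0
    data.frame(
      id         = df$id[i],
      start      = c(s, inside),              # each piece starts at start or a breakpoint
      end        = c(inside, e),              # and ends at the next breakpoint or end
      splitGroup = c(firstGroup, inside)
    )
  })
  res <- do.call(rbind, pieces)
  rownames(res) <- NULL
  res
}

df <- data.frame(id = 1:2, start = c(3, 20), end = c(51, 28))
splitAtBreaks(df, 25)                   # a single number
splitAtBreaks(df, c(10, 25, 40))        # irregular breakpoints
splitAtBreaks(df, seq(0, 60, by = 20))  # a regular sequence, as with rowSplit()
```

Passing `seq(0, 60, by = 20)` should give the same pieces as the plyr-based `rowSplit(df, 20)`, so the regular case becomes a special case of the irregular one.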

2 answers:

Answer 0 (score: 1)

Here's a start using integer division (%/%):

smod <- df$start %/% 25   # 0 0 (not used below)
emod <- df$end %/% 25     # 2 1
newstart <- numeric(0)
matchit <- 25 * (1:100)   # or at least extend to the maximum value in your data frame
for (j in seq_len(nrow(df))) {
  newstart <- c(newstart, df$start[j])
  # if end is at least 25, also start a piece at the next multiple of 25 above start
  if (emod[j] > 0) newstart <- c(newstart, min(matchit[matchit > df$start[j]]))
}

Rgames> newstart
[1]  3 25 20 25

Compute newend in a similar way and you should be all set.
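A sketch of that "compute newend similarly" step, assuming the same step of 25 and the same lookup vector `matchit` as in the answer. It deviates from the answer in one place: instead of `emod[j] > 0` it checks directly that the next multiple of 25 lies below end, because `emod[j] > 0` alone would also split a range such as 30 to 40 at 50. Like the original, it only uses the first multiple above start, so 3 to 51 is split at 25 but not at 50, which matches the desired output in the question. Tracking `newid` is an addition so the pieces can be assembled into a data frame.

```r
# Data from the question
df <- data.frame(id = 1:2, start = c(3, 20), end = c(51, 28))
matchit <- 25 * (1:100)   # multiples of 25; must extend past max(df$end)
newid <- newstart <- newend <- numeric(0)
for (j in seq_len(nrow(df))) {
  brk <- min(matchit[matchit > df$start[j]])   # next multiple of 25 above start
  newid    <- c(newid, df$id[j])
  newstart <- c(newstart, df$start[j])
  if (brk < df$end[j]) {                       # the range straddles the breakpoint
    newend   <- c(newend, brk)                 # first piece ends at the breakpoint
    newid    <- c(newid, df$id[j])
    newstart <- c(newstart, brk)               # second piece starts there
  }
  newend <- c(newend, df$end[j])               # last piece ends at the original end
}
data.frame(id = newid, start = newstart, end = newend)
```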

Answer 1 (score: 0)

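Following @carl-whitthoft's suggestion to use a for loop, a row can be split at a single breakpoint. The loop is slow, though (it grows the data frame with rbind() on every iteration), so it is only suitable when speed does not matter.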

rowSplit <- function(df, splitAt, id = "id", start = "start", end = "end"){

  # TRUE for rows whose range straddles the breakpoint
  splitRow <- df[, start] < splitAt & df[, end] > splitAt

  newDf <- data.frame(
    id    = integer(),
    start = numeric(),
    end   = numeric(),
    group = integer()
  )

  for (j in seq_len(nrow(df))){
    # first (or only) piece: ends at splitAt if the row is split
    newDf <- rbind(
      newDf,
      c(df[j, id],
        df[j, start],
        if (splitRow[j]) splitAt else df[j, end],
        if (df[j, start] < splitAt) 0 else splitAt
      )
    )
    # second piece: from splitAt to the original end
    if (splitRow[j]) {
      newDf <- rbind(newDf, c(df[j, id], splitAt, df[j, end], splitAt))
    }
  }

  # rbind() of a bare vector onto a zero-row data frame may not keep
  # the column names, so set them explicitly
  colnames(newDf) <- c("id", "start", "end", "group")

  return(newDf)
}

Splitting at 25:

df <- data.frame(
  id    = c(1:2),
  start = c(3,20),
  end   = c(51,28)
)

rowSplit(df, 25)

id start end group
 1     3  25     0
 1    25  51    25
 2    20  25     0
 2    25  28    25
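Since the slowness comes from calling rbind() once per row, one way around it is to build all first pieces and all second pieces with two vectorized data.frame() calls and then interleave them by original row. This is a sketch, not from the thread: `rowSplitVec` is a hypothetical name, and unlike `rowSplit` it assumes the columns are literally named id, start and end instead of taking column-name arguments. The `group` rule is the same as in `rowSplit` above.

```r
# Hypothetical vectorized variant of rowSplit() for a single breakpoint
rowSplitVec <- function(df, splitAt) {
  split <- df$start < splitAt & df$end > splitAt   # rows that straddle splitAt
  # first (or only) piece of every row
  first <- data.frame(
    row   = seq_len(nrow(df)),
    id    = df$id,
    start = df$start,
    end   = ifelse(split, splitAt, df$end),
    group = ifelse(df$start < splitAt, 0, splitAt)
  )
  # second piece, only for rows that were split
  second <- data.frame(
    row   = which(split),
    id    = df$id[split],
    start = rep(splitAt, sum(split)),
    end   = df$end[split],
    group = rep(splitAt, sum(split))
  )
  out <- rbind(first, second)
  # keep the input row order, with each row's pieces in ascending start order
  out <- out[order(out$row, out$start), c("id", "start", "end", "group")]
  rownames(out) <- NULL
  out
}

df <- data.frame(id = 1:2, start = c(3, 20), end = c(51, 28))
rowSplitVec(df, 25)
```

For the example data this should produce the same four rows as the loop-based `rowSplit(df, 25)`.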