Given a data.frame in which start and end define a range:
id start end
1 3 51
2 20 28
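I am trying to split each row into multiple rows wherever its range contains a given number (or sequence of numbers), and to group the resulting pieces. For example, splitting at 25: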
id start end splitGroup
1 3 25 0
1 25 51 25
2 20 25 0
2 25 28 25
This is similar to splitting on a regular sequence, which I can do with the plyr package:
library(plyr)
df <- data.frame(
  id = c(1:2),
  start = c(3,20),
  end = c(51,28)
)
splitBy <- 20
rowSplit <- function(df, splitBy){
  newDf <- ddply(df, .(id), function(x){
    data.frame(
      id = x$id,
      start = x$start,
      end = x$end,
      splitGroup = seq(
        floor(x$start/splitBy)*splitBy,
        floor(x$end/splitBy)*splitBy,
        by=splitBy
      )
    )
  })
  newDf <- within(newDf, {
    start <- ifelse(
      floor(start/splitBy)*splitBy == splitGroup,
      start,
      splitGroup
    )
    end <- ifelse(
      end < (splitGroup + splitBy),
      end,
      (splitGroup + splitBy)
    )
  })
  return(newDf)
}
rowSplit(df, splitBy)
id start end splitGroup
1 3 20 0
1 20 40 20
1 40 51 40
2 20 28 20
How can I do this for any single number, or for an irregular set of numbers?
Answer 0 (score: 1)
Here is a start, using integer division (%/%):
smod <- df$start %/% 25  # 0 0
emod <- df$end %/% 25    # 2 1
newstart <- numeric(0)
matchit <- 25 * (1:100)  # or at least extend to the maximum value in your data frame
for (j in seq_len(nrow(df))) {
  newstart <- c(newstart, df$start[j])
  # add the first multiple of 25 above start only when the row crosses one
  if (emod[j] > smod[j]) newstart <- c(newstart, min(matchit[matchit > df$start[j]]))
}
Rgames> newstart
[1] 3 25 20 25
Calculate newend in a similar way and you should be all set.
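As a rough sketch of what "a similar way" could look like (this is an assumption, not code from the answer): each row that crosses a multiple of 25 contributes that multiple as the end of its first piece, and every row contributes its own end as the end of its last piece. The condition emod[j] > smod[j] is my reading of the intent.

```r
# Sketch only: mirrors the newstart loop to build the matching newend vector.
df <- data.frame(id = 1:2, start = c(3, 20), end = c(51, 28))

smod <- df$start %/% 25
emod <- df$end %/% 25
matchit <- 25 * (1:100)  # candidate breakpoints, as in the answer

newend <- numeric(0)
for (j in seq_len(nrow(df))) {
  if (emod[j] > smod[j]) {
    # the first piece ends at the first multiple of 25 above start
    newend <- c(newend, min(matchit[matchit > df$start[j]]))
  }
  # the last piece always ends at the row's own end
  newend <- c(newend, df$end[j])
}
newend
```

Paired element by element with newstart (3 25 20 25), this gives the four start/end pieces for the example data.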
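Answer 1 (score: 0)
Following @carl-whitthoft's suggestion to use a for loop, the rows can be split at a single breakpoint. The process is slow, though, so it is only suitable if speed does not matter.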
rowSplit <- function(df, splitAt, id = "id", start = "start", end = "end"){
  # TRUE for rows whose range strictly contains the breakpoint
  splitRow <- df[, start] < splitAt & df[, end] > splitAt
  newDf <- data.frame(
    id = integer(),
    start = numeric(),
    end = numeric(),
    group = integer()
  )
  for (j in seq_len(nrow(df))){
    # first piece: ends at the breakpoint if the row is split
    newDf <- rbind(
      newDf,
      c(df[j, id],
        df[j, start],
        ifelse(splitRow[j], splitAt, df[j, end]),
        ifelse(df[j, start] < splitAt, 0, splitAt)
      )
    )
    # second piece: from the breakpoint to the original end
    if (splitRow[j]) {
      newDf <- rbind(newDf, c(df[j, id], splitAt, df[j, end], splitAt))
    }
  }
  # rbind() onto an empty data frame loses the column names, so restore them
  colnames(newDf) <- c("id", "start", "end", "group")
  return(newDf)
}
Splitting at 25:
df <- data.frame(
  id = c(1:2),
  start = c(3,20),
  end = c(51,28)
)
rowSplit(df, 25)
id start end group
1 3 25 0
1 25 51 25
2 20 25 0
2 25 28 25
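For several irregular breakpoints, one possible approach is to build all the pieces for a row at once instead of growing the data frame one row at a time. The sketch below uses base R only; the function name splitAtBreaks and its breaks argument are hypothetical, and it assumes the columns are named id, start and end. The group label is the largest breakpoint at or below a piece's start, or 0 if there is none.

```r
# Sketch only: split every row at each breakpoint that lies strictly inside its range.
splitAtBreaks <- function(df, breaks) {
  breaks <- sort(unique(breaks))
  pieces <- lapply(seq_len(nrow(df)), function(j) {
    s <- df$start[j]
    e <- df$end[j]
    inner <- breaks[breaks > s & breaks < e]  # breakpoints inside (s, e)
    starts <- c(s, inner)
    ends <- c(inner, e)
    # findInterval() counts the breakpoints <= each start; 0 means "none"
    group <- c(0, breaks)[findInterval(starts, breaks) + 1]
    data.frame(id = df$id[j], start = starts, end = ends, splitGroup = group)
  })
  do.call(rbind, pieces)
}

df <- data.frame(id = 1:2, start = c(3, 20), end = c(51, 28))
res25 <- splitAtBreaks(df, 25)          # single breakpoint
resMulti <- splitAtBreaks(df, c(25, 40))  # irregular breakpoints
```

With a single breakpoint of 25 this should reproduce the four rows shown above; with c(25, 40) the first row is cut into three pieces and the second into two.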