Question

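I have a data frame like the one shown below. I want to rearrange its rows based on the "Breakpoint" column.

The expected result should be as follows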

Answer 1

Using this sample data:

df <- data.frame(
    Range1 = c(1, 2, 3, 5, 10, 12, 16, 20, 21, 28, 33),
    Range2 = c(2, 3, 5, 10, 12, 16, 20, 21, 28, 33, 40),
    Breakpoint = c("", "", "", "Y", "", "Y", "", "", "Y", "", ""))

A solution that drops the trailing rows is:

First cut off the dangling rows after the last Y:

 df2 = df[1:max(which(df$Breakpoint=="Y")),]

Then work out the length of each group:

> rgroup=rle(rev(cumsum(rev(df2$Breakpoint=="Y"))))$lengths

Get the positions of the Ys:

> Ypos = which(df2$Breakpoint=="Y")

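Construct an index vector: each Y position, repeated once per row of its chunk, minus a sequence running from 1 to the chunk length, plus 1. That walks each chunk backwards from its Y. Then subset: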

> df2[rep(Ypos, rgroup) - unlist(lapply(rgroup,function(x){1:x})) +1,]
  Range1 Range2 Breakpoint
4      5     10          Y
3      3      5           
2      2      3           
1      1      2           
6     12     16          Y
5     10     12           
9     21     28          Y
8     20     21           
7     16     20
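
To see why that index works, it can help to print its two pieces separately. The following is only a sketch in base R that rebuilds the sample data so it runs on its own; `idx` is a name introduced here for illustration, and `sequence()` is swapped in for the `unlist(lapply(...))` call because it produces the same 1..n runs:

```r
df <- data.frame(
    Range1 = c(1, 2, 3, 5, 10, 12, 16, 20, 21, 28, 33),
    Range2 = c(2, 3, 5, 10, 12, 16, 20, 21, 28, 33, 40),
    Breakpoint = c("", "", "", "Y", "", "Y", "", "", "Y", "", ""))
df2 <- df[1:max(which(df$Breakpoint == "Y")), ]

rgroup <- rle(rev(cumsum(rev(df2$Breakpoint == "Y"))))$lengths
Ypos <- which(df2$Breakpoint == "Y")

rep(Ypos, rgroup)   # each Y position, repeated once per row of its chunk
# 4 4 4 4 6 6 9 9 9
sequence(rgroup)    # 1..n within each chunk
# 1 2 3 4 1 2 1 2 3

# idx is an illustrative name, not part of the original answer
idx <- rep(Ypos, rgroup) - sequence(rgroup) + 1
idx                 # each chunk walked backwards from its Y
# 4 3 2 1 6 5 9 8 7
```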

Add the dangling rows back on if you need them.
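
One possible way to put the dangling rows back, sketched under the assumption that the data contains at least one Y. The names `lastY`, `reordered`, `trailing` and `result` are introduced here for illustration, and `sequence()` stands in for the `unlist(lapply(...))` call:

```r
df <- data.frame(
    Range1 = c(1, 2, 3, 5, 10, 12, 16, 20, 21, 28, 33),
    Range2 = c(2, 3, 5, 10, 12, 16, 20, 21, 28, 33, 40),
    Breakpoint = c("", "", "", "Y", "", "Y", "", "", "Y", "", ""))

# Assumption: there is at least one "Y", otherwise max() has nothing to work on
lastY <- max(which(df$Breakpoint == "Y"))
df2 <- df[1:lastY, ]

rgroup <- rle(rev(cumsum(rev(df2$Breakpoint == "Y"))))$lengths
Ypos <- which(df2$Breakpoint == "Y")
reordered <- df2[rep(Ypos, rgroup) - sequence(rgroup) + 1, ]

# Rows after the last Y, in their original order.
# The logical test yields zero rows when the last row is itself a Y.
trailing <- df[seq_len(nrow(df)) > lastY, ]

result <- rbind(reordered, trailing)
result
```

The logical comparison is used instead of `(lastY + 1):nrow(df)` because that range would count backwards when nothing follows the last Y.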

[Edit: new version added above. The code below is kept for historical purposes.]

My old version was this, which did handle the dangling rows:

> group=rev(cumsum(rev(df$Breakpoint=="Y")))
> rbind(do.call(rbind,lapply(split(df[group>0,],-group[group>0]),function(x){x[nrow(x):1,,drop=FALSE]}))[,c("Range1","Range2")],df[max(which(df$Breakpoint=="Y")),1:2,drop=FALSE],df[group==0,1:2])

which gives:

     Range1 Range2
-3.4      5     10
-3.3      3      5
-3.2      2      3
-3.1      1      2
-2.6     12     16
-2.5     10     12
-1.9     21     28
-1.8     20     21
-1.7     16     20
9        21     28
10       28     33
11       33     40

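Remove the row names if you don't like them. This uses only base R functions.

I'm not sure whether this works when nothing is left over after the last break point, but the question does not really specify what should happen in that case.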

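Version with bonus commentary:

> group=rev(cumsum(rev(df$Breakpoint=="Y")))

Working backwards from the last row, this counts from 0 and goes up by one every time it finds a Y; the outer rev() puts the counts back in row order. The result is a grouping variable for the chunk of rows leading up to each Y, with 0 marking the rows after the last Y.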
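
For the sample data, the grouping vector and the chunks it produces look like this. This is a small illustration only; `isY` and `chunks` are names introduced here:

```r
df <- data.frame(
    Range1 = c(1, 2, 3, 5, 10, 12, 16, 20, 21, 28, 33),
    Range2 = c(2, 3, 5, 10, 12, 16, 20, 21, 28, 33, 40),
    Breakpoint = c("", "", "", "Y", "", "Y", "", "", "Y", "", ""))

isY <- df$Breakpoint == "Y"          # illustrative helper name
group <- rev(cumsum(rev(isY)))
group
# 3 3 3 3 2 2 1 1 1 0 0

# Splitting on -group makes the factor levels sort as -3, -2, -1,
# so the chunks come out in the same order as the rows.
chunks <- split(df[group > 0, ], -group[group > 0])
names(chunks)
# "-3" "-2" "-1"
sapply(chunks, nrow)                 # rows per chunk
```

Those chunk names are also where the "-3.4", "-3.3" style row names in the earlier output come from.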

Because I'm interspersing comments, this won't work if you cut and paste it:

> rbind(

# We need to bind three things: the reversed chunks, the last break point
# and the trailing rows.

# First, the reversed chunks, rbind-ed back into one data frame:
      do.call(
          rbind,
          # split the rows up to the last Y into a list of chunks
          lapply(
              split(df[group>0,],-group[group>0]),
              # reverse the rows of each chunk
              function(x){x[nrow(x):1,,drop=FALSE]}
          # and only take the columns we need:
          ))[,c("Range1","Range2")],

      # this is the last Y
      df[max(which(df$Breakpoint=="Y")),1:2,drop=FALSE],

      # these are the trailing rows, in the order they appear:
      df[group==0,1:2])

Annotating it like this shows me a few optimisations that could be made, but that will do for now.

Answer 2

Depending on the size of the data.frame, this can be done manually with a for loop.

breakPoints <- which(!is.na(DF$`break point`))
if(length(breakPoints) > 0){
    startIndex <- 1 # startIndex is the row where the next break point should go
    for(i in breakPoints){ # iterate over the break points
        # Put the break point at the startIndex row, followed by the rows
        # that used to precede it (seq_len() is empty when i == startIndex)
        DF[startIndex:i,] <- DF[c(i, seq_len(i - startIndex) + startIndex - 1),]
        # The next block starts right after this break point
        startIndex <- i + 1
    }
}

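If your data is large, there may be a more efficient approach. Assigning into subsets through `[<-.data.frame` is generally slow compared with other methods. A first optimisation could simply be to convert the code above to data.table, where subsetting is much faster.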
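
Staying in base R, another option is to run the same loop over a plain integer index and subset the data frame only once at the end. This is a sketch, not a benchmarked solution: the sample `DF` below assumes, as the loop above does, a column named `break point` that is NA on non-break rows, and `idx` and `result` are names introduced here. Whether it is actually faster on your data is something to measure.

```r
# Hypothetical sample data in the shape the loop above expects
DF <- data.frame(
    Range1 = c(1, 2, 3, 5, 10, 12, 16, 20, 21, 28, 33),
    Range2 = c(2, 3, 5, 10, 12, 16, 20, 21, 28, 33, 40),
    `break point` = c(NA, NA, NA, "Y", NA, "Y", NA, NA, "Y", NA, NA),
    check.names = FALSE)

breakPoints <- which(!is.na(DF$`break point`))

# Build the new row order in a cheap vector instead of rewriting DF each time
idx <- seq_len(nrow(DF))
startIndex <- 1
for (i in breakPoints) {
    # break point first, then the rows that preceded it, in original order
    idx[startIndex:i] <- c(i, seq_len(i - startIndex) + startIndex - 1)
    startIndex <- i + 1
}

result <- DF[idx, ]   # a single subset of the data frame
result
```

Rows after the last break point are never touched by the loop, so they keep their original positions.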

Rearranging a data frame based on an index column
