Question

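I imported a CSV file into a data frame. Now I want to copy a block of 48 rows, starting at row i, into a new data frame, then skip three 48-row blocks, append the next 48-row block to the end of the new data frame, and so on until the end of the data frame. I have spent a lot of time on this without success. Thanks in advance for any hints.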

Answer 1

A very simple one-liner:

new.df <- old.df[ c( rep( F, i - 1 ), rep( T, 48 ), rep( F, 48 * 3 ), rep( T, 48 ) ), ]

But hey, let's make it even simpler:

new.df <- old.df[ c( rep( F, i - 1 ), rep( c( T, F, F, F, T ), each=48 ) ), ]

or even

new.df <- old.df[ i - 1 + which( rep( c( T, F, F, F, T ), each=48 ) ), ]

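Explanation:

We build a vector of TRUE/FALSE values; the rows that line up with T are selected. We use c() to concatenate the pieces. First we skip i - 1 rows (F), then we take 48 (T), then we skip 3 * 48, and then we take another 48. Note that R recycles a logical index that is shorter than the number of rows, so the first two forms repeat the pattern further down the data frame, while the which() form selects exactly these two blocks.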
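To cover the "and so on until the end" part of the question, the same logical-mask idea can be stretched over the whole data frame. This is only a sketch under assumptions: `old.df` here is toy data standing in for the imported CSV, and `i`, `block`, `pattern` and `mask` are illustrative names, not anything from the original answer.

```r
# Toy data standing in for the imported CSV (assumption: 1000 rows)
old.df <- data.frame(x = 1:1000, y = rnorm(1000))
i <- 5        # first row to keep
block <- 48   # rows per block

# One period of the pattern: keep 1 block, skip 3
pattern <- rep(c(TRUE, FALSE, FALSE, FALSE), each = block)

# Skip the first i - 1 rows, then repeat the pattern so that the mask
# has exactly one entry per row (no recycling, no NA rows)
mask <- c(rep(FALSE, i - 1),
          rep(pattern, length.out = nrow(old.df) - i + 1))

new.df <- old.df[mask, ]
```

Because the mask has the same length as the data frame, a final partial block is kept as far as the rows go.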

Answer 2

df <- data.frame(x = 1:1000, y = rnorm(1000))
dim(df)
# [1] 1000    2
# see that it has 1000 rows.
# let's say I want to copy 48 rows starting at row 102
new_df <- df[102:(102 + 47), ]
# or I do it with a variable
i <- 102
j <- i + 47  # i:j includes both ends, so 48 rows means j = i + 47
new_df <- df[i:j, ]
# If you need an uneven range, just make a vector
# Either specify a range of rows or just row numbers
rows_i_want <- c(1:48, 52, 55, 100:120, 128)
new_new_df <- df[rows_i_want, ]
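A quick check of the range lengths may help here, since `i:(i + n)` includes both endpoints and therefore contains n + 1 values. The variable names `n` and `rows` below are just for illustration.

```r
i <- 102
n <- 48

length(i:(i + n))               # one more than n, because both ends are included
length(i:(i + n - 1))           # exactly n
rows <- seq(i, length.out = n)  # the same n rows, without the manual "- 1"
```

Using `seq(i, length.out = n)` is one way to avoid the off-by-one when the block size is what you actually know.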

Here is an example of a general function that does this for any data.frame:

# This function takes a data.frame, a starting index and a block size.
# It keeps one block of rows, skips the next three blocks, and repeats to the end.
keep_rows <- function(df, i, block = 48) {
    if (i < 1 || i > nrow(df))
        stop("index is out of range")

    # First row of every block to keep: i, i + 4 * block, i + 8 * block, ...
    start <- seq(i, nrow(df), by = 4 * block)

    # Last row of every kept block, clipped to the end of the df
    end <- pmin(start + block - 1, nrow(df))

    # Expand each start/end pair into row numbers and combine them
    ranges <- Map(seq, start, end)
    to_keep <- unlist(ranges)
    return(df[to_keep, , drop = FALSE])
}
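For comparison, the same selection can be sketched with integer arithmetic on row numbers instead of start/end pairs. `block_rows` is a hypothetical helper name made up for this example, and the `skip` parameter is an assumption added to make the "skip three blocks" rule explicit.

```r
# Hypothetical helper (not part of base R): row numbers of every
# (skip + 1)-th block of `block` rows, counting from row i.
block_rows <- function(n_rows, i = 1, block = 48, skip = 3) {
  stopifnot(i >= 1, i <= n_rows)
  rows <- seq(i, n_rows)                       # candidate rows from i to the end
  block_id <- (seq_along(rows) - 1) %/% block  # 0-based block number of each row
  rows[block_id %% (skip + 1) == 0]            # keep blocks 0, 4, 8, ...
}

df <- data.frame(x = 1:1000, y = rnorm(1000))
new_df <- df[block_rows(nrow(df), i = 102), ]
```

Returning row numbers rather than the subset itself makes it easy to inspect which rows will be taken before touching the data frame.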
