Question

I have a data frame named full_data_string_split_removed2. When I run SP <- which(full_data_string_split_removed2$split1 == "SP#"), I get the row numbers where the expression SP# is found. Running print(full_data_string_split_removed2) gives: data

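In this case, running Number_of_SP_lines <- length(SP) and print(Number_of_SP_lines) gives [1] 425, which is correct. Two things are constant: first, there is one row where the expression SP# is found in column split1, and second, that row is followed by 103 rows of data, as can be seen in the sample data. The number of SP# occurrences, however, can differ between data sets. So what I need to achieve is: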

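1. In each row where SP# is found in column split1, take the entry in column split7, divide that value by 60, and copy it into cell A2 of a new table. Cell A1, the column name, should be filled with the sample and repetition names from the row where SP# appears in split1.
2. Then transpose the entries in columns split2 to split11 of the following 103 rows into the new data frame/table, below the entry from item 1. These are 1024 entries.
3. Repeat steps 1 and 2 for the remaining SP# occurrences, with each SP# occurrence getting its own column.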

Answer 1

The following code should do what you want:

# Read in the data
tbl1 <- read.csv('SP21_only.csv')
# Find the rows where SP# is in split1
SP_indices <- which(grepl('SP#', tbl1$split1))
# Then store in tbl2, for each SP_indices row
tbl2 <- sapply(SP_indices, function(i){
    # That observation of sample + that observation of repetition
    c(paste(tbl1$sample[i], tbl1$repetition[i]),
      # That observation of split7 / 60
      tbl1$split7[i] / 60,
      # And concatenation into a vector the transposition of the next
      # 103 rows for the columns split2-split11
      c(t(tbl1[i + 1:103, paste0('split', 2:11)])))
})
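
To make the shape of the result easier to see, here is a minimal sketch of the same approach on a toy data frame. The data frame `toy`, its values, and the block length `n_rows` are made up for illustration and are not taken from SP21_only.csv; only two value columns (split2 and split3) are used instead of split2 to split11.

```r
# Toy stand-in for the real file (hypothetical values): each SP# header row
# is followed by exactly n_rows data rows.
n_rows <- 2
toy <- data.frame(
  sample     = c("A", NA, NA, "B", NA, NA),
  repetition = c(1, NA, NA, 2, NA, NA),
  split1     = c("SP#", "x", "x", "SP#", "x", "x"),
  split2     = c(NA, 1, 3, NA, 5, 7),
  split3     = c(NA, 2, 4, NA, 6, 8),
  split7     = c(120, NA, NA, 180, NA, NA),
  stringsAsFactors = FALSE
)

# Rows that contain the SP# marker (fixed = TRUE matches it literally)
SP_indices <- which(grepl("SP#", toy$split1, fixed = TRUE))

toy_result <- sapply(SP_indices, function(i) {
  c(paste(toy$sample[i], toy$repetition[i]),   # label, e.g. "A 1"
    toy$split7[i] / 60,                        # header value divided by 60
    # t() followed by c() flattens the block row by row
    c(t(toy[i + seq_len(n_rows), c("split2", "split3")])))
})

dim(toy_result)    # 6 2
toy_result[, 1]    # "A 1" "2" "1" "2" "3" "4"
```

Because the label is a string, every entry of the result is stored as character; the numeric part would need `as.numeric()` before any arithmetic.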

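Note that for the full data set the resulting matrix tbl2 has 1032 rows (the sample/repetition label, split7 / 60, and 103 × 10 = 1030 transposed values) and 425 columns, as discussed in the comments above. This works for any number of SP# occurrences, but only if every SP# row is followed by exactly 103 data rows. If you need it to handle an arbitrary number of rows in between, you can do the following: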

# Read in the data
tbl1 <- read.csv('SP21_only.csv')
# It will be convenient to go ahead and paste together sample and repetition
sample_repetition <- paste(tbl1$sample, tbl1$repetition)
# Then we get a vector of length nrow(tbl1)
# that increments in value every time split1 contains SP#
# This groups or separates the data into segments we need
groups <- cumsum(grepl('SP#', tbl1$split1))
# Then store in tbl2, for each group
# (seq_len() yields an empty sequence if no SP# is found; 1:0 would give c(1, 0))
tbl2 <- sapply(seq_len(max(groups)), function(x){
    group_indices <- which(groups == x)
    first_index <- min(group_indices)
    # The relevant element of sample_repetition,
    # The relevant element of split7 / 60, and
    return(c(sample_repetition[first_index], tbl1$split7[first_index] / 60,
             # the concatenation of the transposition of the relevant submatrix
             c(t(tbl1[group_indices[-1], paste0('split', 2:11)]))))
})
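
One thing to keep in mind with the grouped version: when the blocks have different lengths, `sapply` can no longer simplify to a matrix and returns a list instead. The sketch below shows the `cumsum` grouping on a hypothetical toy data frame (values invented for illustration, only split2 and split3 used) and pads shorter blocks with `NA`. The padding is one possible choice, not something the original answer prescribes.

```r
# Hypothetical toy data: the first SP# block has 2 data rows, the second has 1.
toy <- data.frame(
  sample     = c("A", NA, NA, "B", NA),
  repetition = c(1, NA, NA, 2, NA),
  split1     = c("SP#", "x", "x", "SP#", "x"),
  split2     = c(NA, 1, 3, NA, 5),
  split3     = c(NA, 2, 4, NA, 6),
  split7     = c(120, NA, NA, 180, NA),
  stringsAsFactors = FALSE
)

# The running count of SP# rows labels every row with its block number.
groups <- cumsum(grepl("SP#", toy$split1, fixed = TRUE))
groups  # 1 1 1 2 2

# lapply always returns a list, so unequal block lengths are handled uniformly.
blocks <- lapply(seq_len(max(groups)), function(x) {
  group_indices <- which(groups == x)
  first_index <- group_indices[1]            # the SP# header row of the block
  c(paste(toy$sample[first_index], toy$repetition[first_index]),
    toy$split7[first_index] / 60,
    c(t(toy[group_indices[-1], c("split2", "split3")])))
})

# Assumption: pad shorter blocks with NA so each block fits one matrix column.
longest <- max(lengths(blocks))
padded <- sapply(blocks, function(b) c(b, rep(NA, longest - length(b))))
dim(padded)  # 6 2
```

Any rows that come before the first SP# get group number 0 and are skipped, since the loop starts at group 1.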

Transpose row contents into a column, then do the same for the next rows
