Conditionally split rows based on the value in a specific column

Time: 2015-08-04 15:22:41

Tags: r

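I would like to convert the column below into the format shown under "desired output". The samples should be regrouped so that each group starts at a row with Sample Type N and runs until the next N. For example, the first two rows below form one group, and 7397-DNA_A01 through 7399-DNA_A01 form another.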

  Sample     Sample Type    
7393.DNA_A01    N
7394-DNA_A01    T
7395-DNA_A01    N
7396-DNA_A01    T
7397-DNA_A01    N
7398-DNA_A01    T
7399-DNA_A01    LN
7400-DNA_A01    N
7401-DNA_A01    T
7402-DNA_A01    B


  desired output
      N               T           B              LN
 7393.DNA_A01  7394-DNA_A01
 7395-DNA_A01  7396-DNA_A01
 7397-DNA_A01  7398-DNA_A01                   7399-DNA_A01
 7400-DNA_A01  7401-DNA_A01    7402-DNA_A01

I'm really not sure how to split the rows each time an N is encountered, and after that I think I need to transpose somehow. Please help!

1 answer:

Answer 0 (score: 1)

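We need to create a grouping index ('indx') based on the occurrence of 'N'. Here, a logical vector is created (SampleType=='N') and passed to cumsum, so the index increases by one at every 'N'. Because the order of the output columns follows the values in 'SampleType', it may be useful to change the 'SampleType' column to a factor and specify its levels in the order of the column names in the expected result. Then we can use dcast from data.table to reshape to wide format.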
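
To see the grouping step in isolation, here is a minimal base R sketch. The vector is copied from the question's data; the names is_start and indx_demo are only illustrative and are not part of the answer's code.

```r
# Sample types in the order they appear in the question
SampleType <- c("N", "T", "N", "T", "N", "T", "LN", "N", "T", "B")

# TRUE wherever a new group starts (hypothetical helper name)
is_start <- SampleType == "N"

# cumsum treats TRUE as 1 and FALSE as 0, so the running total
# goes up by one at every 'N' and stays flat in between
indx_demo <- cumsum(is_start)
indx_demo
# [1] 1 1 2 2 3 3 3 4 4 4
```

Every row between one 'N' and the next shares the same index, which is what lets dcast place those rows on a single output line.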

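library(data.table) # v1.9.5+
# Convert to data.table by reference, add the grouping index,
# and fix the output column order through the factor levels
setDT(df1)[, indx:=cumsum(SampleType=='N')
    ][, SampleType:= factor(SampleType, levels=c('N', 'T', 'B', 'LN'))]

# Reshape to wide format, then drop the first column ('indx')
dcast(df1, indx~SampleType, value.var='Sample', fill='')[,-1,with=FALSE]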
#          N            T            B           LN
#1: 7393.DNA_A01 7394-DNA_A01                          
#2: 7395-DNA_A01 7396-DNA_A01                          
#3: 7397-DNA_A01 7398-DNA_A01              7399-DNA_A01
#4: 7400-DNA_A01 7401-DNA_A01 7402-DNA_A01             

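If you are using dcast from reshape2, the 'indx' column can be created with a base R option. You can also change the 'SampleType' column to a factor with code similar to the above.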

 df1$indx <- cumsum(df1$SampleType=='N')
 library(reshape2)
 dcast(df1, indx~SampleType, value.var='Sample', fill='')
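
As a sketch of the factor conversion mentioned above, the reshape2 version could look like the following. It assumes reshape2 is installed and rebuilds the question's data in a data frame called df_demo (a hypothetical name, chosen so that df1 is left untouched); dropping the index column with [, -1] is an optional extra step.

```r
library(reshape2)

# Question data, rebuilt here so the block runs on its own
df_demo <- data.frame(
  Sample = c("7393.DNA_A01", "7394-DNA_A01", "7395-DNA_A01", "7396-DNA_A01",
             "7397-DNA_A01", "7398-DNA_A01", "7399-DNA_A01", "7400-DNA_A01",
             "7401-DNA_A01", "7402-DNA_A01"),
  SampleType = c("N", "T", "N", "T", "N", "T", "LN", "N", "T", "B"),
  stringsAsFactors = FALSE
)

# Grouping index: increases by one at every 'N'
df_demo$indx <- cumsum(df_demo$SampleType == "N")

# Factor levels set the order of the reshaped columns
df_demo$SampleType <- factor(df_demo$SampleType,
                             levels = c("N", "T", "B", "LN"))

# Wide format, one row per group; empty cells are filled with ''
res <- dcast(df_demo, indx ~ SampleType, value.var = "Sample", fill = "")

# Optionally drop the 'indx' column
res[, -1]
```

Without the factor step, the reshaped columns would be expected to come out in alphabetical order of the type labels rather than N, T, B, LN.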

Data

df1 <- structure(list(Sample = c("7393.DNA_A01", "7394-DNA_A01",
"7395-DNA_A01", 
"7396-DNA_A01", "7397-DNA_A01", "7398-DNA_A01", "7399-DNA_A01", 
"7400-DNA_A01", "7401-DNA_A01", "7402-DNA_A01"), SampleType = c("N", 
"T", "N", "T", "N", "T", "LN", "N", "T", "B")), .Names = c("Sample", 
"SampleType"), class = "data.frame", row.names = c(NA, -10L))