How to divide the values in a column by a specific row in R?

Time: 2016-04-16 12:27:45

Tags: r

This is a subset of my large dataset:

gene    feature reads
A       anot    2
A       3ss_A   3
A       3ss_B   5
B       5ss_A   1
B       anot    4
C       3ss_A   2
C       3ss_B   8
C       anot    3
C       5ss_A   6

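Within each gene, I want to divide the reads for the 3ss and 5ss features by the reads of that gene's "anot" feature. Each gene has multiple features (not all shown here), but only one "anot" feature.

The expected output is: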

gene    feature reads   ratio
A       anot    2       1
A       3ss_A   3       1.5
A       3ss_B   5       2.5
B       5ss_A   1       0.25
B       anot    4       1
C       3ss_A   2       0.666666667
C       3ss_B   8       2.666666667
C       anot    3       1
C       5ss_A   6       2

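How can I do this in R? Thanks.

3 Answers:

Answer 0: (score: 9)

Here are several alternatives:

1) ave Use ave like this. The function fun is passed a vector of row numbers for one gene and returns the vector of ratios for those rows. No packages are used.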

fun <- function(ix) with(DF[ix, ], reads / reads[feature == "anot"])
transform(DF, ratio = ave(1:nrow(DF), gene, FUN = fun))

giving:

  gene feature reads     ratio
1    A    anot     2 1.0000000
2    A   3ss_A     3 1.5000000
3    A   3ss_B     5 2.5000000
4    B   5ss_A     1 0.2500000
5    B    anot     4 1.0000000
6    C   3ss_A     2 0.6666667
7    C   3ss_B     8 2.6666667
8    C    anot     3 1.0000000
9    C   5ss_A     6 2.0000000
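To see what ave hands to fun in approach 1, it can help to look at the row-number groups directly. The sketch below assumes DF holds the question's data (rebuilt here with data.frame so the snippet runs on its own); idx_by_gene is a name made up for illustration.

```r
# Assumed input: the question's data, rebuilt so the snippet is self-contained.
DF <- data.frame(
  gene    = c("A", "A", "A", "B", "B", "C", "C", "C", "C"),
  feature = c("anot", "3ss_A", "3ss_B", "5ss_A", "anot",
              "3ss_A", "3ss_B", "anot", "5ss_A"),
  reads   = c(2, 3, 5, 1, 4, 2, 8, 3, 6)
)

# ave() splits its first argument by gene and calls FUN once per group,
# so fun() receives these row-number vectors one at a time.
idx_by_gene <- split(seq_len(nrow(DF)), DF$gene)
idx_by_gene$B                     # the row numbers belonging to gene B

# Same function as in approach 1: ratios for the rows indexed by ix.
fun <- function(ix) with(DF[ix, ], reads / reads[feature == "anot"])
fun(idx_by_gene$B)                # ratios for gene B only

# ave() puts the per-gene results back in the original row order.
ratio <- ave(seq_len(nrow(DF)), DF$gene, FUN = fun)
```

Passing row numbers instead of reads is what lets fun look at both reads and feature for the same rows.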

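1a) ave Here is another way to use ave. It replaces every non-anot reads value with NA and then, within each gene, uses na.omit to pick out the single non-NA value, by which reads is divided: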

transform(DF, ratio = 
  reads / ave(ifelse(feature == "anot", reads, NA), gene, FUN = na.omit))

giving:

  gene feature reads     ratio
1    A    anot     2 1.0000000
2    A   3ss_A     3 1.5000000
3    A   3ss_B     5 2.5000000
4    B   5ss_A     1 0.2500000
5    B    anot     4 1.0000000
6    C   3ss_A     2 0.6666667
7    C   3ss_B     8 2.6666667
8    C    anot     3 1.0000000
9    C   5ss_A     6 2.0000000
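Both ave variants so far rely on every gene having exactly one anot row, as the question states. A quick sanity check before dividing, plus a look at the intermediate vectors of 1a, might look like the sketch below. DF is again rebuilt from the question's data; n_anot, anot_or_na and denom are illustrative names.

```r
# Assumed input: the question's data, rebuilt so the snippet is self-contained.
DF <- data.frame(
  gene    = c("A", "A", "A", "B", "B", "C", "C", "C", "C"),
  feature = c("anot", "3ss_A", "3ss_B", "5ss_A", "anot",
              "3ss_A", "3ss_B", "anot", "5ss_A"),
  reads   = c(2, 3, 5, 1, 4, 2, 8, 3, 6)
)

# Count the "anot" rows per gene and stop if any gene has zero or several.
n_anot <- tapply(DF$feature == "anot", DF$gene, sum)
stopifnot(all(n_anot == 1))

# Intermediate vector of 1a: reads where feature is "anot", NA elsewhere.
anot_or_na <- ifelse(DF$feature == "anot", DF$reads, NA)

# Within each gene na.omit() keeps the single non-NA value,
# and ave() recycles it across all rows of that gene.
denom <- ave(anot_or_na, DF$gene, FUN = na.omit)
ratio <- DF$reads / denom
```

If a gene had no anot row, na.omit would return an empty vector for it, so the check up front gives a clearer failure.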

1b) ave Here is another ave variation. It is particularly concise but assumes that the anot value of reads is always non-negative (as is the case in the question's example). It creates a vector equal to reads for anot rows and zero otherwise, and then takes the maximum within each gene:

transform(DF, ratio = reads / ave((feature == "anot") * reads, gene, FUN = max))

giving:

  gene feature reads     ratio
1    A    anot     2 1.0000000
2    A   3ss_A     3 1.5000000
3    A   3ss_B     5 2.5000000
4    B   5ss_A     1 0.2500000
5    B    anot     4 1.0000000
6    C   3ss_A     2 0.6666667
7    C   3ss_B     8 2.6666667
8    C    anot     3 1.0000000
9    C   5ss_A     6 2.0000000

2) by Another approach is to use by, also without any packages. Here the function funby takes a subset of the rows of DF and returns that subset with the ratio appended.

funby <- function(x) transform(x, ratio = reads / reads[feature == "anot"])
do.call("rbind", by(DF, DF$gene, funby))

giving:

    gene feature reads     ratio
A.1    A    anot     2 1.0000000
A.2    A   3ss_A     3 1.5000000
A.3    A   3ss_B     5 2.5000000
B.4    B   5ss_A     1 0.2500000
B.5    B    anot     4 1.0000000
C.6    C   3ss_A     2 0.6666667
C.7    C   3ss_B     8 2.6666667
C.8    C    anot     3 1.0000000
C.9    C   5ss_A     6 2.0000000

3) rep/table This also uses no packages. It assumes that DF is sorted by gene (as is the case in the question's example). It repeats each anot reads value once for every row in that gene and then divides reads by the result.

transform(DF, ratio = reads / rep(reads[feature == "anot"], table(gene)))

giving:

  gene feature reads     ratio
1    A    anot     2 1.0000000
2    A   3ss_A     3 1.5000000
3    A   3ss_B     5 2.5000000
4    B   5ss_A     1 0.2500000
5    B    anot     4 1.0000000
6    C   3ss_A     2 0.6666667
7    C   3ss_B     8 2.6666667
8    C    anot     3 1.0000000
9    C   5ss_A     6 2.0000000

4) dplyr Using the dplyr package:

library(dplyr)

DF %>% 
   group_by(gene) %>% 
   mutate(ratio = reads / reads[feature == "anot"]) %>% 
   ungroup()

giving:

Source: local data frame [9 x 4]

    gene feature reads     ratio
  (fctr)  (fctr) (int)     (dbl)
1      A    anot     2 1.0000000
2      A   3ss_A     3 1.5000000
3      A   3ss_B     5 2.5000000
4      B   5ss_A     1 0.2500000
5      B    anot     4 1.0000000
6      C   3ss_A     2 0.6666667
7      C   3ss_B     8 2.6666667
8      C    anot     3 1.0000000
9      C   5ss_A     6 2.0000000

5) data.table Using the data.table package:

library(data.table)

DT <- as.data.table(DF)
DT[, ratio := reads / reads[feature == "anot"], by = "gene"]

giving:

> DT
   gene feature reads     ratio
1:    A    anot     2 1.0000000
2:    A   3ss_A     3 1.5000000
3:    A   3ss_B     5 2.5000000
4:    B   5ss_A     1 0.2500000
5:    B    anot     4 1.0000000
6:    C   3ss_A     2 0.6666667
7:    C   3ss_B     8 2.6666667
8:    C    anot     3 1.0000000
9:    C   5ss_A     6 2.0000000

Note: All of the above assume that the input DF is the question's data held in a data frame.
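The answer refers to a data frame DF throughout. One way to build it from the question's table is sketched below, assuming read.table with its text argument; this is a reconstruction from the question's data, not necessarily the answerer's own definition. The sketch then runs variants 1b and 3 side by side to confirm that they agree (r1b and r3 are illustrative names).

```r
# Assumed reconstruction of the input from the table in the question.
Lines <- "gene    feature reads
A       anot    2
A       3ss_A   3
A       3ss_B   5
B       5ss_A   1
B       anot    4
C       3ss_A   2
C       3ss_B   8
C       anot    3
C       5ss_A   6"

DF <- read.table(text = Lines, header = TRUE)

# Variant 1b: zero out non-anot reads, then take the per-gene maximum.
r1b <- transform(DF, ratio = reads / ave((feature == "anot") * reads, gene, FUN = max))

# Variant 3: repeat each anot value once per row of its gene (needs DF sorted by gene).
r3 <- transform(DF, ratio = reads / rep(reads[feature == "anot"], table(gene)))

# Both variants should produce the same ratio column.
all.equal(r1b$ratio, r3$ratio)
```

If DF were not sorted by gene, variant 3 would silently misalign, so sorting first with DF[order(DF$gene), ] is a sensible precaution.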

Answer 1: (score: 0)

You can try something like:

anot_reads        <- yourdata[yourdata$feature == "anot",]$reads
names(anot_reads) <- yourdata[yourdata$feature == "anot",]$gene
yourdata$ratio    <- yourdata$reads / anot_reads[yourdata$gene]
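One caveat with this lookup: when gene is a factor, indexing a named vector by it uses the factor's integer codes rather than its labels, so the result is only right if the anot rows happen to appear in level order. Converting to character makes the lookup go by name. A minimal sketch, with yourdata rebuilt from the question's data, gene deliberately stored as a factor, and the rows reversed to expose the issue:

```r
# Assumed input: the question's data with gene stored as a factor.
yourdata <- data.frame(
  gene    = factor(c("A", "A", "A", "B", "B", "C", "C", "C", "C")),
  feature = c("anot", "3ss_A", "3ss_B", "5ss_A", "anot",
              "3ss_A", "3ss_B", "anot", "5ss_A"),
  reads   = c(2, 3, 5, 1, 4, 2, 8, 3, 6)
)

# Reverse the rows so the "anot" rows no longer appear in factor-level order.
yourdata <- yourdata[nrow(yourdata):1, ]

# Named vector: one "anot" reads value per gene, named by gene label.
is_anot    <- yourdata$feature == "anot"
anot_reads <- setNames(yourdata$reads[is_anot],
                       as.character(yourdata$gene[is_anot]))

# Look up by gene name; as.character() avoids indexing by integer codes.
yourdata$ratio <- yourdata$reads / unname(anot_reads[as.character(yourdata$gene)])
```

With a character gene column the original three lines behave the same way, so the conversion only matters for factors.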

Answer 2: (score: 0)

You can use this in R:

df$ratio <- unlist(sapply(levels(df$gene),
    function(l) with(subset(df, gene==l), reads / reads[feature=="anot"])))

  gene feature reads     ratio
1    A    anot     2 1.0000000
2    A   3ss_A     3 1.5000000
3    A   3ss_B     5 2.5000000
4    B   5ss_A     1 0.2500000
5    B    anot     4 1.0000000
6    C   3ss_A     2 0.6666667
7    C   3ss_B     8 2.6666667
8    C    anot     3 1.0000000
9    C   5ss_A     6 2.0000000

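This translates to: for each level of gene, subset df and divide reads by the value of reads where feature=="anot". Then unlist the result and create a new column in df.

There may be a shorter option, though.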
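Another base-R option, offered as a sketch rather than as the answerer's method, is split together with unsplit. Unlike the sapply version, it puts each result back at its original row even when df is not sorted by gene. df is rebuilt from the question's data, and per_gene is an illustrative name.

```r
# Assumed input: the question's data, rebuilt so the snippet is self-contained.
df <- data.frame(
  gene    = c("A", "A", "A", "B", "B", "C", "C", "C", "C"),
  feature = c("anot", "3ss_A", "3ss_B", "5ss_A", "anot",
              "3ss_A", "3ss_B", "anot", "5ss_A"),
  reads   = c(2, 3, 5, 1, 4, 2, 8, 3, 6)
)

# Compute the ratios separately for each gene's rows.
per_gene <- lapply(split(df, df$gene),
                   function(d) d$reads / d$reads[d$feature == "anot"])

# unsplit() reverses split(), placing each value back at its original row.
df$ratio <- unsplit(per_gene, df$gene)
```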