I have a data frame (data):
sample chrom pos ref alt tri trans decomposed_tri grouped_trans type feature gene
1 1 1 659105 G A CGT G>A ACG C>T somatic intron ds
2 1 1 1227592 A G CAC A>G GTG T>C somatic intron CG42329
3 1 1 1775341 T G CTG T>G CTG T>G somatic intergenic intergenic
4 1 1 1775552 T C GTT T>C GTT T>C somatic intergenic intergenic
5 1 1 1812639 T G GTG T>G GTG T>G somatic intergenic intergenic
6 1 1 1812641 G A GGA G>A TCC C>T somatic intergenic intergenic
and a list of genes and their lengths (gene_lengths):
$`128up`
[1] 1553
$`14-3-3epsilon`
[1] 8019
$`14-3-3zeta`
[1] 10010
$`140up`
[1] 1385
$`18SrRNA-Psi:CR41602`
[1] 1974
$`18SrRNA-Psi:CR45861`
[1] 1933
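I want to:
a) calculate the number of times I would expect to see each gene in this list, based on gene length (gene_lengths) and genome length (137547960)
b) count the number of times each gene is actually seen: hit_genes<-table(data$gene)
c) calculate the observed/expected ratio: fc<-hit_genes[g]/gene_expect
d) return this as a data frame
This is what I am doing: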
snv_count<-nrow(data) # total number of observations
hit_genes<-table(data$gene) # the number of times I find each gene in my data
cat("gene", "observed", "expected", "fc", "\n")
for (g in levels(data$gene)) {
  genefraction<-gene_lengths[[g]]/137547960
  gene_expect<-snv_count*(genefraction)
  fc<-hit_genes[g]/gene_expect # observed/expected
  cat(g, hit_genes[g], gene_expect, fc, "\n")
}
gene observed expected fc
128up 5 1.493344 3.348189
18SrRNA-Psi:CR45861 3 0.5076489 5.909596
C442219 4 0.03778505 105.862
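This works. However, I am running it inside a function and want to return a data frame. How can I build a data frame row by row inside the for loop? I tried initialising an empty data frame before the loop: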
df <- data.frame(gene = character(), observed = numeric(), expected = numeric(), fc = numeric())
and then building it row by row inside the loop:
enriched <- rbind(df, data.frame(gene = g, observed = hit_genes[g], expected = gene_expect, fc = fc))
But I get the following error:
Error in data.frame(gene = g, observed = hit_genes[g], expected = gene_expect, :
arguments imply differing number of rows: 1, 0
Another question: should I be using ddply to achieve this instead of a loop?
Answer 0 (score: 2)
Perhaps ?lapply. (Untested.)
# 'fun' returns a one-row data frame for gene g
fun <- function(g) {
  genefraction <- gene_lengths[[g]]/137547960
  gene_expect <- snv_count*genefraction
  fc <- hit_genes[[g]]/gene_expect
  data.frame(gene = g, observed = hit_genes[[g]], expected = gene_expect, fc = fc)
}
enriched <- lapply(levels(data$gene), fun)
enriched <- do.call(rbind, enriched)
enriched
Note that this assumes the objects referenced inside fun are available, i.e. in the global environment.
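If relying on the global environment is a concern, one option is to pass everything in as arguments and skip the loop altogether, since the calculation vectorises over genes. The following is only a sketch under some assumptions: the function name gene_enrichment and its genome_length argument are made up for illustration, gene_lengths is assumed to be a named list as shown in the question, and genes without an entry in gene_lengths (for example "intergenic") are simply dropped.

```r
# Hypothetical helper (name and arguments are illustrative, not from the question).
# Returns one row per gene that appears in both `data$gene` and `gene_lengths`.
gene_enrichment <- function(data, gene_lengths, genome_length = 137547960) {
  snv_count <- nrow(data)                        # total number of observations
  hit_genes <- table(as.character(data$gene))    # observed hits per gene

  # Assumption: genes with no known length are skipped rather than raising an error
  genes <- intersect(names(hit_genes), names(gene_lengths))

  observed <- as.vector(hit_genes[genes])        # plain integer vector
  gene_len <- as.numeric(unlist(gene_lengths[genes]))
  expected <- snv_count * gene_len / genome_length

  data.frame(gene = genes,
             observed = observed,
             expected = expected,
             fc = observed / expected,           # observed/expected ratio
             row.names = NULL,
             stringsAsFactors = FALSE)
}

# Toy input, invented purely to show the call
toy_data <- data.frame(gene = c("128up", "128up", "140up", "intergenic"))
toy_lengths <- list(`128up` = 1553, `140up` = 1385)
enriched <- gene_enrichment(toy_data, toy_lengths)
print(enriched)
```

Because each column is computed as a whole vector, no row-by-row rbind is needed, and the same function can be called from inside another function without depending on snv_count or hit_genes existing globally.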