I have a dataset that is a large character vector (1,024,459 elements) made up of gene IDs. It looks like this:
> length(allres)
[1] 1024459
> allres[1:10]
[1] "1" "1" "1" "1" "1" "1" "1" "10" "10" "100"
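where each gene ID is repeated once for every time it was seen in the RNA-seq run (so here there are 7 reads for gene "1" and 2 reads for gene "10"). I would like to plot the number of genes detected against the number of reads, at 10,000-read intervals, so that I can see how many genes are identified if I randomly sample 10,000 reads, 20,000, 30,000, etc. I made a vector of intervals with the seq() function as follows: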
> gaps <- seq(10000, length(allres), by=10000)
But I'm not sure how to apply this to my allres vector and plot it. Any help is much appreciated.
Answer 0 (score: 1)
So, what you probably want is something like this:
gaps <- seq(10000, length(allres), by = 10000)
lapply(gaps, function(x) {
  # Count how often each gene ID appears in a random sample of
  # x reads drawn (without replacement) from allres
  aggregated_sample <- table(sample(allres, size = x))
  # Plotting code for the sample goes here. "x" is the number of reads,
  # so you can even use it in the title!
  # Just remember to include code to save the plot to disk, if you want to keep it.
  return(TRUE)
})
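If you're using ggplot2 for plotting, of course, you can even save the plot as an object and then return(plot) instead of return(TRUE), and do further tweaking/investigation afterwards.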
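If what you are after is a single curve of genes detected versus reads sampled, rather than one plot per sample size, a minimal sketch in base R might look like the following. Note that the allres built here is made-up stand-in data (an assumption, so the example runs on its own), and genes_detected is just an illustrative name; swap in your real vector.

```r
# Hypothetical stand-in for the real allres vector (made-up data, for illustration only)
set.seed(42)
allres <- sample(as.character(1:5000), size = 200000, replace = TRUE)

gaps <- seq(10000, length(allres), by = 10000)

# For each sample size n, draw n reads without replacement and
# count how many distinct gene IDs show up in that sample
genes_detected <- vapply(gaps, function(n) {
  length(unique(sample(allres, size = n)))
}, integer(1))

# One point per sample size: reads sampled vs. distinct genes detected
plot(gaps, genes_detected, type = "b",
     xlab = "Number of reads sampled",
     ylab = "Number of distinct genes detected")
```

Each point comes from a single random draw, so the curve will vary a little between runs; averaging over several draws per sample size would smooth it out. Also note that seq() stops at the last multiple of 10,000 that fits, so with the real data the final point covers 1,020,000 reads rather than the full vector.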