Question

I have a dataset that is a large character vector (1,024,459 elements) made up of gene IDs. It looks like this:

> length(allres)
[1] 1024459
> allres[1:10]
[1] "1"   "1"   "1"   "1"   "1"   "1"   "1"   "10"  "10"  "100"

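Each gene ID is repeated as many times as it was seen in the RNA-seq run, so here there are 7 reads for gene "1" and 2 for gene "10". I want to plot the number of genes detected against the number of reads, at 10,000-read intervals. That way I can see how many genes are identified if I randomly sample 10,000 reads, 20,000, 30,000, and so on. I made a vector of gaps with the seq() function as follows: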

> gaps <- seq(10000, length(allres), by=10000)

However, I'm not sure how to apply it to my allres vector and plot the result. Any help is much appreciated.
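The data layout can be illustrated with only the ten IDs printed above. In this small sketch, `allres_head` is a hypothetical stand-in for the full vector and `n_reads` is taken from the length shown in the question. It shows three things. `table()` recovers the read count per gene. The number of distinct IDs is the number of genes detected. `seq()` with `by = 10000` stops at the last full multiple, so the final partial interval is left out unless you append the total length yourself.

```r
# The first ten elements shown above (hypothetical stand-in for allres)
allres_head <- c("1", "1", "1", "1", "1", "1", "1", "10", "10", "100")

# Reads per gene: gene "1" has 7 reads, "10" has 2, "100" has 1
reads_per_gene <- table(allres_head)
print(reads_per_gene)

# Number of distinct genes detected in these reads
n_genes <- length(unique(allres_head))
print(n_genes)

# Sample sizes for a vector of length 1,024,459 (length from the question)
n_reads <- 1024459
gaps <- seq(10000, n_reads, by = 10000)
print(length(gaps))   # 102 values: 10000, 20000, ..., 1020000

# Optionally add the full dataset as the last sample size
gaps_full <- unique(c(gaps, n_reads))
```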

Answer 1

So, what you probably want is something like this:

gaps <- seq(10000, length(allres), by = 10000)

lapply(gaps, function(x){

    # This gives you the number of appearances of each value within
    # an x-sized random sample of allres
    aggregated_sample <- table(sample(allres, size = x))

    # Plotting code for the sample goes here. "x" is the number of reads, so
    # you can even use it in the title!
    # Just remember to include code to save the plot to disk if you want to keep it.
    return(TRUE)

})
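If the goal is the curve described in the question (distinct genes detected versus reads sampled), each iteration only needs the count of unique IDs in the sample. The per-depth result can therefore be a single number, and the plot can be drawn once at the end. The following is a minimal sketch, assuming `allres` is the character vector from the question. The simulated `allres`, the seed, and the helper name `genes_detected` are hypothetical stand-ins for illustration. `sample()` draws without replacement by default, which matches subsampling reads.

```r
set.seed(42)  # make the random subsampling reproducible

# Hypothetical stand-in for allres: 200,000 simulated reads over 5,000 gene IDs
# with uneven expression levels. Replace this with the real vector.
allres <- as.character(sample(1:5000, size = 200000, replace = TRUE,
                              prob = rexp(5000)))

# Hypothetical helper: number of distinct genes seen in a random
# subsample of `size` reads (sampling without replacement)
genes_detected <- function(reads, size) {
  length(unique(sample(reads, size = size)))
}

gaps <- seq(10000, length(allres), by = 10000)

# One integer per sample size
n_genes <- vapply(gaps, function(x) genes_detected(allres, x), integer(1))

# Plot genes detected against sequencing depth
plot(gaps, n_genes, type = "b",
     xlab = "Reads sampled", ylab = "Distinct genes detected",
     main = "Genes detected vs. number of reads")
```

Each depth here is a single independent draw. Averaging several replicate subsamples per depth would give a smoother curve, at the cost of extra run time on a vector of a million reads.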

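If you are plotting with ggplot2, you can also store the plot as an object and use return(plot) instead of return(TRUE). You then get a list of plots back for further tweaking or inspection.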
