How to create a GRanges object for a large dataset

Time: 2015-05-18 12:55:36

Tags: r bioconductor

I have a large dataset with the following column names:

    "chromosome"      "start"         "end"         "h.gene"        "CPCN_LUNG"     "NCIH524_LUNG"  "SBC5_LUNG"     "NCIH446_LUNG"  "NCIH196_LUNG" 
  "NCIH209_LUNG"  "NCIH1963_LUNG" "NCIH211_LUNG"  "NCIH2196_LUNG" "NCIH526_LUNG"  "NCIH82_LUNG"   "SW1271_LUNG"   "DMS114_LUNG"   "NCIH2029_LUNG" "NCIH2066_LUNG" "NCIH1341_LUNG"
  "NCIH2227_LUNG" "NCIH69_LUNG"   "NCIH1048_LUNG" "DMS53_LUNG"    "SHP77_LUNG"    "NCIH1836_LUNG" "NCIH2141_LUNG" "COLO668_LUNG"  "NCIH1105_LUNG" "NCIH1876_LUNG" "NCIH841_LUNG" 
 "DMS273_LUNG"   "CORL279_LUNG"  "NCIH1092_LUNG" "CORL95_LUNG"   "CORL88_LUNG"   "NCIH1694_LUNG" "NCIH1436_LUNG"

I want to create a GRanges object from this dataset.

    reference_GRange <- GRanges(seqnames = reference$chromosome, IRanges(start = reference$start, end = reference$end), h.gene = reference$h.gene)

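This creates a GRanges object with only a single metadata column (h.gene). Is there a way to create a GRanges object that carries all the information in the reference table, for example with metadata columns for h.gene, CPCN_LUNG, NCIH524_LUNG, ......... through NCIH1436_LUNG?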

2 answers:

Answer 0: (score: 5)

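Use makeGRangesFromDataFrame() with keep.extra.columns=TRUE. Alternatively, create the GRanges as above, then assign the metadata with mcols() after dropping the columns that are not of interest (here the first three, which hold the coordinates).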

    mcols(gr) = reference[,-(1:3)]
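As a minimal sketch of the first suggestion, the snippet below builds a small made-up stand-in for the `reference` table (the values and the choice of only two cell-line columns are assumptions for illustration, not data from the question) and converts it with makeGRangesFromDataFrame(). The `seqnames.field`, `start.field` and `end.field` arguments are spelled out for clarity; the function should also recognise a column named "chromosome" on its own.

```r
library(GenomicRanges)

# Toy stand-in for the `reference` table from the question (values are made up).
reference <- data.frame(
  chromosome   = c("chr1", "chr1", "chr2"),
  start        = c(100L, 500L, 200L),
  end          = c(200L, 650L, 300L),
  h.gene       = c("GENE_A", "GENE_B", "GENE_C"),
  CPCN_LUNG    = c(0.1, -0.3, 0.7),
  NCIH524_LUNG = c(0.2, 0.0, -0.5),
  stringsAsFactors = FALSE
)

# Coordinate columns define the ranges; every remaining column is kept
# as a metadata column because keep.extra.columns = TRUE.
reference_GRange <- makeGRangesFromDataFrame(
  reference,
  keep.extra.columns = TRUE,
  seqnames.field = "chromosome",
  start.field = "start",
  end.field = "end"
)

# Inspect which columns ended up as metadata.
colnames(mcols(reference_GRange))
```

With the real table, the same call should carry h.gene and all of the *_LUNG columns over without naming them individually.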

Feel free to ask questions about Bioconductor packages on the Bioconductor support forum.

Answer 1: (score: 0)

    reference_GRange <- GRanges(seqnames = reference$chromosome, IRanges(start = reference$start, end = reference$end), h.gene = reference$h.gene, CPCN_LUNG = reference$CPCN_LUNG, NCIH524_LUNG = reference$NCIH524_LUNG, ....., NCIH1436_LUNG = reference$NCIH1436_LUNG)

But manually adding every extra column to the GRanges object this way can be quite a hassle!