Reading a strand (+, -) column with fread from the data.table package

Posted: 2013-03-13 14:54:33

Tags: r data.table

I am trying to use fread to read a genome alignment into a data.table in R. Here is a snapshot of the alignment file:

USI-EAS28:1:100:1786:674#0/1    +   1_maternal  68326824    CTCAATTATACTGAAAGAAACACAATATATCATA  IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII  0
USI-EAS28:1:100:1786:940#0/1    +   16_maternal 11407541    CTATTAGTGACCTGCTGTGGGACCTTGGGATGGT  IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII  0
USI-EAS28:1:100:1786:705#0/1    +   1_maternal  63849584    CTGAGGGTTTGTGTCAGGAAGGGGTGTGGAATTG  IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII  0   0:T>C
USI-EAS28:1:100:1786:1168#0/1   -   5_maternal  31381649    GCATCATTCATGAAACAATTTTCAAGAGAGGAAA  IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII  0
USI-EAS28:1:100:1787:582#0/1    +   10_maternal 54587781    CTACAATAATAATAGGGGACTAAAACACCCCACT  IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII  0
USI-EAS28:1:100:1787:62#0/1     +   10_maternal 70390747    CTATTTGCTACTGAATTGTTAATTTTAAAACAGT  IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII  0
USI-EAS28:1:100:1788:573#0/1    -   7_maternal  92583837    CACTGTCAACATTAGACAGACCAATGAGACAAAG  IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII  0
USI-EAS28:1:100:1788:854#0/1    +   7_maternal  129611206   GTTTGTTTTTTTTTTTGAGATGGAGTCTCATTTT  IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII  0   32:C>T
USI-EAS28:1:100:1788:185#0/1    -   13_maternal 23694307    CAAACAAACTCAAAATGGACTATCGACTGAAAAA  IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII  0
USI-EAS28:1:100:1788:1339#0/1   -   13_maternal 33699510    TTAACTCTAGTTTTTAGGGATTGCAAATTAGACG  IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII  0   0:A>G

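The second column reports the strand the read aligned to (+ for forward, - for reverse). Unfortunately, fread tries to read this column as an integer, and every value comes out as 0. This column should be read as character, or even as logical. Trying the arguments sep and sep2 did not help either.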
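To illustrate the "or even logical" option: once the strand column arrives as character, a logical column can be derived in a single step. This is a minimal sketch on a small hand-built data.table rather than on fread output; the column names `read`, `strand` and `forward` are hypothetical, chosen only for readability.

```r
library(data.table)

# Hypothetical strand values, as they look once the column is read as character
dt <- data.table(
  read   = c("USI-EAS28:1:100:1786:674#0/1", "USI-EAS28:1:100:1786:1168#0/1"),
  strand = c("+", "-")
)

# Add a logical column by reference: TRUE for the forward strand, FALSE for reverse
dt[, forward := strand == "+"]
print(dt)
```

Keeping the original character column alongside the logical one avoids losing information if a file ever contains a strand value other than + or -.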

1 answer:

Answer (score: 3)

Thanks for reporting. Now fixed in v1.8.9, commit 849. + and - are now read as character; a test has been added.

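By the way, we also intend to add colClasses, so that you can override the column types that fread detects. The outstanding to-do list for fread is at the top of the source file: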
https://r-forge.r-project.org/scm/viewvc.php/pkg/src/fread.c?view=markup&root=datatable
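For readers on a later data.table release, where colClasses is available, a minimal sketch of forcing the strand column to character might look like the following. It assumes a data.table version whose fread has the `text=` argument, and assumes the alignment file is tab-separated (the snapshot above only shows whitespace). The variable names `aln` and `dt` are hypothetical; V1, V2, ... are fread's default names when `header = FALSE`.

```r
library(data.table)

# Two records from the snapshot, assumed tab-separated; the 8th field
# (mismatch descriptors) is empty on the first record.
aln <- paste(
  "USI-EAS28:1:100:1786:1168#0/1\t-\t5_maternal\t31381649\tGCATCATTCATGAAACAATTTTCAAGAGAGGAAA\tIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII\t0\t",
  "USI-EAS28:1:100:1786:705#0/1\t+\t1_maternal\t63849584\tCTGAGGGTTTGTGTCAGGAAGGGGTGTGGAATTG\tIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII\t0\t0:T>C",
  sep = "\n"
)

# colClasses as a named list: column number 2 (the strand) is read as character,
# overriding whatever type fread would otherwise detect for it
dt <- fread(text = aln, sep = "\t", header = FALSE,
            colClasses = list(character = 2))
str(dt)
```

With the fix described above, + and - are already detected as character, so colClasses here acts as an explicit guarantee rather than a necessity.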