Converting a dataset of strings into a matrix

Time: 2016-11-18 13:13:05

Tags: r string split bioinformatics

I have a tab-delimited dataset, and I would like to convert the following data into a matrix:

CATGGGGAAAACTGA
CCTCTCGATCACCGA
CCTATAGATCACCGA
CCGATTGATCACCGA
CCTTGTGCAGACCGA

So far I have been using:

rbind(strsplit("CATGGGGAAAACTGA","")[[1]],
        strsplit("CCTCTCGATCACCGA","")[[1]],
        strsplit("CCTCTCGATCACCGA","")[[1]],
        strsplit("CCTATAGATCACCGA","")[[1]],
        strsplit("CCGATTGATCACCGA","")[[1]],
        strsplit("CCTTGTGCAGACCGA","")[[1]])

This produces:

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15]
[1,] "C"  "A"  "T"  "G"  "G"  "G"  "G"  "A"  "A"  "A"   "A"   "C"   "T"   "G"   "A"  
[2,] "C"  "C"  "T"  "C"  "T"  "C"  "G"  "A"  "T"  "C"   "A"   "C"   "C"   "G"   "A"  
[3,] "C"  "C"  "T"  "C"  "T"  "C"  "G"  "A"  "T"  "C"   "A"   "C"   "C"   "G"   "A"  
[4,] "C"  "C"  "T"  "A"  "T"  "A"  "G"  "A"  "T"  "C"   "A"   "C"   "C"   "G"   "A"  
[5,] "C"  "C"  "G"  "A"  "T"  "T"  "G"  "A"  "T"  "C"   "A"   "C"   "C"   "G"   "A"  
[6,] "C"  "C"  "T"  "T"  "G"  "T"  "G"  "C"  "A"  "G"   "A"   "C"   "C"   "G"   "A"

But when the dataset is very large, writing this out by hand becomes exhausting. How can I do it automatically?

1 answer:

Answer 0 (score: 5)

You can use read.fwf to split each line into single characters:

read.fwf(textConnection("CATGGGGAAAACTGA
CCTCTCGATCACCGA
CCTATAGATCACCGA
CCGATTGATCACCGA
CCTTGTGCAGACCGA"), rep(1, nchar("CATGGGGAAAACTGA")))
#  V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15
#1  C  A  T  G  G  G  G  A  A   A   A   C   T   G   A
#2  C  C  T  C  T  C  G  A  T   C   A   C   C   G   A
#3  C  C  T  A  T  A  G  A  T   C   A   C   C   G   A
#4  C  C  G  A  T  T  G  A  T   C   A   C   C   G   A
#5  C  C  T  T  G  T  G  C  A   G   A   C   C   G   A

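In practice you will probably want to pass a file name instead of a text connection. Note that read.fwf returns a data frame, so wrap the result in as.matrix() if you need an actual matrix.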
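
As a rough sketch of the file-based variant, the code below reads the sequences from a file and builds a character matrix in two ways. The file here is a temporary one created only so the example runs as-is, and the variable names (path, m_fwf, m_split) are made up for illustration. Passing colClasses = "character" is an added precaution, not part of the answer above: without it, a column consisting only of "T" may be converted to logical TRUE. The second option (strsplit plus rbind) is an alternative technique, and it assumes all sequences have the same length.

```r
# Hypothetical input file with one sequence per line
# (written to a temp file here so the sketch is self-contained)
path <- tempfile(fileext = ".txt")
writeLines(c("CATGGGGAAAACTGA",
             "CCTCTCGATCACCGA",
             "CCTATAGATCACCGA",
             "CCGATTGATCACCGA",
             "CCTTGTGCAGACCGA"), path)

# Option 1: read.fwf with a file name, one column per character.
# colClasses keeps every column as character (assumption: added as a safeguard).
width <- nchar(readLines(path, n = 1))
df <- read.fwf(path, widths = rep(1, width), colClasses = "character")
m_fwf <- as.matrix(df)   # character matrix with column names V1, V2, ...

# Option 2: split each string into characters and stack the pieces row-wise.
# Assumes equal-length sequences; otherwise rbind recycles with a warning.
seqs <- readLines(path)
m_split <- do.call(rbind, strsplit(seqs, ""))

dim(m_split)
```

Both matrices hold the same characters. Option 2 avoids the intermediate data frame, which may matter for very large inputs, while read.fwf stays closer to the answer above.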