Question

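I am trying to use the code below to run an in silico digestion for a RADseq run with the SimRAD package. I think the problem lies in loading the data: I keep getting the following error:

Error in strsplit(DNAseq, split = recognition_code, fixed = FALSE, perl = FALSE) : non-character argument

The header in my fasta file is: gi 32456060 emb BX526834 Cryptosporidium parvum chromosome 6. I previously removed all `.`, `,` and `|` characters from the name, because I read that `|` in particular can cause errors.

Unfortunately, none of the SimRAD documentation I have found addresses this problem, and all the forum threads on strsplit errors that I found deal with quite different kinds of data, so I cannot work out what to change or modify to prevent the error.

Here is the code, including the required packages: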

    ###-----Start Code
    source("http://bioconductor.org/biocLite.R")
    biocLite("Biostrings")
    biocLite("ShortRead")

    install.packages("~/Downloads/SimRAD_0.95.tgz", repos = NULL)

    library(Biostrings)
    library(ShortRead)
    library(seqinr)
    library(SimRAD)

    #Restriction Enzyme 1
    #MseI #
    MseIcs_5p <- "T"
    MseIcs_3p <- "TAA"
    #Restriction Enzyme 2
    #EcoRI#
    EcoRIcs_5p <- "G"
    EcoRIcs_3p <- "AATTC"

    ##these are two alternative ways I've tried to read in a fasta file
    ##I believe this is the problem line - either a different line for
    ##importing or some subsequent line is needed to prevent the error that
    ##follows
    CryptoParChr6 <- read.fasta(file = "filepath.fasta")
    CryptoParChr6 <- readDNAStringSet("filepath.fasta", "fasta")

    ##the error comes in at this line
    CryptoParChr6.dig <- insilico.digest(CryptoParChr6, MseIcs_5p,  
            MseIcs_3p, EcoRIcs_5p, EcoRIcs_3p, verbose = TRUE)

    ###-----End Code

If anyone is familiar with SimRAD and has any suggestions for importing a fasta file correctly, I would be very grateful.

Answer 1

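Your problem has little to do with the SimRAD package itself. As ?insilico.digest states explicitly, the first argument must be a character string (or a vector of them). read.fasta, for example, returns a list of SeqFastadna objects, so you have to extract the sequences yourself and pass those to insilico.digest: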

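    # Read each record as a single string rather than a vector of single characters
    myFasta     <- read.fasta(file = "filepath.fasta", as.string = TRUE)
    # Flatten the list of SeqFastadna objects into a character vector
    mySequences <- unlist(myFasta)
    # Pass the character vector, not the list, to insilico.digest
    myDigest    <- insilico.digest(mySequences, MseIcs_5p, MseIcs_3p, EcoRIcs_5p, EcoRIcs_3p, verbose = TRUE)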
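The cause of the error can be reproduced without SimRAD or a fasta file. The following is only a minimal base-R sketch: `fake_fasta` is a hypothetical stand-in for what read.fasta returns (a named list, here holding a plain string instead of a real SeqFastadna object), and `"ttaa"` is just an illustrative split pattern, not the pattern SimRAD builds internally.

```r
# Hypothetical stand-in for the list returned by read.fasta(..., as.string = TRUE).
# A real SeqFastadna object carries extra attributes; a plain string is used here
# only to keep the sketch self-contained.
fake_fasta <- list(chr6 = "gaattcttaa")

# A list is not a character vector, which is exactly what strsplit() complains about.
print(is.character(fake_fasta))

# Calling strsplit() on the list raises the "non-character argument" error;
# tryCatch() captures it so the script keeps running.
result <- tryCatch(
  strsplit(fake_fasta, "ttaa"),
  error = function(e) e
)
print(inherits(result, "error"))

# unlist() flattens the list into a named character vector,
# which is the kind of input strsplit() accepts.
sequences <- unlist(fake_fasta)
print(is.character(sequences))
print(names(sequences))
```

If the file was read with readDNAStringSet instead, the analogous conversion would presumably be `as.character()` on the resulting DNAStringSet, which also yields a named character vector; that variant is not run here because it needs Biostrings installed.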

SimRAD users: importing a fasta file to prevent the insilico.digest strsplit error
