Converting a multi-FASTA parser from Python to C#

Asked: 2014-03-20 14:57:57

Tags: c# python fasta

I am trying to convert a multi-FASTA parser from Python to C#. Given the input

>header1
ACTG
GCTA

>header2
GATTACA

it returns the dictionary {'header2': 'GATTACA', 'header1': 'ACTGGCTA'}
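
To make the expected mapping concrete, here is a minimal Python sketch that parses the example input from a string. The name parse_fasta_text is hypothetical and not part of the original code. It assumes that every record starts with a '>' line and that blank lines can be ignored.

```python
def parse_fasta_text(text):
    """Hypothetical helper: map each FASTA header to its joined sequence."""
    records = {}
    title = None  # header of the record currently being read
    for raw in text.splitlines():
        line = raw.rstrip()
        if line.startswith(">"):
            # A new record starts; everything after '>' is the title
            title = line[1:]
            records[title] = ""
        elif title is not None:
            # Sequence line: drop internal spaces and append to the current record
            records[title] += line.replace(" ", "")
    return records


example = ">header1\nACTG\nGCTA\n\n>header2\nGATTACA\n"
print(parse_fasta_text(example))
```

The key order of the printed dictionary may differ from the one shown above; only the header-to-sequence pairs matter.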

The original Python code is:

def fastaParser(handle):
    """  Adapted from https://github.com/biopython/biopython/blob/master/Bio/SeqIO/FastaIO.py#L39 """
    fastaDict = {}
    #Skip any text before the first record (e.g. blank lines, comments)
    while True:
        line = handle.readline()
        if line == "":
            return fastaDict  # Premature end of file, or just empty?
        if line[0] == ">":
            break

    while True:
        if line[0] != ">":
            raise ValueError("Records in Fasta files should start with '>' character")
        title = line[1:].rstrip()
        lines = []
        line = handle.readline()
        while True:
            if not line:
                break
            if line[0] == ">":
                break
            lines.append(line.rstrip())
            line = handle.readline()

        #Remove trailing whitespace, and any internal spaces
        sequence = "".join(lines).replace(" ", "").replace("\r", "")
        fastaDict[title] = sequence

        if not line:
            return fastaDict

if __name__ == '__main__':
    with open('fasta.txt') as f:
        print(fastaParser(f))
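
One detail of the Python version matters for the port: readline() keeps the line terminator, so a blank line comes back as "\n" and only the end of the file comes back as "". In C#, StringReader.ReadLine() strips the terminator, so a blank line is "" and the end of the input is null. The sketch below shows only the Python side; read_all_lines is a hypothetical helper, and io.StringIO stands in for an open file.

```python
import io


def read_all_lines(handle):
    """Hypothetical helper: collect raw readline() results until end of file."""
    collected = []
    while True:
        line = handle.readline()
        if line == "":  # an empty string is returned only at end of file
            break
        collected.append(line)
    return collected


print(read_all_lines(io.StringIO(">header1\nACTG\n\n")))
# prints ['>header1\n', 'ACTG\n', '\n']
```

Because of this difference, a check such as `line == ""` means "end of file" in the Python code but "blank line" in a direct C# translation.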

This is my C# code so far (it takes a string rather than an open file handle):

    public Dictionary<int, string> parseFasta(string multiFasta)
    {
        Dictionary<int, string> fastaDict = new Dictionary<int, string>();
        using (System.IO.StringReader multiFastaReader = new System.IO.StringReader(multiFasta))
        {
            // Skip any text before the first record (e.g. blank lines, comments)
            while (true)
            {
                string line = multiFastaReader.ReadLine();
                if (line == "")
                {
                    return fastaDict; // Premature end of file, or just empty?
                }
                if (line[0] == '>')
                {
                    break;
                }
            }

            while (true)
            {
                if (line[0] != '>') // <- Here I get the error: "The name 'line' does not exist in the current context"
                {
                    throw new Exception("Records in Fasta files should start with '>' character");
                }

                string title= line[1:].TrimEnd();
                List<string> lines = new List<string>();

                line = multiFastaReader.ReadLine();

                while (true)
                {
                    if (!line)
                    {
                        break;
                    }
                    if (line[0] == '>')
                    {
                        break;
                    }
                    lines.Add(line.TrimEnd());
                    line = multiFastaReader.ReadLine();
                }

                // Remove trailing whitespace, and any internal spaces
                string sequence = String.Join("", lines).Replace(" ", "").Replace("\r", "");
                fastaDict.Add(title, sequence);

                if (!line)
                {
                    return fastaDict;
                }
            }
        }
     }

The error I get from Visual Studio is that the variable line does not exist in the current context once I use it inside the second while (true) loop.

1 answer:

Answer 0 (score: -2)

I ended up using this code, which declares line before the loops so that it is in scope for both of them:

    // Requires: using System; using System.Collections.Generic;
    public Dictionary<string, string> parseFasta(string multiFasta)
    {
        Dictionary<string, string> fastaDict = new Dictionary<string, string>();
        using (System.IO.StringReader multiFastaReader = new System.IO.StringReader(multiFasta))
        {
            // Declared outside the loops so that both of them can see it
            string line;

            // Skip any text before the first record (e.g. blank lines, comments)
            while (true)
            {
                line = multiFastaReader.ReadLine();
                if (line == null)
                {
                    return fastaDict; // End of input before any record was found
                }
                if (line.Length > 0 && line[0] == '>')
                {
                    break;
                }
            }

            while (true)
            {
                // At this point line is always a header line starting with '>'
                string title = line.Substring(1).TrimEnd();
                List<string> lines = new List<string>();

                line = multiFastaReader.ReadLine();

                // Collect sequence lines until the next header or the end of input.
                // ReadLine() returns "" for a blank line and null at the end of input.
                while (line != null && !(line.Length > 0 && line[0] == '>'))
                {
                    lines.Add(line.TrimEnd());
                    line = multiFastaReader.ReadLine();
                }

                // Remove any internal spaces and stray carriage returns
                string sequence = String.Join("", lines).Replace(" ", "").Replace("\r", "");
                // Indexer assignment overwrites a duplicate title, as the Python dict does
                fastaDict[title] = sequence;

                if (line == null)
                {
                    return fastaDict;
                }
            }
        }
    }