Question

I have a text file that stores gene accession names and sequences. When I print its lines with Python, each one ends with a newline character:

['>hg19_knownGene_uc010nxr.1\n','cttgccgtcagccttttctttgacctcttctttctgttcatgtgtatttg\n','ctgtctcttagcccagacttcccgtgtcctttccaccgggcctttgagag\n','gtcacagggtcttgatgctgtggtcttcatctgcaggtgtctgacttcca\n','gcaactgctggcctgtgccagggtgcaagctgagcactggagtggagttt\n','>hg19_knownGene_uc001aai.1\n','aaggagatggtgctcttcttttttctttctgaattgtggccaccttcata\n','ccagtctgtcatggaacacttaagccgcttgagtgcctgctggtactccc\n','agccctgccatgcctgagccccctgcacacaaggagccaggagtaatcag\n','ggcagaccctttagggcacggggacttctggattgtgaaattggctctct\n','gggggccaaggccttctaacgttggtggaagtggctttggcttattgggt\n']
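The trailing \n comes from how Python reads text files: iterating over a file object or calling readlines() keeps each line's terminator. This is a minimal sketch of that effect. io.StringIO stands in for the real file, and its shortened content is made up for illustration. rstrip("\n") is one way to remove the terminator:

```python
import io

# io.StringIO stands in for the real file (e.g. a hypothetical genes.txt);
# readlines() behaves the same way on an opened text file.
fake_file = io.StringIO(">hg19_knownGene_uc010nxr.1\ncttgccgtca\nctgtctctta\n")

lines = fake_file.readlines()   # each element keeps its trailing "\n"
print(lines)

# rstrip("\n") removes only the line terminator, not other characters
stripped = [line.rstrip("\n") for line in lines]
print(stripped)
```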

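I don't know how to join the sequence lines into a single string on one line. I need to keep the gene accession name (the header) and the first 101 nucleotides, and store all of this in a new text file. So the output I want should look like this: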

>hg19_knownGene_uc010nxr.1
cttgccgtcagccttttctttgacctcttctttctgttcatgtgtatttgctgtctcttagcccagacttcccgtgtcctttccaccgggcctttgagagg
>hg19_knownGene_uc001aai.1
aaggagatggtgctcttcttttttctttctgaattgtggccaccttcataccagtctgtcatggaacacttaagccgcttgagtgcctgctggtactccca

I'm new to Python. I hope someone can help me. Thanks a lot!

Answer 1

This is quite simple. You can use the following:

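# lines is the list shown in the question (e.g. from f.readlines());
# this assumes it holds a single header followed by its sequence lines
output = lines[0]
output += "".join(line.strip() for line in lines[1:])
print(output)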
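This snippet treats lines[0] as the only header and joins everything after it. A quick check on a shortened, made-up two-record list shows what happens with multiple records. The second header ends up glued into the first sequence, which is the problem the edit addresses:

```python
# Shortened, made-up two-record input modeled on the question's data
lines = [
    ">rec1\n",
    "acgt\n",
    "ttga\n",
    ">rec2\n",
    "ccat\n",
]

output = lines[0]
output += "".join(line.strip() for line in lines[1:])
print(repr(output))   # the ">rec2" header is merged into the sequence
```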

Edit: I didn't notice that there are multiple entries, each with its own header. For that case, you can use:

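genes = []
for line in lines:
    if line.startswith(">"):
        genes.append(line)           # header keeps its "\n", so the sequence prints on the next line
    else:
        genes[-1] += line.strip()    # append this sequence line to the current record
for gene in genes:
    print(gene)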
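The loop above prints whole records but does not trim them. The following sketch applies the same grouping idea inside a function that also keeps only the first n nucleotides of each sequence (n=101, as in the question). The function name first_n_per_record and the short sample records are made up for illustration, and the sketch assumes the first line is a header:

```python
def first_n_per_record(lines, n=101):
    """Group header/sequence lines into (header, sequence[:n]) pairs.

    Assumes the first non-blank line is a header starting with '>'.
    """
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            records.append([line, ""])    # start a new record
        else:
            records[-1][1] += line        # extend the current sequence
    return [(header, seq[:n]) for header, seq in records]


# Hypothetical sample; n=10 keeps the demo short
lines = [">rec1\n", "acgtacgt\n", "ttgg\n", ">rec2\n", "ccaa\n"]
for header, seq in first_n_per_record(lines, n=10):
    print(header)
    print(seq)
```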

Answer 2

This should do it:

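# list_of_seq is the list shown in the question
sequences = []
for seq in list_of_seq:
    if seq.startswith(">"):
        sequences.append(seq.rstrip("\n") + "\n")   # header line: start a new record
    else:
        sequences[-1] += seq.rstrip("\n")           # append to the current record

# write each record to its own file: 0.txt, 1.txt, ...
for i, record in enumerate(sequences):
    with open(str(i) + ".txt", "w") as wrfile:
        wrfile.write(record)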
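Another way to split the list uses itertools.groupby. This is an alternative technique, not part of the answer. Consecutive lines are grouped by whether they start with '>', so each header run is followed by the run of sequence lines that belong to it. The sketch assumes every header is followed by at least one sequence line, because two headers in a row would fall into the same group. parse_records and the sample data are hypothetical:

```python
from itertools import groupby


def parse_records(lines):
    """Yield (header, sequence) pairs by grouping header vs. sequence lines."""
    header = None
    for is_header, group in groupby(lines, key=lambda s: s.startswith(">")):
        if is_header:
            header = next(group).strip()   # assumes one header line per record
        else:
            # join all sequence lines of this run into one string
            yield header, "".join(s.strip() for s in group)


lines = [">rec1\n", "acgt\n", "ttga\n", ">rec2\n", "ccat\n"]
print(list(parse_records(lines)))
```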

Answer 3

Another approach:

Here, l is your original list.

Store the records in an ordered dictionary:

from collections import OrderedDict
ord_dict = OrderedDict()
for itm in l:
    itm = itm.strip()
    if itm.startswith(">"):
        key = itm                 # a line starting with '>' is a header (key)
    else:
        # otherwise it is a sequence line (value) for the current key
        if key in ord_dict:
            ord_dict[key] += itm  # append to the existing sequence string
        else:
            ord_dict[key] = itm   # first sequence line for this key

To print the first 101 characters:

for key in ord_dict:
    print(key)
    print(ord_dict[key][:101])

This prints:

>hg19_knownGene_uc010nxr.1
cttgccgtcagccttttctttgacctcttctttctgttcatgtgtatttgctgtctcttagcccagacttcccgtgtcctttccaccgggcctttgagagg
>hg19_knownGene_uc001aai.1
aaggagatggtgctcttcttttttctttctgaattgtggccaccttcataccagtctgtcatggaacacttaagccgcttgagtgcctgctggtactccca
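The question also asks to store the result in a new text file. This is a minimal sketch of that step, assuming the records are in a mapping like ord_dict above. A small made-up stand-in is used so the block runs on its own. write_first_n and the file name first101.txt are hypothetical, and a temporary directory keeps the demo from writing into the working directory:

```python
import os
import tempfile
from collections import OrderedDict


def write_first_n(records, path, n=101):
    """Write each header and the first n characters of its sequence to path."""
    with open(path, "w") as out:
        for key, seq in records.items():
            out.write(key + "\n")
            out.write(seq[:n] + "\n")


# Small made-up stand-in for the ord_dict built above
ord_dict = OrderedDict([(">rec1", "acgtacgtacgt"), (">rec2", "ccat")])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "first101.txt")   # hypothetical output name
    write_first_n(ord_dict, path, n=5)
    with open(path) as f:
        content = f.read()
print(content)
```

With the real data, write_first_n(ord_dict, "first101.txt") would write each header followed by its first 101 nucleotides. On Python 3.7 and later, a plain dict also preserves insertion order, so OrderedDict is optional there.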

How to selectively join lines ending with a newline in a list using Python?
