I'm very new to Python and I need your help.
I have a file like this:
>chr14_Gap_2
ACCGCGATGAAAGAGTCGGTGGTGGGCTCGTTCCGACGCGCATCCCCTGGAAGTCCTGCTCAATCAGGTGCCGGATGAAGGTGGT
GCTCCTCCAGGGGGCAGCAGCTTCTGCGCGTACAGCTGCCACAGCCCCTAGGACACCGTCTGGAAGAGCTCCGGCTCCTTCTTG
acacccaggactgatctcctttaggatggactggctggatcttcttgcagtccaaggggctctcaagagt
………..
>chr14_Gap_3
ACCGCGATGAAAGAGTCGGTGGTGGGCTCGTTCCGACGCGCATCCCCTGGAAGTCCTGCTCAATCAGGTGCCGGATGAAGGTGGT
GCTCCTCCAGGGGGCAGCAGCTTCTGCGCGTACAGCTGCCACAGCCCCTAGGACACCGTCTGGAAGAGCTCCGGCTCCTTCTTG
acacccaggactgatctcctttaggatggactggctggatcttcttgcagtccaaggggctctcaagagt
………..
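One line is the label and the next line is the DNA sequence. I want to count the number of N letters and the number of lowercase letters and express them as a percentage. I wrote the following script, but I'm having trouble with the printing.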
#!/usr/bin/python
import sys
if len (sys.argv) != 2 :
    print "Usage: If you want to run this python script you have to put the fasta file that includes the desert area's sequences as arument"
    sys.exit (1)
fasta_file = sys.argv[1]
#This script reads the sequences of the desert areas (fasta files) and calculates the persentage of the Ns and the repeats.
fasta_file = sys.argv[1]
f = open(fasta_file, 'r')
content = f.readlines()
x = len(content)
#print x
for i in range(0,len(content)):
    if (i%2 == 0):
        content[i].strip()
        name = content[i].split(">")[1]
        print name, #the "," makes the print command to avoid to print a new line
    else:
        content[i].strip()
        numberOfN = content[i].count('N')
        #print numberOfN
        allChar = len(content[i])
        lowerChars = sum(1 for c in content[i] if c.islower())
        Ns_persentage = 100 * (numberOfN/float(allChar))
        lower_persentage = 100 * (lowerChars/float(allChar))
        waste = Ns_persentage + lower_persentage
        print ("The waste persentage is: %s" % (round(waste)))
        #print ("The persentage of Ns is: %s and the persentage of repeats is: %s" % (Ns_persentage,lower_persentage))
        #print (name + waste)
The problem is that it prints the label on the first line and the waste variable on the second line, like this:
chr10_Gap_18759
The waste persentage is: 52.0
How can I print them on the same line, separated by a tab?
For example:
chr10_Gap_18759 52.0
chr10_Gap_19000 78.0
…….
Thank you very much.
Answer 0 (score: 1)
You can print it like this:
print name, "\t", round(waste)
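That works if you are using Python 2.x. I would also make some changes to your code. Python's argparse module handles command-line arguments. I would do something like this: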
#!/usr/bin/python
import argparse
# Read the FASTA file name from the command line
parser = argparse.ArgumentParser()
parser.add_argument("fasta_file", help="The fasta file to be processed", type=str)
args = parser.parse_args()
f = open(args.fasta_file, "r")
content = f.readlines()
f.close()
x = len(content)
for i in range(x):
    line = content[i].strip()
    if (i%2 == 0):
        # On the first header there is no previous record yet, so the lookup fails;
        # on every later header the previous record is printed as you wish
        try:
            print name, "\t", round(waste)
        except NameError:
            pass
        name = line.split(">")[1]
    else:
        numberOfN = line.count('N')
        allChar = len(line)
        lowerChars = sum(1 for c in line if c.islower())
        Ns_persentage = 100 * (numberOfN/float(allChar))
        lower_persentage = 100 * (lowerChars/float(allChar))
        waste = Ns_persentage + lower_persentage
# To print the last case you need to do it outside the loop
print name, "\t", round(waste)
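You can also use something like print("{}\t{}".format(name, round(waste))).
I'm not sure about using i%2. Note that if a sequence spans an odd number of lines, the header and sequence lines fall out of step, and you won't get the next sequence's name correctly until the same thing happens again. I would check whether the line starts with ">", store the name in that case, and otherwise sum the characters of the sequence lines that follow.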
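The ">" check suggested above could be sketched roughly as follows. This is a minimal sketch in Python 3 (an assumption; the code in this thread targets Python 2), and the names `waste_percentages` and `percent` are hypothetical helpers, not part of any library. It accumulates counts over every sequence line until the next header, so records that span several lines are handled.

```python
def percent(part, whole):
    """Return part/whole as a percentage, or 0.0 for an empty sequence."""
    return 100.0 * part / whole if whole else 0.0


def waste_percentages(lines):
    """Yield (name, waste_percent) for each FASTA record in an iterable of lines."""
    name = None
    total = 0    # all sequence characters seen for the current record
    wasted = 0   # uppercase N plus lowercase characters
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one
            if name is not None:
                yield name, percent(wasted, total)
            name = line[1:]
            total = wasted = 0
        elif name is not None:
            total += len(line)
            wasted += line.count("N") + sum(1 for c in line if c.islower())
    # The last record has no following header, so emit it after the loop
    if name is not None:
        yield name, percent(wasted, total)
```

With a file object it could be used as `for name, pct in waste_percentages(open(path)): print(name, round(pct), sep="\t")`, where `path` stands for your FASTA file.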
Answer 1 (score: 0)
Don't print name when (i%2 == 0); just save it in a variable and print it together with the percentage on the next iteration:
print("{0}\t{1}".format(name, round(waste)))
This string formatting method (new in version 2.6) is the new standard in Python 3 and should be preferred in new code over the % formatting described in String Formatting Operations.
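For comparison, here is a small sketch of the same row built three ways. It assumes Python 3 (the f-string line needs 3.6 or later), and the `waste` value is made up for illustration. Note that in Python 3 `round()` with a single argument returns an int, so the row shows 52 rather than the 52.0 that Python 2 prints.

```python
name = "chr10_Gap_18759"   # label taken from the question's sample output
waste = 52.3               # hypothetical value, for illustration only

# Older %-style formatting
old_style = "%s\t%s" % (name, round(waste))

# str.format, as recommended in this answer
new_style = "{0}\t{1}".format(name, round(waste))

# f-string, available from Python 3.6 onward
f_string = f"{name}\t{round(waste)}"

# All three build the same tab-separated row
print(new_style)
```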
Answer 2 (score: 0)
I fixed the indentation and the redundancy:
#!/usr/bin/python
"""
This script reads the sequences of the desert areas (fasta files) and calculates the percentage of the Ns and the repeats.
2014-10-05 v1.0 by Vasilis
2014-10-05 v1.1 by Llopis
2015-02-27 v1.2 by Cees Timmerman
"""
import argparse
parser = argparse.ArgumentParser()
parser.add_argument("fasta_file", help="The fasta file to be processed.", type=str)
args = parser.parse_args()
with open(args.fasta_file, "r") as f:
    for line in f:
        line = line.strip()
        if not line:
            continue  # skip blank lines, which would otherwise divide by zero
        if line.startswith('>'):
            name = line.split(">")[1]
            print name,  # trailing comma: stay on the same output line
        else:
            numberOfN = line.count('N')
            allChar = len(line)
            lowerChars = sum(1 for c in line if c.islower())
            Ns_percentage = 100 * (numberOfN/float(allChar))
            lower_percentage = 100 * (lowerChars/float(allChar))
            waste = Ns_percentage + lower_percentage
            print "\t", round(waste) # Note: https://docs.python.org/2/library/functions.html#round
Fed with:
>chr14_Gap_2
ACCGCGATGAAAGAGTCGGTGGTGGGCTCGTTCCGACGCGCATCCCCTGGAAGTCCTGCTCAATCAGGTGCCGGATGAAGGTGGTGCTCCTCCAGGGGGCAGCAGCTTCTGCGCGTACAGCTGCCACAGCCCCTAGGACACCGTCTGGAAGAGCTCCGGCTCCTTCTTGacacccaggactgatctcctttaggatggactggctggatcttcttgcagtccaaggggctctcaagagt
>chr14_Gap_3
ACCGCGATGAAAGAGTCGGTGGTGGGCTCGTTCCGACGCGCATCCCCTGGAAGTCCTGCTCAATCAGGTGCCGGATGAAGGTGGTGCTCCTCCAGGGGGCAGCAGCTTCTGCGCGTACAGCTGCCACAGCCCCTAGGACACCGTCTGGAAGAGCTCCGGCTCCTTCTTGacacccaggactgatctcctttaggatggactggctggatcttcttgcagtccaaggggctctcaagagt
Gives:
C:\Python27\python.exe -u "dna.py" fasta.txt
Process started >>>
chr14_Gap_2 29.0
chr14_Gap_3 29.0
<<< Process finished. (Exit code 0)
Using my favorite Python IDE: Notepad++ with the NppExec plugin.