Here is my program. It displays a value only when I give the complete name: if I enter eng, it shows the value for eng only.
import re
sent = "eng"
#sent=raw_input("Enter word")
#regex = re.compile('(^|\W)sent(?=(\W|$))')
for line in open("sir_try.txt").readlines():
    if sent == line.split()[0].strip():
        k = line.rsplit(',',1)[0].strip()
        print k
gene name utr length
ensbta 24
ensg1 12
ensg24 30
ensg37 65
enscat 22
ensm 30
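What I actually want to do is search by a word for the highest value in the text file, and delete from the text file every value for that same word that is lower than the highest. For the text above, it should delete 12 and 30 for ensg. Then it should find the minimum value among the utr values and display it with its name.
What you answered is something I have already done; I mentioned that before showing my program.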
Answer 0 (score: 0)
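Instead of if sent ==, try
if sent in (line.split()[0].strip()):
This checks whether the value of sent (eng) occurs anywhere in the argument (line.split()[0].strip()).
If you are still only trying to get the highest value, I would create a variable value, set it to 0 before the loop, and then use
if int(line.split()[1]) > value:
    value = int(line.split()[1])
The int() calls make the comparison numeric; comparing the raw strings would rank "9" above "65".
Test it out and let us know how it works.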
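The two suggestions above can be combined into one loop. The following is only a minimal sketch in Python 3 (the rest of this thread uses Python 2); the names highest_for and SAMPLE are made up for illustration, and it assumes every data row has exactly two whitespace-separated fields with an integer in the second one.

```python
# Python 3 sketch. highest_for and SAMPLE are hypothetical names, not part of the question.
SAMPLE = """gene name utr length
ensbta 24
ensg1 12
ensg24 30
ensg37 65
enscat 22
ensm 30
"""


def highest_for(lines, sent):
    """Return (name, value) for the largest value whose name contains sent, or None."""
    best = None
    for line in lines:
        fields = line.split()
        # Assumption: data rows look like "name number"; the header and blank lines are skipped.
        if len(fields) != 2 or not fields[1].isdigit():
            continue
        name, value = fields[0], int(fields[1])
        # Substring match on the name, numeric comparison on the value.
        if sent in name and (best is None or value > best[1]):
            best = (name, value)
    return best


print(highest_for(SAMPLE.splitlines(), "ensg"))  # ('ensg37', 65)
```

Returning None when nothing matches avoids having to pick a starting value such as 0, which would be wrong if the file could contain negative numbers.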
Answer 1 (score: 0)
Please try this:
file=open("sir_try.txt","r")
list_line=file.readlines()
file.close()
all_text=""
dic={}
sent="ensg"
temp_list=[]
for line in list_line:
    all_text=all_text+line
    fields=line.split()
    # skip the header and blank lines; keep only "name number" rows
    if len(fields)!=2 or not fields[1].isdigit():
        continue
    dic[fields[0]]=fields[1]
# collect the scores of every name that contains sent
for i in dic.keys():
    if sent in i:
        temp_list.append(dic[i])
# compare as numbers; plain string comparison would rank "9" above "65"
hiegh_score=max(temp_list,key=int)
def check(index):
    # returns False once the match at index sits on a line that starts with sent
    reverse_text=all_text[index+1::-1]
    index2=reverse_text.find("\n")
    if sent==reverse_text[:index2+1][::-1][1:len(sent)+1]:
        return False
    else:
        return True
list_to_min=dic.values()
for i in temp_list:
    if i!=hiegh_score:
        index=all_text.find(str(i))
        # skip occurrences of the same number that belong to other names
        while check(index):
            index=all_text.find(str(i),index+len(str(i)))
        all_text=all_text[0:index]+all_text[index+len(str(i)):]
        list_to_min.remove(str(i))
#write all text to "sir_try.txt"
file2=open("sir_try.txt","w")
file2.write(all_text)
file2.close()
min_score= min(list_to_min,key=int)
for j in dic.keys():
    if min_score==dic[j]:
        print "min score is :"+str(min_score)+" for person "+j
The check function guards against an error that shows up when the file looks like this:
gene name utr length
ali 12
ali87 30
ensbta 24
ensg1 12
ensg24 30
ensg37 65
enscat 22
ensm 30
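Without it, the program deletes the ali score, which we do not want. Adding the check function solves that. This version is the final answer.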
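The ali problem comes from searching the whole file text for a number such as 12, which can hit a row belonging to another name. A line-by-line approach sidesteps it. The sketch below is Python 3 and is not the code from this answer: parse, keep_only_highest, lowest and DATA are hypothetical names, it matches names with startswith (an assumption), and it drops the whole row rather than only the number.

```python
# Python 3 sketch. All names here are hypothetical helpers for illustration.
DATA = """gene name utr length
ali 12
ali87 30
ensbta 24
ensg1 12
ensg24 30
ensg37 65
enscat 22
ensm 30
"""


def parse(line):
    """Return (name, value) for a "name number" row, or None for anything else."""
    fields = line.split()
    if len(fields) == 2 and fields[1].isdigit():
        return fields[0], int(fields[1])
    return None


def keep_only_highest(lines, sent):
    """Drop every row whose name starts with sent, except the row(s) with the top value."""
    rows = [parse(line) for line in lines]
    matching = [row[1] for row in rows if row and row[0].startswith(sent)]
    if not matching:
        return list(lines)
    top = max(matching)
    kept = []
    for line, row in zip(lines, rows):
        if row and row[0].startswith(sent) and row[1] != top:
            continue  # a lower value for the searched word: remove the row
        kept.append(line)
    return kept


def lowest(lines):
    """Return the (name, value) row with the smallest value."""
    rows = [row for row in map(parse, lines) if row]
    return min(rows, key=lambda row: row[1])


kept = keep_only_highest(DATA.splitlines(), "ensg")
print("\n".join(kept))
print(lowest(kept))  # ('ali', 12)
```

Because each row is decided by its own name, the ali 12 row is never touched even though ensg1 has the same value.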
Answer 2 (score: 0)
f = open('./sir_try.txt', 'r')
lines = f.readlines()
f.close()
del lines[0]  # drop the header line
gene = {}
matched_gene = {}
for line in lines:
    words = line.split()  # split on any whitespace, dropping empty strings
    if len(words) < 2:
        continue  # skip blank lines
    gene[words[0]] = words[1]
# getting user input
user_input = raw_input('Enter gene name: ')
for gene_name, utr_length in gene.iteritems():
    if user_input in gene_name:
        matched_gene[gene_name] = utr_length
# compare the lengths as numbers, not as strings
m = max(matched_gene.iteritems(), key=lambda item: int(item[1]))[0]
print m, matched_gene[m]  # expected answer
# code to remove redundant gene names as per requirement
for key in matched_gene.keys():
    if not key == m:
        matched_gene.pop(key)
for key in gene.keys():
    if user_input in key:
        gene.pop(key)
final_gene = dict(gene.items() + matched_gene.items())
out = open('./output.txt', 'w')
out.write('gene name' + '\t\t' + 'utr length' + '\n\n')
for key, value in final_gene.iteritems():
    out.write(key + '\t\t\t\t' + value + '\n')
out.close()
Output:
Enter gene name: ensg
ensg37 65
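Answer 3 (score: 0)
To find the name (first column) with the associated maximum value (second column), you first need to split each line at the whitespace between name and value. Then you can use the built-in max() function to find the maximum. Have it use the value column as the sort criterion. After that, you can easily look up the corresponding name.
Example: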
file_content = """
gene name utr length
ensbta 24
ensg1 12
ensg24 30
ensg37 65
enscat 22
ensm 30
"""
# split lines at whitespace
l = [line.split() for line in file_content.splitlines()]
# skip headline and empty lines
l = [line for line in l if len(line) == 2]
print l
# find the maximum of second column (compared as numbers, not as strings)
max_utr_length_tuple = max(l, key=lambda x: int(x[1]))
print max_utr_length_tuple
print max_utr_length_tuple[0]
The output is:
$ python test.py
[['ensbta', '24'], ['ensg1', '12'], ['ensg24', '30'], ['ensg37', '65'], ['enscat', '22'], ['ensm', '30']]
['ensg37', '65']
ensg37
Answer 4 (score: 0)
Short and sweet:
In [01]: t=file_content.split()[4:]
In [02]: b=((zip(t[0::2], t[1::2])))
In [03]: max(b, key=lambda x: int(x[1]))
Out[03]: ('ensg37', '65')
Answer 5 (score: 0)
Since you tagged the question regex, here is something you might want to see. It is the only answer (so far) that uses a regular expression!
import re
sent = 'ensg'  # your sequence
# regex that will "filter" the lines containing value of sent
my_re = re.compile(r'(.*?%s.*?)\s+?(\d+)' % sent)
with open('stack.txt') as f:
    lines = f.read()  # get data from file
filtered = my_re.findall(lines)  # "filter" your data
print filtered
# get the desired (tuple with maximum "utr length"), comparing as numbers
max_tuple = max(filtered, key=lambda x: int(x[1]))
print max_tuple
Output:
[('ensg1', '12'), ('ensg24', '30'), ('ensg37', '65')]
('ensg37', '65')
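A slightly stricter variant of the same regex idea, as a Python 3 sketch: it escapes the search word with re.escape so that characters such as . or + in it are taken literally, and it anchors each match to a single line with re.MULTILINE. The function name max_by_regex and the TEXT sample are made up for illustration; the pattern assumes rows of the form "name number".

```python
import re

# Python 3 sketch. max_by_regex and TEXT are hypothetical names for illustration.
TEXT = """gene name utr length
ensbta 24
ensg1 12
ensg24 30
ensg37 65
enscat 22
ensm 30
"""


def max_by_regex(text, sent):
    """Return (all matching rows, row with the largest number) for names containing sent."""
    # One whole line per match: a name without spaces that contains sent, then a number.
    pattern = re.compile(
        r'^(\S*' + re.escape(sent) + r'\S*)[ \t]+(\d+)[ \t]*$',
        re.MULTILINE,
    )
    matches = pattern.findall(text)
    if not matches:
        return [], None
    # The captured numbers are strings, so convert before comparing.
    return matches, max(matches, key=lambda m: int(m[1]))


filtered, max_tuple = max_by_regex(TEXT, 'ensg')
print(filtered)   # [('ensg1', '12'), ('ensg24', '30'), ('ensg37', '65')]
print(max_tuple)  # ('ensg37', '65')
```

Using [ \t] instead of \s between the columns keeps a match from spilling over a line break.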