Finding the rating of a word using Python

Date: 2013-03-19 19:46:29

Tags: python regex python-2.7

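Here is my program. It displays the value if I give the full name. But if I enter eng, I want it to show the values for the names containing eng, instead of only matching a name that is exactly eng.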

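import re
sent = "eng"
#sent=raw_input("Enter word")
#regex = re.compile('(^|\W)sent(?=(\W|$))')
for line in open("sir_try.txt").readlines():
    # exact comparison: only a first column equal to sent matches
    if sent == line.split()[0].strip():
        # the file has no commas, so this keeps the whole line
        k = line.rsplit(',',1)[0].strip()
        print k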
Contents of sir_try.txt:

gene name        utr length
ensbta                  24
ensg1                   12
ensg24                  30
ensg37                  65
enscat                  22
ensm                    30

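What I actually want is to search the text file by a word and find the highest value for it. Then I want to delete from the text file all other values for the same word that are lower. For the text above, it should delete 12 and 30 for ensg. After that, it should find the minimum value among the utr values and display it with its name. What you answered, I have already done; I mentioned it before showing my program.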
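The requested workflow can be sketched as a function over the file's lines. It keeps the highest matching row, drops the other matching rows, and reports the minimum of what is left. This is a minimal sketch, not code from the question or the answers. The function name keep_best_and_report_min is hypothetical. It assumes a one-line header followed by whitespace-separated "name length" rows, and it treats "the same word" as a substring match on the name, as the answers below do. Writing the result back to sir_try.txt is left out so the logic can be tried on a string.

```python
def keep_best_and_report_min(lines, word):
    """Hypothetical helper: keep only the highest-valued row among names
    containing `word`, drop the other matching rows, and return
    (kept_lines, lowest_row) where lowest_row is (name, length) or None."""
    header = lines[0].rstrip("\n")
    rows = []
    for line in lines[1:]:
        parts = line.split()
        if len(parts) == 2:                         # skip blank lines
            rows.append((parts[0], int(parts[1])))  # lengths as integers

    matches = [row for row in rows if word in row[0]]
    best = max(matches, key=lambda row: row[1]) if matches else None
    # every non-matching row survives; of the matches only `best` does
    kept = [row for row in rows if word not in row[0] or row == best]
    lowest = min(kept, key=lambda row: row[1]) if kept else None
    # re-create the file's layout: name padded to 24 columns, then length
    kept_lines = [header] + ["%-24s%d" % row for row in kept]
    return kept_lines, lowest


data = """gene name        utr length
ensbta                  24
ensg1                   12
ensg24                  30
ensg37                  65
enscat                  22
ensm                    30"""

kept, lowest = keep_best_and_report_min(data.splitlines(), "ensg")
print("\n".join(kept))
print(lowest)   # ('enscat', 22)
```

With ensg, the rows ensg1 (12) and ensg24 (30) are dropped and ensg37 (65) stays. The minimum of the remaining rows is then enscat with 22.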

6 answers:

Answer 0 (score: 0)

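Instead of if sent ==, try if sent in (line.split()[0].strip()):

This checks whether the value of sent (eng) appears anywhere in line.split()[0].strip(), rather than requiring the whole first column to be equal to it.

If you are still only trying to get the highest value, I would create a variable value and then do: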

# inside the loop; set value = 0 before the loop starts
length = int(line.split()[1].strip())  # compare numbers, not strings
if length > value:
    value = length

Test it and let us know how it works.
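Put together, the two suggestions above (substring matching plus a running maximum) can be sketched as one loop. The helper name highest_for is hypothetical. The sketch assumes the same "name number" layout as sir_try.txt and skips any row that does not have that shape, such as the header.

```python
def highest_for(lines, sent):
    """Hypothetical helper: return (name, value) for the largest value
    among rows whose first column contains `sent`, or None."""
    best = None
    for line in lines:
        parts = line.split()
        # only "name number" rows whose name contains sent are considered
        if len(parts) != 2 or not parts[1].isdigit() or sent not in parts[0]:
            continue
        value = int(parts[1])             # numeric, not string, comparison
        if best is None or value > best[1]:
            best = (parts[0], value)
    return best


rows = ["gene name        utr length",
        "ensg1                   12",
        "ensg24                  30",
        "ensg37                  65",
        "ensm                    30"]
print(highest_for(rows, "ensg"))   # ('ensg37', 65)
```

Starting from None instead of 0 means the sketch also behaves sensibly when nothing matches.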

Answer 1 (score: 0)

Please try this:

sent = "ensg"

# read the whole file once; all_text is edited in place below
with open("sir_try.txt") as f:
    list_line = f.readlines()
all_text = "".join(list_line)

dic = {}
for line in list_line[1:]:          # skip the "gene name  utr length" header
    parts = line.split()
    if len(parts) != 2:             # skip blank or malformed lines
        continue
    dic[parts[0]] = parts[1]

# scores of the genes whose name contains sent, compared as numbers
temp_list = [score for name, score in dic.items() if sent in name]
hiegh_score = max(temp_list, key=int)

def check(index):
    # return False when the line containing position index starts with sent
    reverse_text = all_text[index+1::-1]
    index2 = reverse_text.find("\n")
    if sent == reverse_text[:index2+1][::-1][1:len(sent)+1]:
        return False
    else:
        return True

remaining = dict(dic)               # genes that keep their score
for name, score in dic.items():
    if sent in name and score != hiegh_score:
        index = all_text.find(score)
        while check(index):         # skip matches on other genes' lines
            index = all_text.find(score, index + len(score))
        all_text = all_text[:index] + all_text[index + len(score):]
        del remaining[name]

#write all text to "sir_try.txt"
with open("sir_try.txt", "w") as file2:
    file2.write(all_text)

min_score = min(remaining.values(), key=int)
for j, score in remaining.items():
    if score == min_score:
        print("min score is :" + min_score + " for person " + j)

The check function fixes a bug that appeared when the file is:

gene name        utr length
ali                     12
ali87                   30
ensbta                  24
ensg1                   12
ensg24                  30
ensg37                  65
enscat                  22
ensm                    30

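Without check, the program deleted the score of ali, which it should not touch. Adding the check function solves that. This version is the final answer.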
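The idea behind check is to ask whether the line containing a given position starts with sent. A minimal sketch of that test uses str.rfind and str.startswith instead of reversing the text. The name line_starts_with is hypothetical. It returns the opposite of check, so while check(index) corresponds to while not line_starts_with(all_text, index, sent).

```python
def line_starts_with(text, index, prefix):
    """Return True if the line of `text` containing position `index`
    begins with `prefix`."""
    # rfind returns -1 when there is no earlier newline, so +1 gives 0
    # for the first line instead of needing a special case
    line_start = text.rfind("\n", 0, index) + 1
    return text.startswith(prefix, line_start)


text = "gene name   utr length\nali   12\nensg1   12\n"
i = text.find("12")                        # first "12": the ali line
print(line_starts_with(text, i, "ensg"))   # False
j = text.find("12", i + 2)                 # next "12": the ensg1 line
print(line_starts_with(text, j, "ensg"))   # True
```

This reproduces the ali case from the file above: the first 12 is skipped because its line starts with ali.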

Answer 2 (score: 0)

with open('./sir_try.txt', 'r') as f:
    lines = f.readlines()
del lines[0]  # drop the "gene name  utr length" header

gene = {}
matched_gene = {}

for line in lines:
    words = line.split()  # split on runs of whitespace
    if len(words) == 2:   # skip blank or malformed lines
        gene[words[0]] = words[1]

# getting user input
user_input = raw_input('Enter gene name: ')
for gene_name, utr_length in gene.iteritems():
    if user_input in gene_name:
        matched_gene[gene_name] = utr_length
# compare utr lengths as integers, not as strings
m = max(matched_gene, key=lambda name: int(matched_gene[name]))
print m, matched_gene[m]  # expected answer

# code to remove redundant gene names as per requirement

for key in matched_gene.keys():
    if not key == m:
        matched_gene.pop(key)
for key in gene.keys():
    if user_input in key:
        gene.pop(key)

final_gene = dict(gene.items() + matched_gene.items())
with open('./output.txt', 'w') as out:
    out.write('gene name' + '\t\t' + 'utr length' + '\n\n')
    for key, value in final_gene.iteritems():
        out.write(key + '\t\t\t\t' + value + '\n')

Output:

Enter gene name: ensg
ensg37 65
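The removal step in the answer (the two pop loops and the dict(gene.items() + matched_gene.items()) merge, which only works in Python 2, where items() returns lists) can be sketched as a single dict comprehension. The function name prune_matches is hypothetical. It assumes gene maps names to utr lengths stored as strings, as in the answer.

```python
def prune_matches(gene, user_input):
    """Hypothetical helper: keep every gene whose name does not contain
    user_input, plus the one matching gene with the largest utr length."""
    matched = [name for name in gene if user_input in name]
    if not matched:
        return dict(gene)
    best = max(matched, key=lambda name: int(gene[name]))  # numeric compare
    return {name: length for name, length in gene.items()
            if user_input not in name or name == best}


gene = {'ensbta': '24', 'ensg1': '12', 'ensg24': '30',
        'ensg37': '65', 'enscat': '22', 'ensm': '30'}
print(sorted(prune_matches(gene, 'ensg')))
```

Building a new dict avoids removing keys from a dict while iterating over it, which raises a RuntimeError in Python 3.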

Answer 3 (score: 0)

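To find the name (first column) with the largest associated value (second column), first split each line at the whitespace between the name and the value. Then you can find the maximum with the built-in max() function, telling it to use the value column, converted to an integer, as the comparison key. The corresponding name is then easy to find.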

Example:

file_content = """
gene name        utr length
ensbta                  24
ensg1                   12
ensg24                  30
ensg37                  65
enscat                  22
ensm                    30
"""

# split lines at whitespace
l = [line.split() for line in file_content.splitlines()]

# skip headline and empty lines
l = [line for line in l if len(line) == 2]

print l

# find the maximum of the second column, compared as integers
max_utr_length_tuple = max(l, key=lambda x: int(x[1]))

print max_utr_length_tuple

print max_utr_length_tuple[0]

The output is:

$ python test.py
[['ensbta', '24'], ['ensg1', '12'], ['ensg24', '30'], ['ensg37', '65'], ['enscat', '22'], ['ensm', '30']]
['ensg37', '65'] 
ensg37
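The conversion to int in the key matters. Without it, max() compares the second column as strings, character by character. The sample data happens to give the right answer either way, so the sketch below adds a hypothetical row with the value 9 (not in the asker's file) to show where string comparison goes wrong.

```python
rows = [['ensbta', '24'], ['ensg1', '9'], ['ensg37', '65']]

# comparing strings: '9' > '65' because '9' > '6' character-wise
by_string = max(rows, key=lambda x: x[1])
# comparing numbers: 65 > 24 > 9
by_number = max(rows, key=lambda x: int(x[1]))

print(by_string)   # ['ensg1', '9']
print(by_number)   # ['ensg37', '65']
```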

Answer 4 (score: 0)

Short and sweet:

In [01]: t=file_content.split()[4:]
In [02]: b = zip(t[0::2], t[1::2])
In [03]: max(b, key=lambda x: int(x[1]))
Out[03]: ('ensg37', '65')
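This one-liner relies on the header being exactly four words (gene name utr length), which [4:] skips. It then pairs alternating tokens by slicing. The sketch below does the same pairing by zipping one iterator with itself; zip takes one item from each argument in turn, so consecutive tokens end up in the same tuple. The helper name pair_tokens and its header_tokens parameter are hypothetical.

```python
def pair_tokens(text, header_tokens=4):
    """Hypothetical helper: split `text` on whitespace, drop the header
    words, and pair the remaining tokens as (name, length) tuples."""
    tokens = text.split()[header_tokens:]
    it = iter(tokens)
    # zip pulls from the same iterator twice per tuple, giving
    # (tokens[0], tokens[1]), (tokens[2], tokens[3]), ...
    return list(zip(it, it))


file_content = """
gene name        utr length
ensg1                   12
ensg37                  65
"""
pairs = pair_tokens(file_content)
print(max(pairs, key=lambda x: int(x[1])))   # ('ensg37', '65')
```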

Answer 5 (score: 0)

Since you tagged the question with regex, this is probably what you want to see: it is the only answer (so far) that uses a regular expression!

import re

sent = 'ensg' # your sequence
# regex that will "filter" the lines containing the value of sent
my_re = re.compile(r'(.*?%s.*?)\s+?(\d+)' % re.escape(sent))

with open('stack.txt') as f:
    lines = f.read() # get data from file

filtered = my_re.findall(lines) # "filter" your data
print filtered

# get the desired tuple (maximum "utr length", compared as an integer)
max_tuple = max(filtered, key=lambda x: int(x[1]))
print max_tuple

Output:

[('ensg1', '12'), ('ensg24', '30'), ('ensg37', '65')]
('ensg37', '65')
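A variant of the regex can be anchored to whole lines. It uses re.MULTILINE so that ^ and $ match at each line, and re.escape so characters such as "." in sent are taken literally. [ \t]+ is used instead of \s+ so a match cannot run from one line into the next. This is a sketch under those assumptions; the helper name matching_rows is hypothetical.

```python
import re


def matching_rows(text, sent):
    """Hypothetical helper: return (name, length) pairs for rows whose
    first column contains `sent`, one row per line."""
    pattern = re.compile(r'^(\S*%s\S*)[ \t]+(\d+)[ \t]*$' % re.escape(sent),
                         re.MULTILINE)
    return pattern.findall(text)


text = ("gene name        utr length\n"
        "ensbta                  24\n"
        "ensg1                   12\n"
        "ensg24                  30\n"
        "ensg37                  65\n"
        "enscat                  22\n"
        "ensm                    30\n")

rows = matching_rows(text, "ensg")
print(rows)
print(max(rows, key=lambda x: int(x[1])))   # ('ensg37', '65')
```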