Question

I have lines of data that I want to parse. The data looks like this:

a score=216 expect=1.05e-06
a score=180 expect=0.0394

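What I want is a subroutine that parses them and returns two values (the score and the expect value) for each line.

However, this function of mine doesn't seem to work: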

def scoreEvalFromMaf(mafLines):
    for word in mafLines[0]:
        if word.startswith("score="):
            theScore = word.split('=')[1]
            theEval  = word.split('=')[2]
            return [theScore, theEval]
    raise Exception("encountered an alignment without a score")

Please advise: what is the right way to do it?

Answer 1

If mafLines is a list of lines and you only want to look at the first one, call .split() on that line to get the words. For example:

def scoreEvalFromMaf(mafLines):
    theScore = None
    theEval = None
    # Split the first line on whitespace to get its words.
    for word in mafLines[0].split():
        if word.startswith('score='):
            # partition returns (head, separator, tail); keep the tail.
            _, _, theScore = word.partition('=')
        elif word.startswith('expect='):
            _, _, theEval = word.partition('=')
    if theScore is None:
        raise Exception("encountered an alignment without a score")
    if theEval is None:
        raise Exception("encountered an alignment without an eval")
    return theScore, theEval

Note that this returns a tuple of two string items; if you want, for example, an int and a float, you need to change the last line to

    return int(theScore), float(theEval)

You then get a ValueError exception if either string is invalid for the type it is supposed to represent, and a tuple of two numbers if both strings are valid.
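
As a rough illustration of that conversion, here is a minimal sketch that parses a single line and converts both fields. The helper name parse_score_expect is hypothetical (it is not from the question), and it assumes every field of interest has the form key=value with no embedded spaces.

```python
def parse_score_expect(line):
    """Return (score, expect) from a line such as 'a score=216 expect=1.05e-06'."""
    fields = {}
    for word in line.split():
        key, sep, value = word.partition('=')
        if sep:  # keep only words of the form key=value
            fields[key] = value
    if 'score' not in fields:
        raise ValueError("line has no score: %r" % line)
    if 'expect' not in fields:
        raise ValueError("line has no expect: %r" % line)
    # int() and float() raise ValueError themselves on malformed text.
    return int(fields['score']), float(fields['expect'])


print(parse_score_expect("a score=216 expect=1.05e-06"))
```

Collecting the fields into a dict first means the order of score= and expect= on the line does not matter.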

Answer 2

It looks like you want to split each line on whitespace and parse each chunk separately. If mafLines is a string (i.e. one line from .readlines()):

def scoreEvalFromMafLine(mafLine):
    theScore, theEval = None, None
    for word in mafLine.split():
        if word.startswith("score="):
            theScore = word.split('=')[1]
        if word.startswith("expect="):
            theEval  = word.split('=')[1]

    if theScore is None or theEval is None:
        raise Exception("Invalid line: '%s'" % mafLine)

    return (theScore, theEval)

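The way you are doing it iterates over each character of the first line (since mafLines is a list of strings) rather than over each whitespace-separated word.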
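
The difference can be checked directly. This short sketch uses one of the sample lines from the question; the variable names are illustrative only.

```python
line = "a score=216 expect=1.05e-06"

# Iterating over a string yields single characters, so no item can
# ever start with "score=".
chars = [c for c in line]
print(chars[:3])   # ['a', ' ', 's']

# Splitting on whitespace yields the words instead.
words = line.split()
print(words)       # ['a', 'score=216', 'expect=1.05e-06']

# Each word holds a single '=', so split('=') gives exactly two parts
# and index 2 (as used in the question) would raise IndexError.
parts = words[1].split('=')
print(parts)       # ['score', '216']
```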

Answer 3

Obligatory and possibly inappropriate regex solution:

import re
def scoreEvalFromMaf(mafLines):
    return [re.search(r'score=(.+) expect=(.+)', line).groups()
            for line in mafLines]
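
Note that re.search returns None for a line that does not match, so the list comprehension above raises AttributeError on such a line. A hedged variant follows, assuming that non-matching lines should simply be skipped (the question does not say what should happen to them). The names SCORE_EXPECT and score_eval_pairs are hypothetical.

```python
import re

# Assumed pattern: both values contain no whitespace and are separated
# by at least one whitespace character.
SCORE_EXPECT = re.compile(r'score=(\S+)\s+expect=(\S+)')


def score_eval_pairs(maf_lines):
    """Return a list of (score, expect) string tuples, one per matching line."""
    pairs = []
    for line in maf_lines:
        match = SCORE_EXPECT.search(line)
        if match is None:
            continue  # assumption: silently skip lines without both fields
        pairs.append(match.groups())
    return pairs


print(score_eval_pairs(["a score=216 expect=1.05e-06", "a score=180 expect=0.0394"]))
```

If a missing score should be treated as an error instead, replace the continue with a raise.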

Parsing lines with a delimiter in Python
