Generator function only yields the first item

Time: 2013-04-09 19:26:12

Tags: python generator

I have data in the following format:

data = """

[Data-0]
Data = BATCH
BatProtocol = DIAG-ST
BatCreate = 20010724

[Data-1]
Data = SAMP
SampNum = 357
SampLane = 1

[Data-2]
Data = SAMP
SampNum = 357
SampLane = 2

[Data-9]
Data = BATCH
BatProtocol = VCA
BatCreate = 20010725

[Data-10]
Data = SAMP
SampNum = 359
SampLane = 1

[Data-11]
Data = SAMP
SampNum = 359
SampLane = 2

"""

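The structure is:

  1. [Data-x], where x is a number
  2. Data = followed by either BATCH or SAMP
  3. More lines

I am trying to write a function that yields a list for each 'batch'. The first item of the list is the text block containing the line Data = BATCH, and the following items are the text blocks containing the line Data = SAMP. I currently have: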

    def get_batches(data):
        textblocks = iter([txt for txt in data.split('\n\n') if txt.strip()])
        batch = []
        sample = next(textblocks)
        while True:
            if 'BATCH' in sample:
                batch.append(sample)
            sample = next(textblocks)
            if 'BATCH' in sample:
                yield batch
                batch = []
            else:
                batch.append(sample)
    

When called like this:

    batches = get_batches(data)
    for batch in batches:
        print batch
        print '_' * 20
    
However, it only returns the first 'batch':

    ['[Data-0]\nData = BATCH\nBatProtocol = DIAG-ST\nBatCreate = 20010724', 
     '[Data-1]\nData = SAMP\nSampNum = 357\nSampLane = 1', 
     '[Data-2]\nData = SAMP\nSampNum = 357\nSampLane = 2']
    ____________________
    

Whereas my expected output is:

    ['[Data-0]\nData = BATCH\nBatProtocol = DIAG-ST\nBatCreate = 20010724', 
     '[Data-1]\nData = SAMP\nSampNum = 357\nSampLane = 1', 
     '[Data-2]\nData = SAMP\nSampNum = 357\nSampLane = 2']
    ____________________
    ['[Data-9]\nData = BATCH\nBatProtocol = VCA\nBatCreate = 20010725', 
     '[Data-10]\nData = SAMP\nSampNum = 359\nSampLane = 1', 
     '[Data-11]\nData = SAMP\nSampNum = 359\nSampLane = 2']
    ____________________
    

What am I missing, or how can I improve my function?

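2 answers:

Answer 0 (score: 6)

You only yield a batch when you find the start of the next batch, so the last batch of data is never included. To fix this, you would want something like the following at the end of your function: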

if batch:
    yield batch

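However, just doing this will not work. Eventually the next(textblocks) call inside the loop raises StopIteration, so any code after the while loop never executes. Here is one way to get this working with only a minor change to your current code (see below for a better version):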

def get_batches(data):
    textblocks = iter([txt for txt in data.split('\n\n') if txt.strip()])
    batch = []
    sample = next(textblocks)
    while True:
        if 'BATCH' in sample:
            batch.append(sample)
        try:
            sample = next(textblocks)
        except StopIteration:
            break
        if 'BATCH' in sample:
            yield batch
            batch = []
        else:
            batch.append(sample)
    if batch:
        yield batch
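
To see why nothing placed after the while True loop runs in the original function, the following minimal sketch may help. It assumes Python 3 (the question itself uses Python 2 print statements), and the names leaky and collect are hypothetical. In Python 2, a StopIteration escaping from an unguarded next() simply ended the generator, which is why the last batch disappeared without any error. From Python 3.7 on (PEP 479), the same situation surfaces as a RuntimeError instead.

```python
def leaky(items):
    """Hypothetical generator mirroring the question's bug: an unguarded next()."""
    it = iter(items)
    while True:
        # When `it` is exhausted, next() raises StopIteration here,
        # so nothing placed after the loop can ever run.
        yield next(it)
    yield "never reached"


def collect(gen):
    """Drain a generator and report how it ended."""
    out = []
    try:
        for value in gen:
            out.append(value)
    except RuntimeError as exc:
        # Python 3.7+ (PEP 479) converts the leaked StopIteration.
        return out, type(exc).__name__
    return out, None


values, error = collect(leaky([1, 2]))
print(values, error)
```

Either way, the trailing yield is never reached, which is why the try/except around next() is needed in the version above.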

I would recommend just looping over textblocks with a for loop:

def get_batches(data):
    textblocks = (txt for txt in data.split('\n\n') if txt.strip())
    batch = []
    for sample in textblocks:
        if 'BATCH' in sample:
            if batch:
                yield batch
            batch = []
        batch.append(sample)
    if batch:
        yield batch
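
As a quick check, the for-loop version can be run against a shortened copy of the question's data. This is a sketch for Python 3 (so print is a function, unlike in the question); sample_data is a trimmed, illustrative subset of the original input, and the function body is copied unchanged from the version above.

```python
sample_data = """
[Data-0]
Data = BATCH
BatProtocol = DIAG-ST

[Data-1]
Data = SAMP
SampNum = 357

[Data-9]
Data = BATCH
BatProtocol = VCA

[Data-10]
Data = SAMP
SampNum = 359
"""


def get_batches(data):
    textblocks = (txt for txt in data.split('\n\n') if txt.strip())
    batch = []
    for sample in textblocks:
        if 'BATCH' in sample:
            # Emit the previous batch (if any) before starting a new one
            if batch:
                yield batch
            batch = []
        batch.append(sample)
    # The last batch has no following BATCH block to trigger a yield
    if batch:
        yield batch


batches = list(get_batches(sample_data))
for batch in batches:
    print([block.strip() for block in batch])
    print('_' * 20)
```

With this input, both batches are produced, each holding its BATCH block followed by its SAMP block.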

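Answer 1 (score: 2)

As @F.J explained, the real problem with your code is that you never yield the last value. However, there are other improvements that can be made, some of which make the last-value problem easier to solve.

The biggest one that stood out when I first looked at your code was the pair of if statements that both check 'BATCH' in sample; they can be combined into one.

Here is a version that does that, while using a for loop over the generator rather than while True: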

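def get_batches(data):
    # Lazily iterate over the non-empty text blocks
    textblocks = (txt for txt in data.split('\n\n') if txt.strip())
    # Seed the first batch with the first block (assumed to be a BATCH block)
    batch = [next(textblocks)]
    for sample in textblocks:
        if 'BATCH' in sample:
            # A new batch starts here, so emit the finished one
            yield batch
            batch = []
        batch.append(sample)
    # Emit the final batch, which no later BATCH block would trigger
    yield batch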

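I yield batch unconditionally at the end, since there is no way to get there with batch being empty (if data were empty, the initialization of batch near the start would raise StopIteration).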
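
One caveat for newer interpreters: since Python 3.7 (PEP 479), a StopIteration raised inside a generator body is converted into a RuntimeError, so with empty data the bare next(textblocks) would surface as an error rather than ending the generator quietly. A possible variant, sketched here under the assumption that empty input should simply yield nothing, passes a default value to next(). The name get_batches_safe is hypothetical and does not come from the answer.

```python
def get_batches_safe(data):
    """Variant of the answer's generator that tolerates empty input."""
    textblocks = (txt for txt in data.split('\n\n') if txt.strip())
    first = next(textblocks, None)  # the default avoids StopIteration on empty data
    if first is None:
        return  # ends the generator without yielding anything
    batch = [first]
    for sample in textblocks:
        if 'BATCH' in sample:
            # A new batch starts, so emit the finished one
            yield batch
            batch = []
        batch.append(sample)
    yield batch


empty_result = list(get_batches_safe(""))
small_result = list(get_batches_safe("Data = BATCH\nx = 1\n\nData = SAMP\ny = 2"))
print(empty_result)
print(small_result)
```

For non-empty input this behaves like the version above; the only difference is how the empty case is handled.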