How to read a file to find matching strings and split the results into multiple files

Time: 2018-04-26 16:39:45

Tags: python readline strip

Here is the scenario: I need to read a pattern file line by line.

The contents of the pattern file look something like this:
chicken 
chicken
chicken
chicken
## comment
## comment
fish
fish
chicken
chicken
chicken

This is the code I have come up with so far.

def readlines_write():
    with open(filename) as rl:
        for line in rl:
            if "chicken" in line:
                with open(new_filename, 'a+') as new_rl:
                    new_rl.write(line)

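With the code above, I can find every "chicken" in the pattern file, and the results are written to new_filename. But that is not the goal, because it lumps all of them into a single file.

I want to separate the chicken lines and write them to multiple files.

E.g., the end result should be: read line by line, and once chicken is found, stop when the next line does not contain chicken. Break there and write that group to a file, e.g. a.out.

The script then continues reading line by line, finds the next match after the "comment" and "fish" lines, and writes that result to b.out.

I have pseudocode in mind, but I'm not sure how to turn it into Python logic.

In summary, I want to split the chicken lines into groups wherever they are separated by comments or other non-chicken words.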

2 answers:

Answer 0 (score: 2)

So, what you're looking for is contiguous groups of chicken lines, and you want to put each group into its own separate file. Fine, batteries are included.

import itertools

def is_chicken(x):
    return 'chicken' in x # Can add more complex logic.

def write_groups(input_sequence):
    count = 1
    grouper = itertools.groupby(input_sequence, is_chicken)
    for found, group in grouper:
        # The value of `found` here is what `is_chicken` returned;
        # we only want groups where it returned true.
        if found:
            with open('file-%d.chicken' % count, 'w') as f:
                f.writelines(group)
            count += 1

Now you can

with open('input_file') as input_file:
    write_groups(input_file)

The same thing can be done in a more functionally decomposed way, though it is a bit harder to understand if you're not used to generators:

def get_groups(input_sequence):
    grouper = itertools.groupby(input_sequence, is_chicken)
    # Return a generator producing only the groups we want.
    return (group for (found, group) in grouper if found)


with open('input_file') as input_file:
    for (count, group) in enumerate(get_groups(input_file), start=1):
        with open('file-%d.chicken' % count, 'w') as f:
            f.writelines(group)
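
To see how `itertools.groupby` splits the sample input from the question, here is a minimal sketch that collects the groups in a list instead of writing files. The helper name `chicken_groups` and the in-memory `sample` list are assumptions made for illustration; they are not part of the code above.

```python
import itertools

def is_chicken(line):
    return 'chicken' in line

def chicken_groups(lines):
    """Return each contiguous run of chicken lines as a list (hypothetical helper)."""
    return [list(group)
            for found, group in itertools.groupby(lines, is_chicken)
            if found]

# In-memory stand-in for the pattern file shown in the question.
sample = [
    'chicken\n', 'chicken\n', 'chicken\n', 'chicken\n',
    '## comment\n', '## comment\n',
    'fish\n', 'fish\n',
    'chicken\n', 'chicken\n', 'chicken\n',
]

groups = chicken_groups(sample)
print([len(g) for g in groups])  # [4, 3]
```

The comment and fish lines all map to the same key (`False`), so they form a single group between the two chicken runs and are skipped together.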

Answer 1 (score: 1)

Just add an else branch, and vary the file name with an integer or a timestamp.

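def readlines_write(filename):
    i = 0
    in_group = False  # True while we are inside a run of chicken lines
    with open(filename) as rl:
        for line in rl:
            if "chicken" in line:
                if not in_group:
                    # First line of a new run: move on to the next file name.
                    i += 1
                    in_group = True
                with open('filename{}.out'.format(i), 'a+') as new_rl:
                    new_rl.write(line)
            else:
                # Any non-chicken line ends the current run.
                in_group = False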
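
The question asked for output files named a.out, b.out, and so on. A minimal sketch of that naming, assuming no more than 26 runs, could look like the following. The function name `split_chicken_runs` and its `word` and `out_dir` parameters are hypothetical, chosen only for this example.

```python
import os
import string

def split_chicken_runs(lines, word='chicken', out_dir='.'):
    """Write each contiguous run of matching lines to a.out, b.out, ...

    Returns the paths written. Names come from the lowercase alphabet,
    so a 27th run raises StopIteration (a limit of this sketch).
    """
    names = iter(string.ascii_lowercase)
    written = []
    out = None  # currently open output file, or None between runs
    try:
        for line in lines:
            if word in line:
                if out is None:
                    # First line of a new run: open the next file.
                    path = os.path.join(out_dir, next(names) + '.out')
                    out = open(path, 'w')
                    written.append(path)
                out.write(line)
            elif out is not None:
                # A non-matching line ends the current run.
                out.close()
                out = None
    finally:
        if out is not None:
            out.close()
    return written
```

It can be called with an open file, for example `with open('input_file') as f: split_chicken_runs(f)`. Each output file is opened once per run in `'w'` mode, so running the script again overwrites earlier results instead of appending to them.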