Question

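I have code that groups identical words in front of () and merges the data inside (), removing duplicates. It should also take the lines from one '()' to the next and group/add them under the same name: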

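For example, given this input text file:

 the cars and computers... 
 Car(ferrari,lamborghini,porsche)
  some manufacturers specialise in "super" cars.
Most people like them.

 Computer(hp,dell,apple,sony,fujitsu)
  These are some laptop manufacturers
car(skoda,audi)
     GOOD cars

I have written code that merges the contents of () and removes redundant data, but it does not group the text lines and add them under the same word before () without deleting the text already added. The expected output, preserving indentation, is: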

the cars and computers...
 Car(ferrari,lamborghini,porsche,skoda,audi)
  some manufacturers specialise in "super" cars.
Most people like them.
    GOOD cars

 Computer(hp,dell,apple,sony,fujitsu)
  These are some laptop manufacturers

Please help me fix my code! Any help would be appreciated!

Answer 1

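I'm not sure what is wrong with your code. It looks like you never actually add the text to the groups...

In any case, you can simplify the data aggregation part to: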

import re
import collections

with open('texta.txt', "r+") as f:
    # a group line looks like "name(id1,id2,...)" and starts at column 0
    p = re.compile(r'^(\S+)\((.*)\)$')
    group_ids = collections.OrderedDict()   # group -> set of ids (?)
    group_words = collections.OrderedDict() # group -> list of words
    group = None                            # last group, or None
    for line in f:
        match = p.match(line)
        if match:
            # group header: remember the group and merge its ids
            group, ids = match.groups()
            group_ids.setdefault(group, set()).update(ids.split(','))
        elif line.strip() and group:
            # non-empty text line: attach it to the most recent group
            group_words.setdefault(group, []).append(line.rstrip())

After this, group_ids and group_words will be:

{'car': set(['ab', 'ef', 'ad', 'cd']), 'bike': set(['ac', 'de'])}
{'car': ['go', 'drive', 'enjoy'], 'bike': ['ride']}

Writing the result to a file in the desired format should then be no big problem, for example:

with open('textb.txt', 'w') as f:
    for group, ids in group_ids.items():
        f.write("%s(%s)\n" % (group, ','.join(ids)))
        # .get avoids a KeyError for a group that has no text lines
        for word in group_words.get(group, []):
            f.write(word + '\n')
        f.write('\n')

This produces the following output (bla is another block that I added after the second car block for testing):

car(ab,ef,ad,cd)
 go
 drive
 enjoy

bike(ac,de)
    ride

bla(xx)
 blub
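
The test data above has unindented group lines and consistently spelled names, whereas the sample in the question indents some group lines and mixes Car and car. Here is a minimal sketch of a variant that might cope with both. It assumes that names should be compared case-insensitively, that the first spelling and indentation seen should be kept, and that ids should stay in first-seen order. The names merge_groups and GROUP_RE are made up for illustration.

```python
import re
from collections import OrderedDict

# Hypothetical pattern: optional indentation, a name, then "(a,b,c)".
GROUP_RE = re.compile(r'^(\s*)([^\s(]+)\((.*)\)\s*$')

def merge_groups(lines):
    """Merge blocks whose names are equal ignoring case."""
    groups = OrderedDict()  # lowercased name -> [header prefix, ids, text lines]
    preamble = []           # text lines seen before the first group header
    current = None          # entry of the most recent group, or None
    for line in lines:
        line = line.rstrip()
        m = GROUP_RE.match(line)
        if m:
            indent, name, ids = m.groups()
            # Assumption: keep the indentation and spelling of the first header.
            current = groups.setdefault(name.lower(), [indent + name, [], []])
            for item in ids.split(','):
                item = item.strip()
                if item and item not in current[1]:
                    current[1].append(item)  # de-duplicate, keep order
        elif not line:
            continue                         # blank lines are regenerated below
        elif current is None:
            preamble.append(line)
        else:
            current[2].append(line)          # indentation of text is preserved
    out = list(preamble)
    for header, ids, text in groups.values():
        out.append('%s(%s)' % (header, ','.join(ids)))
        out.extend(text)
        out.append('')                       # blank line between groups
    return out
```

Called on the lines of the question's sample, this folds car(skoda,audi) into the Car block and moves the GOOD cars line under it, while the leading "the cars and computers..." line stays at the top. A list is used for the ids instead of a set so that the output order is stable.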

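Alternatively, if you would rather write the result back into the same file in r+ mode instead of a separate output file, make sure to call f.seek(0) first and then f.truncate(); otherwise the old data will not be completely removed.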
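
To illustrate the seek/truncate order, here is a small sketch of an in-place rewrite. The helper name rewrite_in_place and its transform parameter are hypothetical; transform stands for any function that turns the list of input lines into the list of output lines.

```python
def rewrite_in_place(path, transform):
    """Read the whole file, then overwrite it with transform(lines)."""
    with open(path, 'r+') as f:
        lines = f.read().splitlines()
        new_lines = transform(lines)
        f.seek(0)      # go back to the start of the file
        f.truncate()   # cut the file at the current position, i.e. empty it
        f.writelines(line + '\n' for line in new_lines)
```

Without the truncate() call, a result shorter than the original file would leave the tail of the old contents behind.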
