How do I group sentences by their label?

Time: 2016-03-08 10:07:07

Tags: python

Suppose I have sentences in a file, for example:

1 let's go shopping
1 what a wonderful day
1 let's party tonight
2 nobody went there
2 it was a deserted place
3 lets go tomorrow
4 what tomorrow
4 ok sure let's see

I want to group these sentences, so that all sentences with label '1' end up in one group, those with label '2' in another group, and so on.

I am loading the file like this:

result=[]
with open("sentences.txt","r") as filer:
    for line in filer:
        result.append(line.strip().split())

So I get something like this:

[['1', "let's", 'go', 'shopping'],
 ['1', 'what', 'a', 'wonderful', 'day'],
 ['1', "let's", 'party', 'tonight'],
 ['2', 'nobody', 'went', 'there']]

Now I want something like this:

for line in result:
    if line[0]== '1':
        process(line)
    elif line[0]=='2':
        process(line)
    elif line[0]=='4':
        process(line)
    elif line[0]=='3':
        process(line)

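But the problem is that this handles only one sentence at a time. I want to collect all the '1' sentences into a group and then run process (a function) on the whole group.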

File 1:

[['1', 'in', 'seattle', 'today', 'the', 'secretary', 'of', 'education', 'richard', 'riley', 'delivered', 'his', 'address', 'on', 'the', 'state', 'of', 'american', 'education'], ['1', 'one', 'of', 'the', 'things', 'he', 'focused', 'on', 'as', 'the', 'president', 'had', 'done', 'in', 'his', 'state', 'of', 'the', 'union', 'was', 'the', 'goal', 'to', 'reduce', 'the', 'size', 'of', 'the', 'average', 'class']]

File 2:

[['1', 'in', 'seattl', 'today', 'the', 'secretari', 'of', 'educ', 'richard', 'riley', 'deliv', 'hi', 'address', 'on', 'the', 'state', 'of', 'american', 'educ'], ['1', 'one', 'of', 'the', 'thing', 'he', 'focus', 'on', 'a', 'the', 'presid', 'had', 'done', 'in', 'hi', 'state', 'of', 'the', 'union', 'wa', 'the', 'goal', 'to', 'reduc', 'the', 'size', 'of', 'the', 'averag', 'class']]

1 Answer:

Answer 0 (score: 5):

from collections import defaultdict

result = defaultdict(list)
with open("sentences.txt","r") as filer:
    for line in filer:
        label, sentence = line.strip().split(' ', 1)
        result[label].append(sentence)
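
One caveat with the loop above: split(' ', 1) raises ValueError on a blank line or on a line that holds only a label. Here is a minimal sketch of a more tolerant variant, assuming such lines should simply be skipped. The helper name group_by_label is hypothetical, and io.StringIO stands in for sentences.txt so the example runs on its own.

```python
from collections import defaultdict
import io


def group_by_label(lines):
    """Group sentences by their leading label; skip lines with no sentence."""
    groups = defaultdict(list)
    for line in lines:
        # Split on the first run of whitespace only, keeping the sentence whole.
        parts = line.strip().split(None, 1)
        if len(parts) < 2:
            continue  # blank line or label without a sentence (assumed skippable)
        label, sentence = parts
        groups[label].append(sentence)
    return groups


# In-memory stand-in for the file; note the blank line in the middle.
sample = io.StringIO(
    "1 let's go shopping\n"
    "1 what a wonderful day\n"
    "\n"
    "2 nobody went there\n"
    "3 lets go tomorrow\n"
)
groups = group_by_label(sample)
```

With a real file, the same helper could be called as group_by_label(filer) inside the with block.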

Then you can process it with:

for label, sentences in result.items():
    # run process() once on all sentences that share this label
    process(sentences)
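
If the sentences are already tokenized into lists, as in the question, the rows can also be grouped directly. The following is only a sketch using itertools.groupby, which assumes rows with the same label are adjacent (true for the sample file; otherwise sort by label first). The process function here is a hypothetical placeholder that counts tokens, not the asker's actual function.

```python
from itertools import groupby
from operator import itemgetter

# Tokenized rows in the shape the question's loading code produces.
rows = [
    ['1', "let's", 'go', 'shopping'],
    ['1', 'what', 'a', 'wonderful', 'day'],
    ['2', 'nobody', 'went', 'there'],
    ['3', 'lets', 'go', 'tomorrow'],
]


def process(sentences):
    # Hypothetical placeholder: total number of tokens in the group.
    return sum(len(tokens) for tokens in sentences)


totals = {}
# groupby only merges consecutive rows, so the input must be ordered by label.
for label, group in groupby(rows, key=itemgetter(0)):
    sentences = [row[1:] for row in group]  # drop the label from each row
    totals[label] = process(sentences)
```

The defaultdict approach does not depend on row order, so it is the safer choice when labels may be interleaved.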