Question

I'm new to Python and hope you can help me ;)

I have some data:

[('Senators', 'NNS'), ('and', 'CC'), ('a', 'DT'), ('good', 'JJ'), ('teacher', 'NN'),
  ('believes', 'VBZ'), ('in', 'IN'), ('the', 'DT'), ('possibilities', 'NNS'), ('of',
   'IN'), 
  ('every', 'DT'), ('boy', 'NN'), ('and', 'CC'), ('girl', 'NN'), ('.', '.'),
  ('The','DT'), 
  ('good', 'JJ'), ('teacher', 'NN'), ('sees', 'VBZ'), ('what', 'WP')......]

I want to turn it into a list of lists, with one inner list per sentence, like this:

[ [('Senators', 'NNS'), ('and', 'CC'), ('a', 'DT'), ('good', 'JJ'), ('teacher', 'NN'),
  ('believes', 'VBZ'), ('in', 'IN'), ('the', 'DT'), ('possibilities', 'NNS'), ('of',
   'IN'), 
  ('every', 'DT'), ('boy', 'NN'), ('and', 'CC'), ('girl', 'NN')], [('The', 'DT'),
  ('good', 'JJ'), ('teacher', 'NN'), ('sees', 'VBZ'), ('what', 'WP')] ...]

But I don't know how to do it :( I tried it with a for loop:

  for el in data:
     if el[0] != ('.' or '?' or '!'):   # finds only points((
     sentences.append(el)

How do I stop the loop when it finds a period? And how do I make it carry on and write into a new list?

Answer 1

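sentences = []
sentence = []

for word, code in data:
    sentence.append((word, code))
    # Compare against whole tokens; `word in '.?!'` is a substring test
    # and would also match '' or '?!'.
    if word in ('.', '?', '!'):
        # End of sentence: store it and start a new one.
        sentences.append(sentence)
        sentence = []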

If you don't want the period included in the sentence:

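sentences = []
sentence = []

for word, code in data:
    # Compare against whole tokens; `word in '.?!'` is a substring test
    # and would also match '' or '?!'.
    if word in ('.', '?', '!'):
        # End of sentence: store it without the terminator, start a new one.
        sentences.append(sentence)
        sentence = []
    else:
        sentence.append((word, code))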
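
As for why the condition in the question only ever caught periods: Python evaluates `('.' or '?' or '!')` first, and `or` returns its first truthy operand, so the whole test collapses to `el[0] != '.'`. A minimal sketch of the difference follows; the name `TERMINATORS` is introduced here purely for illustration.

```python
# TERMINATORS is an illustrative name, not something from the question.
TERMINATORS = ('.', '?', '!')

# `or` returns its first truthy operand, so this is simply '.'.
collapsed = ('.' or '?' or '!')
print(repr(collapsed))

for token in ['.', '?', '!', 'teacher']:
    # First column: the question's comparison; second: a membership test
    # that checks the token against every terminator.
    print(repr(token), token != collapsed, token not in TERMINATORS)
```

With the membership test, `'?'` and `'!'` are recognised as terminators too, whereas the original comparison treats them like ordinary words.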

Answer 2

You can use a generator:

def per_sentence(qualified):
    sentence = []
    for word, class_ in qualified:
        sentence.append((word, class_))
        if class_ == '.':
            yield sentence
            sentence = []
    if sentence:
        # yield tail
        yield sentence

Then build a list from it with list():

sentences = list(per_sentence(data))

Demo:

>>> data = [('Senators', 'NNS'), ('and', 'CC'), ('a', 'DT'), ('good', 'JJ'), ('teacher', 'NN'),
...   ('believes', 'VBZ'), ('in', 'IN'), ('the', 'DT'), ('possibilities', 'NNS'), ('of',
...    'IN'), 
...   ('every', 'DT'), ('boy', 'NN'), ('and', 'CC'), ('girl', 'NN'), ('.', '.'),
...   ('The','DT'), 
...   ('good', 'JJ'), ('teacher', 'NN'), ('sees', 'VBZ'), ('what', 'WP')]
>>> def per_sentence(qualified):
...     sentence = []
...     for word, class_ in qualified:
...         sentence.append((word, class_))
...         if class_ == '.':
...             yield sentence
...             sentence = []
...     if sentence:
...         # yield tail
...         yield sentence
... 
>>> list(per_sentence(data))
[[('Senators', 'NNS'), ('and', 'CC'), ('a', 'DT'), ('good', 'JJ'), ('teacher', 'NN'), ('believes', 'VBZ'), ('in', 'IN'), ('the', 'DT'), ('possibilities', 'NNS'), ('of', 'IN'), ('every', 'DT'), ('boy', 'NN'), ('and', 'CC'), ('girl', 'NN'), ('.', '.')], [('The', 'DT'), ('good', 'JJ'), ('teacher', 'NN'), ('sees', 'VBZ'), ('what', 'WP')]]
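
If the terminator should be matched on the word rather than on the tag, or left out of the result, the same generator idea can be parameterised. The following is only a sketch: `split_sentences`, `terminators` and `keep_terminator` are hypothetical names, and the sample data is made up for the example.

```python
def split_sentences(tagged, terminators=('.', '?', '!'), keep_terminator=True):
    """Yield lists of (word, tag) pairs, one list per sentence."""
    sentence = []
    for word, tag in tagged:
        is_end = word in terminators
        if keep_terminator or not is_end:
            sentence.append((word, tag))
        # Skip empty sentences, e.g. two terminators in a row when the
        # terminators themselves are being dropped.
        if is_end and sentence:
            yield sentence
            sentence = []
    if sentence:
        # Emit trailing tokens that were never closed by a terminator.
        yield sentence


# Made-up sample input for illustration.
sample = [('Who', 'WP'), ('sees', 'VBZ'), ('?', '.'),
          ('The', 'DT'), ('teacher', 'NN')]

print(list(split_sentences(sample)))
print(list(split_sentences(sample, keep_terminator=False)))
```

Because it is a generator, sentences are produced one at a time, so a long token stream can be processed in a for loop without building the full list first.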

Answer 3

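Assuming the second element of each tuple in data is the word's class, and that this class is . for every sentence terminator, try something like: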

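sentences = [[]]
for word in data:
    # Each element is a (word, class) tuple; add it to the current sentence.
    sentences[-1].append(word)
    if word[1] == '.':
        # Terminator class seen: open a new, empty sentence.
        sentences.append([])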

If the last sentence is properly terminated by a .-class element, this will leave sentences with an empty list at the end.
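
One possible way to deal with that trailing empty list is to drop it after the loop. This is a sketch on a small made-up input, not part of the original answer:

```python
# Made-up input that ends with a terminator.
data = [('The', 'DT'), ('teacher', 'NN'), ('sees', 'VBZ'), ('.', '.')]

sentences = [[]]
for pair in data:
    sentences[-1].append(pair)
    if pair[1] == '.':
        sentences.append([])

# If the input ended with a terminator, the last list is empty; drop it.
if not sentences[-1]:
    sentences.pop()

print(sentences)
```

The same check also turns the result for empty input into an empty list rather than a list holding one empty sentence.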

How do I stop a for loop and have it continue further?
