I'm new to Python. I hope you can help me ;)
I have this data:
[('Senators', 'NNS'), ('and', 'CC'), ('a', 'DT'), ('good', 'JJ'), ('teacher', 'NN'),
 ('believes', 'VBZ'), ('in', 'IN'), ('the', 'DT'), ('possibilities', 'NNS'), ('of', 'IN'),
 ('every', 'DT'), ('boy', 'NN'), ('and', 'CC'), ('girl', 'NN'), ('.', '.'),
 ('The', 'DT'), ('good', 'JJ'), ('teacher', 'NN'), ('sees', 'VBZ'), ('what', 'WP')......]
I want to build a list of lists, with one inner list per sentence, like this:
[[('Senators', 'NNS'), ('and', 'CC'), ('a', 'DT'), ('good', 'JJ'), ('teacher', 'NN'),
  ('believes', 'VBZ'), ('in', 'IN'), ('the', 'DT'), ('possibilities', 'NNS'), ('of', 'IN'),
  ('every', 'DT'), ('boy', 'NN'), ('and', 'CC'), ('girl', 'NN')],
 [('The', 'DT'), ('good', 'JJ'), ('teacher', 'NN'), ('sees', 'VBZ'), ('what', 'WP')] ...]
But I don't know how to do it :( I tried it with a for loop:
sentences = []
for el in data:
    if el[0] != ('.' or '?' or '!'):  # only finds periods :(
        sentences.append(el)
How do I make it stop when it reaches a period, and then carry on and start writing to a new list?
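A short illustration of why the condition in the attempt above only catches periods: `or` returns its first truthy operand, and the non-empty string '.' is truthy, so ('.' or '?' or '!') is just '.'. A membership test against a tuple checks all three marks. The helper name is_terminator is hypothetical, used only for this sketch.

```python
# `or` short-circuits and returns the first truthy operand.
# '.' is a non-empty string, so it is truthy and '?' / '!' are never reached.
terminators = ('.' or '?' or '!')
print(terminators)  # .


def is_terminator(word):
    """Hypothetical helper: True if word is exactly '.', '?' or '!'."""
    # Tuple membership compares whole strings, one element at a time.
    return word in ('.', '?', '!')


print(is_terminator('?'))        # True
print(is_terminator('teacher'))  # False
```

In the loop, the test then becomes `if el[0] not in ('.', '?', '!'):`.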
Answer 0 (score: 2)
sentences = []
sentence = []
for word, code in data:
    sentence.append((word, code))
    # Exact match; `word in '.?!'` is a substring test and would also match '' or '.?'
    if word in ('.', '?', '!'):
        sentences.append(sentence)
        sentence = []
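A runnable sketch of the loop above, wrapped in a function and tried on a shortened copy of the question's data. The function name split_sentences and the terminators parameter are hypothetical. Unlike the loop above, it also keeps a final sentence that has no terminator; whether you want that is an assumption.

```python
def split_sentences(data, terminators=('.', '?', '!')):
    """Split (word, tag) pairs into sentences, keeping each terminator."""
    sentences = []
    sentence = []
    for word, code in data:
        sentence.append((word, code))
        if word in terminators:
            # Close the current sentence and start a fresh one.
            sentences.append(sentence)
            sentence = []
    if sentence:
        # Keep a trailing sentence that was never terminated.
        sentences.append(sentence)
    return sentences


data = [('Senators', 'NNS'), ('and', 'CC'), ('girl', 'NN'), ('.', '.'),
        ('The', 'DT'), ('good', 'JJ'), ('teacher', 'NN')]
result = split_sentences(data)
print(len(result))  # 2
```

Returning a new list from a function, rather than mutating globals, makes it easy to call on each document separately.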
If you don't want the terminating punctuation included in the sentence:
sentences = []
sentence = []
for word, code in data:
    # Exact match against the three sentence-ending marks
    if word in ('.', '?', '!'):
        sentences.append(sentence)
        sentence = []
    else:
        sentence.append((word, code))
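To check this version against the output the question asks for, here is the same loop run on a shortened sample (the sample data is an assumption). A trailing sentence with no terminator stays in `sentence` and is never appended by the loop itself, so this sketch adds a final `if` to keep it.

```python
data = [('Senators', 'NNS'), ('and', 'CC'), ('girl', 'NN'), ('.', '.'),
        ('The', 'DT'), ('good', 'JJ'), ('teacher', 'NN'), ('sees', 'VBZ')]

sentences = []
sentence = []
for word, code in data:
    if word in ('.', '?', '!'):
        # Terminator: close the sentence without including the mark itself.
        sentences.append(sentence)
        sentence = []
    else:
        sentence.append((word, code))

if sentence:
    # The last sentence had no terminator; keep it anyway.
    sentences.append(sentence)

print(sentences)
```

Note that two terminators in a row (for example '?' followed by '!') would produce an empty inner list with this loop.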
Answer 1 (score: 0)
You could use a generator:
def per_sentence(qualified):
    sentence = []
    for word, class_ in qualified:
        sentence.append((word, class_))
        if class_ == '.':
            yield sentence
            sentence = []
    if sentence:
        # yield tail
        yield sentence
Then build a list from it with list():
sentences = list(per_sentence(data))
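The question's target output leaves out the ('.', '.') tuples. A minimal variant of the generator that skips the terminator instead of yielding it could look like this; the name per_sentence_no_punct is hypothetical, and like the original it relies on every sentence terminator having the class '.'.

```python
def per_sentence_no_punct(qualified):
    """Yield lists of (word, class_) pairs, dropping the '.'-class terminator."""
    sentence = []
    for word, class_ in qualified:
        if class_ == '.':
            # End of sentence: yield what we have, without the terminator.
            yield sentence
            sentence = []
        else:
            sentence.append((word, class_))
    if sentence:
        # Yield an unterminated tail, as per_sentence does.
        yield sentence


data = [('Senators', 'NNS'), ('and', 'CC'), ('girl', 'NN'), ('.', '.'),
        ('The', 'DT'), ('good', 'JJ'), ('teacher', 'NN'), ('sees', 'VBZ'), ('what', 'WP')]
sentences = list(per_sentence_no_punct(data))
```

Because it is a generator, you can also consume it lazily (for example with next() or a for loop) without building the whole list first.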
Demo:
>>> data = [('Senators', 'NNS'), ('and', 'CC'), ('a', 'DT'), ('good', 'JJ'), ('teacher', 'NN'),
...         ('believes', 'VBZ'), ('in', 'IN'), ('the', 'DT'), ('possibilities', 'NNS'), ('of', 'IN'),
...         ('every', 'DT'), ('boy', 'NN'), ('and', 'CC'), ('girl', 'NN'), ('.', '.'),
...         ('The', 'DT'), ('good', 'JJ'), ('teacher', 'NN'), ('sees', 'VBZ'), ('what', 'WP')]
>>> def per_sentence(qualified):
...     sentence = []
...     for word, class_ in qualified:
...         sentence.append((word, class_))
...         if class_ == '.':
...             yield sentence
...             sentence = []
...     if sentence:
...         # yield tail
...         yield sentence
...
>>> list(per_sentence(data))
[[('Senators', 'NNS'), ('and', 'CC'), ('a', 'DT'), ('good', 'JJ'), ('teacher', 'NN'), ('believes', 'VBZ'), ('in', 'IN'), ('the', 'DT'), ('possibilities', 'NNS'), ('of', 'IN'), ('every', 'DT'), ('boy', 'NN'), ('and', 'CC'), ('girl', 'NN'), ('.', '.')], [('The', 'DT'), ('good', 'JJ'), ('teacher', 'NN'), ('sees', 'VBZ'), ('what', 'WP')]]
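Answer 2 (score: 0)
Assuming the second element of each tuple in data is the word's class, and that this class is '.' for every sentence terminator, try something like: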
sentences = [[]]
for word in data:
    sentences[-1].append(word)
    if word[1] == '.':
        sentences.append([])
If the last sentence is properly terminated by a '.'-class element, this will leave an empty list at the end of sentences.
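One way to deal with that trailing empty list is to drop it after the loop. A minimal sketch on a shortened sample (the sample data is an assumption) that ends on a terminator:

```python
data = [('Senators', 'NNS'), ('and', 'CC'), ('girl', 'NN'), ('.', '.'),
        ('The', 'DT'), ('teacher', 'NN'), ('sees', 'VBZ'), ('.', '.')]

sentences = [[]]
for word in data:
    sentences[-1].append(word)
    if word[1] == '.':
        # Start a new, empty sentence after each terminator.
        sentences.append([])

if not sentences[-1]:
    # The data ended on a terminator, so remove the empty list it left behind.
    sentences.pop()

print(sentences)
```

If data is empty, the pop() also removes the initial [] and sentences ends up as [].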