I am trying to run this code:
import re
from nltk.util import ngrams

with open('Textfile.txt', 'r') as text1:
    raw_text = text1.read().lower()
raw_text = re.sub(r'[^a-zA-Z0-9\s]', ' ', raw_text)
tokens = [token for token in raw_text.split(" ") if token != ""]
# generate ngrams
output = list(ngrams(tokens, 2))
I get the following error:
StopIteration Traceback (most recent call last)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\nltk\util.py in ngrams(sequence, n, pad_left, pad_right, left_pad_symbol, right_pad_symbol)
467 while n > 1:
--> 468 history.append(next(sequence))
469 n -= 1
StopIteration:
The above exception was the direct cause of the following exception:
RuntimeError Traceback (most recent call last)
<ipython-input-11-2ce960ed385c> in <module>()
7 tokens = [token for token in raw_text.split(" ") if token != ""]
8 # generate ngrams
----> 9 output = list(ngrams(tokens, 2))
10 try:
11 yield next(seq)
RuntimeError: generator raised StopIteration
My question is: how can I apply ngrams with n = 2 (bigrams) to any text file without getting this error? It would be great if you could help me solve this :)
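For context, the traceback looks consistent with the PEP 479 change in Python 3.7, where a StopIteration escaping a generator is turned into a RuntimeError. Older NLTK releases appear to have relied on the previous behaviour inside ngrams, so upgrading NLTK may be enough. If upgrading is not an option, a minimal sketch that builds the bigrams without NLTK could look like the following. The helper name bigrams_from_text is hypothetical, and it assumes the same cleaning rule as the code above.

```python
import re


def bigrams_from_text(raw_text):
    """Return a list of adjacent-token pairs (hypothetical helper, not an NLTK API)."""
    # Same cleaning as in the question: lowercase, then replace anything
    # that is not a letter, digit, or whitespace with a space.
    cleaned = re.sub(r'[^a-zA-Z0-9\s]', ' ', raw_text.lower())
    # split() with no argument splits on any whitespace (spaces, tabs,
    # newlines) and never yields empty strings.
    tokens = cleaned.split()
    # Pair each token with its successor; fewer than two tokens gives [].
    return list(zip(tokens, tokens[1:]))


if __name__ == "__main__":
    print(bigrams_from_text("The quick brown fox."))
    # [('the', 'quick'), ('quick', 'brown'), ('brown', 'fox')]
```

To use it on a file, read the contents first, for example with open('Textfile.txt', 'r') as text1 and then bigrams_from_text(text1.read()). An empty file or a file with a single token simply gives an empty list instead of raising.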