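I'm working through Gensim's Corpora and Vector Spaces tutorial, specifically the section Corpus Streaming – One Document at a Time, which I'm trying to understand.
After running these lines from the tutorial in Python 3: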
class MyCorpus(object):
    def __iter__(self):
        for line in open('mycorpus.txt'):
            # assume there's one document per line, tokens separated by whitespace
            yield dictionary.doc2bow(line.lower().split())

corpus_memory_friendly = MyCorpus() # doesn't load the corpus into memory!
print(corpus_memory_friendly)
for vector in corpus_memory_friendly: # load one vector into memory at a time
    print(vector)
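For reference, here is a minimal sketch of the same streaming idea with the file path passed in rather than hard-coded. It assumes only the standard library; the `path` and `to_vector` parameters are hypothetical additions, not part of the tutorial's class. With Gensim you would pass `dictionary.doc2bow` as `to_vector`.

```python
from typing import Any, Callable, Iterator, List


class MyCorpus:
    """Stream one document (one line) at a time from a text file.

    `path` and `to_vector` are illustrative parameters, not part of the
    tutorial's original class.
    """

    def __init__(self, path: str,
                 to_vector: Callable[[List[str]], Any] = lambda tokens: tokens):
        self.path = path
        self.to_vector = to_vector

    def __iter__(self) -> Iterator[Any]:
        # Reopen the file on each iteration so the corpus can be iterated
        # more than once; only one line is held in memory at a time.
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                # One document per line, tokens separated by whitespace.
                yield self.to_vector(line.lower().split())
```

Usage with Gensim would look like `MyCorpus('/full/path/to/mycorpus.txt', dictionary.doc2bow)`, which also makes it obvious exactly which file is being opened.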
I get this output followed by an error:
<__main__.MyCorpus object at 0x7f2e37e17d68>
Traceback (most recent call last):
  File "<pyshell#46>", line 1, in <module>
    for vector in corpus_memory_friendly: # load one vector into memory at a time
  File "<pyshell#41>", line 3, in __iter__
    for line in open('mycorpus.txt'):
FileNotFoundError: [Errno 2] No such file or directory: 'mycorpus.txt'
I have already downloaded mycorpus.txt, but I still get this error. Where should I store the mycorpus.txt file?
Thanks for your help.
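A relative name such as 'mycorpus.txt' is looked up in the Python process's current working directory, not in the folder where the tutorial or your script lives. In the IDLE shell (the `<pyshell#...>` frames above) that directory depends on how IDLE was launched. The sketch below is one way to check where Python is looking; the helper names `resolve_corpus_path` and `check_corpus_file` are hypothetical, and only `os.getcwd`, `os.path.abspath` and `os.path.isfile` are standard library calls.

```python
import os


def resolve_corpus_path(filename: str) -> str:
    """Return the absolute path a relative filename resolves to."""
    # Relative paths are interpreted against the current working directory.
    return os.path.abspath(filename)


def check_corpus_file(filename: str) -> bool:
    """Print where Python is looking and report whether the file is there."""
    full_path = resolve_corpus_path(filename)
    print("Current working directory:", os.getcwd())
    print("Looking for:", full_path)
    return os.path.isfile(full_path)


if __name__ == "__main__":
    if not check_corpus_file("mycorpus.txt"):
        # Options: move mycorpus.txt into os.getcwd(), os.chdir() to the folder
        # that holds it, or pass an absolute path to open().
        print("mycorpus.txt not found; move it there or use an absolute path.")
```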