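I've been getting some irregular behavior from my LDA topic model program, and now it seems my file won't save the lda model it creates... I'm really not sure why.
Here is a code snippet. Writing fully reproducible code would take more time, because I'm really just trying to load some files I created earlier.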
import gensim
from gensim import corpora

def naive_LDA_implementation(name_of_lda, create_dict=False, remove_low_freq=False):
    LDA_MODEL_PATH = "lda_dir/" + str(name_of_lda) + "/model_dir/"  # for some reason this location doesn't work entirely... and yes, I have made a directory of this name in the folder.
    # This ends up saving the .state, .id2word, and .expElogbeta.npy files... But normally, when saving an lda model actually works, a fourth file is included that, to my understanding, is the model itself.
    # LDA_MODEL_PATH = "models/"  # This is what I originally had as the location for LDA_MODEL_PATH. I was using a directory called models for multiple lda models. This no longer works.
    doc_df = getCorpus(name_of_lda, cleaned=True)  # returns a dataframe with one row per text record and an extra column holding the tokenized version of the post's text.
    dict_path = "lda_dir/" + str(name_of_lda) + "/dict_of_tokens.dict"
    docs_of_tokens = convert_cleaned_tokens_entries(doc_df['cleaned_tokens'])
    if create_dict:
        doc_dict = corpora.Dictionary(docs_of_tokens)
        if remove_low_freq:
            doc_dict.filter_extremes(no_below=5, no_above=0.6)
        doc_dict.save(dict_path)
        print("Finished saving")
    else:
        doc_dict = corpora.Dictionary.load(dict_path)
    doc_term_matrix = [doc_dict.doc2bow(doc) for doc in docs_of_tokens]  # bag-of-words: a list of (token_id, count) pairs per document
    Lda = gensim.models.ldamodel.LdaModel
    ldamodel = Lda(doc_term_matrix, num_topics=15, id2word=doc_dict, passes=20, chunksize=10000)
    ldamodel.save(LDA_MODEL_PATH)
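The comment at the top of the function notes that only the .state, .id2word and expElogbeta .npy files appear. A small standard-library sketch, assuming (as those observed names suggest) that gensim names each side file by appending a suffix to the path passed to save(), shows why a path ending in "/" yields files with no base name, and why the main model file is missing: its name would be the folder itself. The file name lda.model is a hypothetical example.

```python
import os

# Assumption: gensim derives each side file's name by appending one of these
# suffixes to the path given to save(); they match the files the question reports.
SIDE_SUFFIXES = [".state", ".id2word", ".expElogbeta.npy"]

def names_in_folder(fname):
    """Return (main_name, side_names): the bare file names save() would target."""
    main_name = os.path.basename(fname)
    side_names = [os.path.basename(fname + suffix) for suffix in SIDE_SUFFIXES]
    return main_name, side_names

# Path taken from the question (ends in "/", i.e. names the folder itself).
dir_main, dir_sides = names_in_folder("lda_dir/lda_model_topic_1/model_dir/")
# Hypothetical alternative: a file name inside that folder.
file_main, file_sides = names_in_folder("lda_dir/lda_model_topic_1/model_dir/lda.model")

print(repr(dir_main), dir_sides)    # '' ['.state', '.id2word', '.expElogbeta.npy']
print(repr(file_main), file_sides)  # 'lda.model' ['lda.model.state', 'lda.model.id2word', 'lda.model.expElogbeta.npy']
```

An empty main name means the main pickle would be written to the directory path itself, which is consistent with only the three side files showing up.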
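In short... I don't know why permission is denied when I try to save the lda model to a specific location. Now even the original models/ directory gives me the "Permission denied" error. It seems that none of the directories I could use work anymore. This is strange behavior, and I can't find a question that discusses this error in the same context. I did find people getting this error message when they tried to save to a location that doesn't exist, but for me that isn't really the issue.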
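When I first ran into this error, I started to wonder whether it was because I have another lda topic model, which I named topic_model_1 and which is stored in the models/ subdirectory. I suspected the name might be the cause, so I changed it to lda_model_topic_1 to see whether that would change the result... but it had no effect.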
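Even if you can't pin down which solution applies to my case (especially since I don't have reproducible code right now, only my own work)... could someone tell me what this error message means? When and why does it occur? That might be a start.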
Traceback (most recent call last):
  File "C:\Users\biney\Miniconda3\lib\site-packages\gensim\utils.py", line 679, in save
    _pickle.dump(self, fname_or_handle, protocol=pickle_protocol)
TypeError: file must have a 'write' attribute

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "text_mining.py", line 461, in <module>
    main()
  File "text_mining.py", line 453, in main
    naive_LDA_implementation(name_of_lda="lda_model_topic_1", create_dict=True, remove_low_freq=True)
  File "text_mining.py", line 411, in naive_LDA_implementation
    ldamodel.save(LDA_MODEL_PATH)
  File "C:\Users\biney\Miniconda3\lib\site-packages\gensim\models\ldamodel.py", line 1583, in save
    super(LdaModel, self).save(fname, ignore=ignore, separately=separately, *args, **kwargs)
  File "C:\Users\biney\Miniconda3\lib\site-packages\gensim\utils.py", line 682, in save
    self._smart_save(fname_or_handle, separately, sep_limit, ignore, pickle_protocol=pickle_protocol)
  File "C:\Users\biney\Miniconda3\lib\site-packages\gensim\utils.py", line 538, in _smart_save
    pickle(self, fname, protocol=pickle_protocol)
  File "C:\Users\biney\Miniconda3\lib\site-packages\gensim\utils.py", line 1337, in pickle
    with smart_open(fname, 'wb') as fout:  # 'b' for binary, needed on Windows
  File "C:\Users\biney\Miniconda3\lib\site-packages\smart_open\smart_open_lib.py", line 181, in smart_open
    fobj = _shortcut_open(uri, mode, **kw)
  File "C:\Users\biney\Miniconda3\lib\site-packages\smart_open\smart_open_lib.py", line 287, in _shortcut_open
    return io.open(parsed_uri.uri_path, mode, **open_kwargs)
PermissionError: [Errno 13] Permission denied: 'lda_dir/lda_model_topic_1/model_dir/'
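The last line of the traceback names the path that could not be opened: 'lda_dir/lda_model_topic_1/model_dir/', which ends in a slash and therefore refers to the folder itself rather than a file inside it. A minimal standard-library sketch reproduces the same io.open(..., 'wb') step on a directory and on a file inside it; the folder and the file name lda.model are hypothetical placeholders. On Windows, opening a directory for writing is reported as PermissionError (Errno 13), matching the traceback; Linux and macOS report IsADirectoryError instead.

```python
import os
import tempfile

def try_binary_write(path):
    """Open `path` for binary writing, as the traceback's io.open(..., 'wb') does.

    Returns 'ok' on success, otherwise the name of the OSError subclass raised.
    """
    try:
        with open(path, "wb"):
            pass
    except OSError as exc:
        return type(exc).__name__
    return "ok"

with tempfile.TemporaryDirectory() as root:
    model_dir = os.path.join(root, "model_dir")
    os.makedirs(model_dir)
    # Like 'lda_dir/lda_model_topic_1/model_dir/': the target is the folder itself.
    as_directory = try_binary_write(model_dir + "/")
    # A (hypothetical) file name inside the same folder.
    as_file = try_binary_write(os.path.join(model_dir, "lda.model"))

print(as_directory)  # expected: PermissionError on Windows, IsADirectoryError on Linux/macOS
print(as_file)       # expected: ok
```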
Answer (score: 0)
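It seems that because you are using a relative path, you may be trying to save to SCRIPT_LAUNCH_PATH + lda_dir/lda_model_topic_1/model_dir/, a location that is not writable (possibly SCRIPT_LAUNCH_PATH is actually the Python interpreter's installation directory).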
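The point here is that a relative path is resolved against whatever the current working directory is when the script runs. A standard-library sketch: it temporarily switches into two throwaway folders (standing in for two hypothetical launch locations) and shows the same relative path resolving to two different absolute paths. REL_PATH mirrors the question's path without the trailing slash.

```python
import os
import tempfile

REL_PATH = os.path.join("lda_dir", "lda_model_topic_1", "model_dir")

def resolve_from(launch_dir, rel_path=REL_PATH):
    """Return what `rel_path` resolves to when the working directory is `launch_dir`."""
    previous = os.getcwd()
    os.chdir(launch_dir)
    try:
        return os.path.abspath(rel_path)  # abspath joins onto os.getcwd()
    finally:
        os.chdir(previous)  # always restore the original working directory

with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
    from_a = resolve_from(a)
    from_b = resolve_from(b)
    # realpath() so the comparison survives symlinked temp folders (e.g. on macOS)
    real_a = os.path.realpath(a)
    real_b = os.path.realpath(b)

print(from_a)
print(from_b)
```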
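You can check your launch directory (the current working directory, which relative paths are resolved against), as well as the folder that contains the script:
import os
print(os.getcwd())  # launch (current working) directory
print(os.path.dirname(os.path.abspath(__file__)))  # folder containing the script
Or, better, save the file to an absolute path such as C:\Users\<youruser>\Documents\... (on Windows, remember to replace <youruser> with your login name), where you should have full write permission.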
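Following the absolute-path suggestion, one possible sketch builds an absolute path to a model file (not the folder) and creates the folders first. The helper model_file_path and the default name lda.model are hypothetical; base_dir would be any folder you can write to. The temporary directory only makes the example self-contained.

```python
import tempfile
from pathlib import Path

def model_file_path(base_dir, name_of_lda, file_name="lda.model"):
    """Return an absolute path to a model *file* inside
    <base_dir>/lda_dir/<name_of_lda>/model_dir/, creating the folders if needed.

    `file_name` is a hypothetical default; what matters is that the result
    names a file inside the folder rather than the folder itself.
    """
    model_dir = Path(base_dir).resolve() / "lda_dir" / str(name_of_lda) / "model_dir"
    model_dir.mkdir(parents=True, exist_ok=True)
    return model_dir / file_name

with tempfile.TemporaryDirectory() as base:
    target = model_file_path(base, "lda_model_topic_1")
    is_abs = target.is_absolute()
    parent_exists = target.parent.is_dir()
    name = target.name
    print(target)
    # With gensim this would then be: ldamodel.save(str(target))
```

Appending a file name also means gensim's side files would land next to it with that name as their prefix, instead of as bare suffixes inside the folder.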
Another possible reason is that you are running the script as a different user from the one who created the directory.
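To test the different-user hypothesis, one option is to ask, from the same account that runs the script, whether the target folder can actually be written. A sketch using only the standard library; write_access_report is a hypothetical helper. It combines os.access, which on Windows CPython does not consult folder ACLs (so True there is only a hint), with an actual attempt to create and delete a temporary file in the folder.

```python
import os
import tempfile

def write_access_report(folder):
    """Report what the OS says about `folder` for the user running this process."""
    try:
        # Most direct test: create a temporary file there (deleted on close).
        with tempfile.TemporaryFile(dir=folder):
            pass
        can_create = True
    except OSError:
        can_create = False
    return {
        "exists": os.path.exists(folder),
        "is_dir": os.path.isdir(folder),
        # Permission check for the current user; a hint, not proof, on Windows.
        "access_w_ok": os.access(folder, os.W_OK),
        "can_create_file": can_create,
    }

with tempfile.TemporaryDirectory() as folder:
    ok_report = write_access_report(folder)
    missing_report = write_access_report(os.path.join(folder, "does_not_exist"))

print(ok_report)
print(missing_report)
```

Pointing it at the real folder, for example write_access_report("lda_dir/lda_model_topic_1/model_dir"), and launching it the same way as the script checks the folder under the same user and working directory.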