I have a data frame that looks like this:
id created_at text month
0 911721027587231746 2017-09-23 22:36:46 تفاصيل استخدام سيارات الإسعاف لتهريب المواد ال... 9
1 911719688257851397 2017-09-23 22:31:27 تطوير لقاح جديد لمحاربة تسوس الأسنان\n https:/... 9
2 911715658395725826 2017-09-23 22:15:26 "حمدي الميرغني" يشارك جمهوره بصورة جديدة من شه... 9
3 911715466166587392 2017-09-23 22:14:40 شخصية مصر.. في عيون جمال حمدان (2) https://t.c... 9
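The month column takes values from 1 to 11. I want to build a separate model on the text data for each month. I am trying to save the output to a txt file, but when I open the files I find that each one contains only a single line.
What I want is 11 text files, each named after its index, and each containing 12 lines.
Here is my code: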
def model(final_text):
    sentences = [clean(raw_sentence) for raw_sentence in final_text]
    doc_clean = [i.split() for i in sentences]
    dictionary = corpora.Dictionary(doc_clean)
    doc_term_matrix = [dictionary.doc2bow(doc) for doc in doc_clean]
    Lda = gensim.models.ldamodel.LdaModel
    ldamodel = Lda(doc_term_matrix, num_topics=12, id2word = dictionary, passes = 100, alpha='auto', update_every=5)
    x = ldamodel.print_topics(num_topics=12, num_words=5)
    y = ldamodel.show_topics(num_topics=12, num_words=5, formatted=False)
    topics_words = [(tp[0], [wd[0] for wd in tp[1]]) for tp in y]
    for topic,words in topics_words:
        #print(" ".join(words).encode('utf-8'))
        #print(words)
        f = open(str(i)+'.txt', 'wb')
        f.write(" ".join(words).encode('utf-8'))
        #f.write(words.encode('utf-8'))
    f.close()

#clean is just a function for cleaning data and it returns text
for i in range(1,12):
    df = parsed[parsed['month'] == i]
    text = df.text
    model(text)
What am I doing wrong here?
Thanks in advance.
Answer 0 (score: 0)
This is your problem:
for topic,words in topics_words:
    # print(" ".join(words).encode('utf-8'))
    # print(words)
    f = open(str(i)+'.txt', 'wb')
    f.write(" ".join(words).encode('utf-8'))
    # f.write(words.encode('utf-8'))
f.close()
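The file is reopened in 'wb' mode on every pass through the loop. Opening in 'wb' truncates the file, so each topic overwrites the previous one and only the last topic is left. Open the file once, before the loop, and write every topic inside it:
# f = open(str(i)+'.txt', 'wb')
with open(str(i) + '.txt', 'wb') as f:
    for topic,words in topics_words:
        # print(" ".join(words).encode('utf-8'))
        # print(words)
        # the trailing "\n" puts each topic on its own line
        f.write((" ".join(words) + "\n").encode('utf-8'))
        # f.write(words.encode('utf-8'))
# f.close()
Opening the file with a "with" statement closes it automatically once the block has finished writing.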
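To see the effect of the file mode in isolation, here is a small experiment that needs no LDA model. The topic words, the helper names (write_reopening, write_once, count_lines) and the temporary file names are all made up for illustration; only the open() behaviour is the point.

```python
import os
import tempfile

# Made-up stand-in for the word lists of three topics.
topics = [["a", "b"], ["c", "d"], ["e", "f"]]


def write_reopening(path, topics):
    # Reopens the file in 'wb' for every topic; each open truncates the file.
    for words in topics:
        with open(path, "wb") as f:
            f.write((" ".join(words) + "\n").encode("utf-8"))


def write_once(path, topics):
    # Opens the file a single time and writes every topic into it.
    with open(path, "wb") as f:
        for words in topics:
            f.write((" ".join(words) + "\n").encode("utf-8"))


def count_lines(path):
    with open(path, "rb") as f:
        return len(f.read().splitlines())


tmp_dir = tempfile.mkdtemp()
reopened_path = os.path.join(tmp_dir, "reopened.txt")
once_path = os.path.join(tmp_dir, "once.txt")

write_reopening(reopened_path, topics)
write_once(once_path, topics)

print(count_lines(reopened_path))  # 1: only the last topic survives
print(count_lines(once_path))      # 3: one line per topic
```

Switching the mode to 'ab' inside the loop would also keep every topic, but it would keep appending to old files across repeated runs, so opening once is usually the simpler choice.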
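Also, where does the "i" in the file name str(i) + '.txt' come from? If it comes from the outer for loop, you should pass it to the function as a parameter rather than relying on it as a global variable.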
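A minimal sketch of that suggestion, assuming the file-writing step is split into its own function. The name save_topics, its out_dir parameter and the placeholder topics are hypothetical and not part of the original code; in the real script topics_words would come from show_topics() as in the question.

```python
import os
import tempfile


def save_topics(topics_words, month, out_dir="."):
    """Write one line per topic to '<month>.txt' and return the file path."""
    path = os.path.join(out_dir, str(month) + ".txt")
    # Text mode with an explicit encoding, so no manual .encode() is needed.
    with open(path, "w", encoding="utf-8") as f:
        for _topic_id, words in topics_words:
            f.write(" ".join(words) + "\n")
    return path


# Made-up stand-in for the (topic_id, words) pairs built from show_topics():
# 12 topics with 5 placeholder words each.
sample_topics = [(t, ["w%d_%d" % (t, k) for k in range(5)]) for t in range(12)]

out_dir = tempfile.mkdtemp()
# The month is passed in explicitly instead of being read from a global.
paths = [save_topics(sample_topics, month, out_dir) for month in range(1, 12)]
print(len(paths))  # 11
```

With the month as an argument, the function no longer depends on whatever "i" happens to mean at the call site, which also avoids confusion with the "i" used in the list comprehension inside model().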
Answer 1 (score: 0)
with open(str(i)+'.txt', 'wb') as f:
    for topic,words in topics_words:
        f.write(" ".join(words).encode('utf-8'))
I opened the file first and ran the loop inside it, which solved the problem.