How do I iteratively write output data to text files?

Asked: 2017-10-05 01:17:41

Tags: python string pandas encoding gensim

I have a DataFrame that looks like this:

id  created_at  text    month

0   911721027587231746  2017-09-23 22:36:46 تفاصيل استخدام سيارات الإسعاف لتهريب المواد ال...   9
1   911719688257851397  2017-09-23 22:31:27 تطوير لقاح جديد لمحاربة تسوس الأسنان\n https:/...   9
2   911715658395725826  2017-09-23 22:15:26 "حمدي الميرغني" يشارك جمهوره بصورة جديدة من شه...   9
3   911715466166587392  2017-09-23 22:14:40 شخصية مصر.. في عيون جمال حمدان (2) https://t.c...   9

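The month column takes values from 1 to 11, and I want to build a model on the text data for each month. I am trying to save the output to a txt file, but when I open the files, each one contains only a single line.

What I want is 11 text files, each named after its month index, and each containing 12 lines.

Here is my code: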

def model(final_text):

    sentences = [clean(raw_sentence) for raw_sentence in final_text]
    doc_clean = [i.split() for i in sentences]
    dictionary = corpora.Dictionary(doc_clean)
    doc_term_matrix = [dictionary.doc2bow(doc) for doc in doc_clean]
    Lda = gensim.models.ldamodel.LdaModel
    ldamodel = Lda(doc_term_matrix, num_topics=12, id2word = dictionary, passes = 100, alpha='auto', update_every=5)
    x = ldamodel.print_topics(num_topics=12, num_words=5)

    y = ldamodel.show_topics(num_topics=12, num_words=5, formatted=False)
    topics_words = [(tp[0], [wd[0] for wd in tp[1]]) for tp in y]
    for topic,words in topics_words:
        #print(" ".join(words).encode('utf-8'))
        #print(words)

        f = open(str(i)+'.txt', 'wb')
        f.write(" ".join(words).encode('utf-8'))
        #f.write(words.encode('utf-8'))
    f.close()

#clean is just a function for cleaning data and it returns text

for i in range(1,12):
    df = parsed[parsed['month'] == i]
    text = df.text
    model(text)

What am I doing wrong here?

Thanks in advance.

2 Answers:

Answer 0: (score: 0)

This is where your problem is:

for topic,words in topics_words:
    # print(" ".join(words).encode('utf-8'))
    # print(words)

    f = open(str(i)+'.txt', 'wb')
    f.write(" ".join(words).encode('utf-8'))
    # f.write(words.encode('utf-8'))
f.close()

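You reopen the file in 'wb' mode on every iteration, which truncates it each time, and you only close it once after the loop, so the file ends up holding only the last topic. Change it to:

for topic, words in topics_words:
    # Append mode ('ab') keeps the earlier topics instead of truncating the file.
    # Delete old output files before a rerun, or new lines are added to them.
    with open(str(i) + '.txt', 'ab') as f:
        # The trailing newline puts each topic on its own line
        f.write((" ".join(words) + "\n").encode('utf-8'))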

Opening the file with "with open(...)" closes it automatically once the block has finished writing.
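A small self-contained check of the difference, as a sketch: the file names and the sample word lists are made up for illustration and stand in for topics_words. Reopening with 'wb' inside the loop truncates the file on every pass, while opening once and looping inside keeps every line.

```python
import os
import tempfile

# Hypothetical sample data standing in for topics_words
topics_words = [(0, ["a", "b"]), (1, ["c", "d"]), (2, ["e", "f"])]

tmp_dir = tempfile.mkdtemp()
reopened_path = os.path.join(tmp_dir, "reopened.txt")
opened_once_path = os.path.join(tmp_dir, "opened_once.txt")

# Reopening in 'wb' mode on every iteration truncates the file each time
for topic, words in topics_words:
    with open(reopened_path, "wb") as f:
        f.write((" ".join(words) + "\n").encode("utf-8"))

# Opening once and looping inside keeps every topic
with open(opened_once_path, "wb") as f:
    for topic, words in topics_words:
        f.write((" ".join(words) + "\n").encode("utf-8"))

with open(reopened_path, "rb") as f:
    reopened_lines = f.read().decode("utf-8").splitlines()
with open(opened_once_path, "rb") as f:
    opened_once_lines = f.read().decode("utf-8").splitlines()

print(len(reopened_lines))     # 1: only the last topic survives
print(len(opened_once_lines))  # 3: one line per topic
```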

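Also, where does the "i" in the file name "str(i) + '.txt'" come from? If it comes from the outer "for" loop, you should pass it to the function as a parameter. Don't use it as a global variable.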
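A minimal sketch of passing the month in as a parameter instead of reading a global, assuming Python 3. The function name save_topics, the out_dir parameter, and the sample data are assumptions for illustration and are not part of the original code; topics_words is expected in the same (topic_id, [words]) shape as in the question.

```python
import os
import tempfile


def save_topics(topics_words, month, out_dir="."):
    """Write one line per topic to '<month>.txt' and return the file path."""
    path = os.path.join(out_dir, str(month) + ".txt")
    # Text mode with an explicit encoding avoids manual .encode() calls
    with open(path, "w", encoding="utf-8") as f:
        for topic, words in topics_words:
            f.write(" ".join(words) + "\n")
    return path


# Hypothetical stand-in for the LDA output of one month
sample_topics = [(0, ["مصر", "جديد"]), (1, ["لقاح", "تطوير"])]
out_dir = tempfile.mkdtemp()
path = save_topics(sample_topics, month=9, out_dir=out_dir)
print(path)
```

With something like this in place, the question's model function could accept the month as a second argument and be called as model(text, i) from the loop, so the file name no longer depends on a variable defined outside the function.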

Answer 1: (score: 0)

with open(str(i) + '.txt', 'wb') as f:
    for topic, words in topics_words:
        # The trailing newline puts each topic on its own line
        f.write((" ".join(words) + "\n").encode('utf-8'))

I opened the file first and ran the loop inside it, and that solved the problem.