Question

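I want to write every iteration that is printed in the terminal to a .csv file. Basically, my code reads several files from a folder and extracts data from them with pandas. When the .csv is written, only the last document ends up in it. I would like all of the data to be written, or concatenated, into a single .csv, and nothing I have tried so far has helped.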

How can I do this?

import nltk
import collections
from collections import Counter
import pandas as pd
import os
import re
import glob
from nltk.corpus import stopwords
import csv


class Extract_Pandas_File_Dataframe():

    top_N = 1
    path = r'./docs'

    filenames = glob.glob(path + "/*.txt")

    # Loop through the documents (each item is a file path)
    for lines in filenames:

        # Read the file into a dataframe with a single column, Sentences
        df_main = pd.read_csv(lines, sep="\t", names=[
            "Sentences"])

        # Punctuation is not handled properly, room for improvement!
        # Bug: doesn't strip "'" among others
        #stopwords = nltk.corpus.stopwords.words('english')
        #RE_stopwords = r'\b(?:{})\b'.format('|'.join(stopwords))

        # Grab words from the df column Sentences
        words = (df_main.Sentences
                 .str.lower()
                 # .replace([r'\|', RE_stopwords], [' ', ''], regex=True)# execution of RE_stopwords
                 .str.cat(sep=' ')
                 .split()
                 )

        # Result dataframe words and count
        rslt = pd.DataFrame(Counter(words).most_common(top_N),
                            columns=['Word', 'Frequency'])

        # Filename dataframe documents name
        df_filenames = pd.DataFrame({'Documents': [lines]})
        df_filenames.dropna()

        # Build the main dataframe
        # In this case we are going to concatenate all dfs in one.
        df_build = pd.concat([rslt, df_main,  df_filenames])
        # Set the index of the df to Word, as it is the main relation of the database
        df_build.set_index('Word', inplace=True)

        df_build.dropna()
        print(df_build)

        # Write to document
        df_build.to_csv(r'EidgenDataframe.csv', index=False)


# Call class
Extract_Pandas_File_Dataframe()

The terminal shows 2 iterations, but only the last iteration is saved to the .csv file.
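
One possible way around this, offered as a sketch rather than a confirmed answer: `to_csv` is called inside the loop and opens the file in write mode by default, so each pass replaces what the previous pass wrote. Collecting the per-file result in a list and calling `pd.concat` and `to_csv` once, after the loop, keeps every iteration. The names `top_words_per_file` and `write_summary`, the `Document` column, and the output name `summary.csv` are hypothetical. For simplicity the files are read with plain `open()` instead of `pd.read_csv`, and stopword handling is left out.

```python
import glob
import os
from collections import Counter

import pandas as pd


def top_words_per_file(folder, top_n=1):
    """Return one DataFrame holding the top_n words of every .txt file in folder."""
    frames = []
    for filename in sorted(glob.glob(os.path.join(folder, "*.txt"))):
        # Read the whole file, lowercase it, and split on whitespace.
        with open(filename, encoding="utf-8") as handle:
            words = handle.read().lower().split()

        rslt = pd.DataFrame(Counter(words).most_common(top_n),
                            columns=["Word", "Frequency"])
        # Record which document each row came from (hypothetical column name).
        rslt["Document"] = os.path.basename(filename)
        frames.append(rslt)

    if not frames:
        # No .txt files found: return an empty frame with the same columns.
        return pd.DataFrame(columns=["Word", "Frequency", "Document"])
    # Stack the per-file results into a single frame.
    return pd.concat(frames, ignore_index=True)


def write_summary(folder, out_path="summary.csv", top_n=1):
    """Write the combined result once, after all files have been processed."""
    combined = top_words_per_file(folder, top_n)
    combined.to_csv(out_path, index=False)
    return combined
```

Calling `write_summary("./docs")` would then produce one row per document (for `top_n=1`) in a single file. Two smaller points in the code above are worth checking as well: `dropna()` returns a new dataframe, so calling it without assigning the result changes nothing, and `set_index('Word')` followed by `to_csv(..., index=False)` leaves the `Word` column out of the written file.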

Terminal output: dataframe iterations to a .csv file

0 answers: