I want to write every iteration that is printed in the terminal to a .csv file.
Basically, my code reads several files from a folder and extracts data from them with pandas.
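As a point of comparison for the per-document extraction step, here is a minimal sketch that assumes each document's text is already loaded as a string. The function name `top_words_frame` and the sample text are hypothetical. It stores the document name as a column on the same rows as the word counts. That keeps each result in one rectangular table, whereas concatenating three frames with different columns pads every row with NaN.

```python
from collections import Counter

import pandas as pd


def top_words_frame(text, document, top_n=1):
    """Return the top_n most common lowercase words of `text` as a
    DataFrame with columns Word, Frequency and Document."""
    words = text.lower().split()
    rslt = pd.DataFrame(Counter(words).most_common(top_n),
                        columns=['Word', 'Frequency'])
    # Tag every row with the (hypothetical) document name it came from
    rslt['Document'] = document
    return rslt


# Hypothetical stand-in for the contents of one .txt file in ./docs
frame = top_words_frame('the cat sat on the mat', 'a.txt', top_n=2)
print(frame)
```

`Counter.most_common` orders words with equal counts by first occurrence, so ties are resolved deterministically.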
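When I write to the .csv, only the last document ends up in it. I would like all of the data written out, or all of it concatenated into a single .csv, but nothing I have tried has worked.
How can I do this?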
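A common pattern for this kind of request is to build one DataFrame per file inside the loop, append it to a list, and call `pd.concat` and `to_csv` only once after the loop. The following is a minimal sketch of that pattern. The temporary folder, the sample file names and contents, and the output name `all_docs.csv` are hypothetical and exist only so the example runs on its own.

```python
import glob
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical sample documents standing in for ./docs/*.txt
    for name, text in [('a.txt', 'first line\nsecond line'),
                       ('b.txt', 'another document')]:
        with open(os.path.join(tmp, name), 'w') as fh:
            fh.write(text)

    frames = []  # one DataFrame per document
    for filename in sorted(glob.glob(os.path.join(tmp, '*.txt'))):
        df = pd.read_csv(filename, sep='\t', names=['Sentences'])
        df['Document'] = os.path.basename(filename)
        frames.append(df)  # collect only; do not write inside the loop

    # Concatenate all iterations and write the .csv exactly once
    df_all = pd.concat(frames, ignore_index=True)
    out_path = os.path.join(tmp, 'all_docs.csv')
    df_all.to_csv(out_path, index=False)

    # Read the file back to check that every document made it in
    written = pd.read_csv(out_path)
    print(written)
```

Because the write happens after the loop, no iteration can overwrite the output of a previous one.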
```python
from collections import Counter
import glob

import pandas as pd


def extract_pandas_file_dataframe(path=r'./docs', top_N=1,
                                  out_csv=r'EidgenDataframe.csv'):
    filenames = glob.glob(path + "/*.txt")
    frames = []  # collects one dataframe per document
    # Loop through the different documents
    for filename in filenames:
        # Read each line of the document into the column "Sentences"
        df_main = pd.read_csv(filename, sep="\t", names=["Sentences"])
        # Punctuation is not handled properly; room for improvement!
        # Bug: doesn't strip "'" among others
        # (the stopword filter below also needs `import nltk`)
        #stopwords = nltk.corpus.stopwords.words('english')
        #RE_stopwords = r'\b(?:{})\b'.format('|'.join(stopwords))
        # Grab the words from the df column "Sentences"
        words = (df_main.Sentences
                 .str.lower()
                 # .replace([r'\|', RE_stopwords], [' ', ''], regex=True)  # applies RE_stopwords
                 .str.cat(sep=' ')
                 .split()
                 )
        # Result dataframe: words and their counts
        rslt = pd.DataFrame(Counter(words).most_common(top_N),
                            columns=['Word', 'Frequency'])
        # Dataframe holding the document's name
        df_filenames = pd.DataFrame({'Documents': [filename]})
        # Build this document's dataframe by concatenating the three dfs
        df_build = pd.concat([rslt, df_main, df_filenames])
        # Index by Word, as it is the main relation of the database
        df_build.set_index('Word', inplace=True)
        print(df_build)
        frames.append(df_build)
    # Concatenate every document's dataframe and write the .csv ONCE,
    # after the loop, so no iteration overwrites the previous one
    if frames:
        df_all = pd.concat(frames)
        # Keep the index: with index=False the 'Word' column would be lost
        df_all.to_csv(out_csv)
        return df_all
    return None


# Call the function
extract_pandas_file_dataframe()
```
The terminal shows 2 iterations, but only the last iteration is saved to the .csv file.
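This symptom matches calling `to_csv` inside the loop with its default `mode='w'`. Each call truncates the file, so only the last iteration survives. If writing once per iteration is preferred, one alternative is append mode, writing the header only for the first chunk. In this sketch the sample frames and the file names `overwrite.csv` and `append.csv` are hypothetical.

```python
import os
import tempfile

import pandas as pd

# Hypothetical per-iteration results (stand-ins for df_build)
chunks = [pd.DataFrame({'Word': ['the'], 'Frequency': [2]}),
          pd.DataFrame({'Word': ['dogs'], 'Frequency': [2]})]

with tempfile.TemporaryDirectory() as tmp:
    overwrite_path = os.path.join(tmp, 'overwrite.csv')
    append_path = os.path.join(tmp, 'append.csv')

    for i, chunk in enumerate(chunks):
        # Default mode='w': every call truncates the file
        chunk.to_csv(overwrite_path, index=False)
        # mode='a': add rows; write the header only for the first chunk
        chunk.to_csv(append_path, mode='a', header=(i == 0), index=False)

    overwritten = pd.read_csv(overwrite_path)
    appended = pd.read_csv(append_path)

print(overwritten)  # only the last chunk
print(appended)     # both chunks
```

Append mode also keeps rows left over from earlier runs of the script. The output file therefore has to be removed before the loop, which is one reason collecting the frames and writing once is usually simpler.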