Looping through multiple CSV files and producing multiple outputs

Time: 2018-11-06 14:40:51

Tags: python loops csv dataframe jupyter-notebook

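I am writing some Python scripts that open a .csv file, define a DataFrame, run some analysis (e.g. aggregating data, splitting columns, computing averages), and plot the output of the analysis on a graph. The output should be a graph (.png file) and a CSV file whose name has "_ANALYSIS" appended to it.

I have set this up as a loop in Jupyter Notebook: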

#import multiple csv files

import glob
import pandas as pd
import numpy as np
from pytz import all_timezones
import matplotlib.pyplot as plt

files = glob.glob('folder/*.csv')
for file in files:
    df = pd.read_csv(file)

    #START OF THE ANALYSIS
    #Multiple lines of code starts here


    #GRAPH some outputs from the analysis
    df2 = df.replace(0, np.nan)
    fig, ax = plt.subplots()
    df2.groupby('Day_type').plot(x = 'Time', y = 'avg_vt', ax=ax, grid=True)


    #OUTPUT FILES: graph + csv file
    plt.savefig('*.png', index = False)
    file_name="file"+str(i+1)+"_ANALYSIS"
    df.to_csv('file1_ANALYSIS.csv', index = False)

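Unfortunately, it does not produce any output. Before adding the loop, I tried the analysis code on its own and it ran without problems.

Thanks, R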

1 answer:

Answer 0 (score: 1)

Using pathlib is slightly more elegant:

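from pathlib import Path

import matplotlib.pyplot as plt
import pandas as pd

# Raw string, so the backslashes in the Windows path are not read as escape sequences
folder = Path(r"C:\Users\Renaldo.Moonu\Desktop\folder name")
for file in sorted(folder.glob('*.csv')):
    df = pd.read_csv(file)
    df.fillna(0, inplace=True)
    fig, ax = plt.subplots()
    df.groupby('Day_type').plot(x='Time', y='avg_vt', ax=ax, grid=True)

    # Name both outputs after the input file, so each iteration writes its own files
    fig.savefig(file.with_suffix('.png'))  # savefig has no index argument
    # Append "_ANALYSIS" instead of overwriting the input CSV
    df.to_csv(file.with_name(file.stem + '_ANALYSIS.csv'), index=False)
    plt.close(fig)  # release the figure before the next file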
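
One thing to watch when the outputs are written into the same folder the loop reads from: on a second run, the generated "_ANALYSIS" CSV files would match '*.csv' and be analysed as inputs. A minimal sketch of one way to handle this, using only pathlib, is below. The helper names input_files and output_paths are hypothetical (they are not part of the question or of any library), and the sketch assumes the outputs live next to the inputs.

```python
from pathlib import Path

SUFFIX = "_ANALYSIS"  # marker from the question; the helper names below are hypothetical


def input_files(folder):
    """Return the CSV files in `folder`, skipping outputs of earlier runs."""
    return sorted(
        p for p in Path(folder).glob("*.csv")
        if not p.stem.endswith(SUFFIX)
    )


def output_paths(csv_path):
    """Build the per-file PNG and CSV output paths next to the input file."""
    csv_path = Path(csv_path)
    png_out = csv_path.with_suffix(".png")
    csv_out = csv_path.with_name(csv_path.stem + SUFFIX + ".csv")
    return png_out, csv_out
```

Inside the loop this would be used as `png_out, csv_out = output_paths(file)`, followed by `fig.savefig(png_out)` and `df.to_csv(csv_out, index=False)`, with `input_files(folder)` supplying the files to iterate over.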