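I have a folder containing multiple CSV files with stock data. I want to read all of the files into data frames, drop the data I don't need, and then merge what remains into a single data frame. I have written some code that works, but it is clumsy: there are intermediate steps I would like to skip to make it more efficient. This is the code I am using now: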
import pandas as pd
import os
import glob
my_dir = "Path where the csv files are stored"  # Directory containing the InFront data files
my_dir2 = "Path where I save the csv files after dropping some columns, and the final file"
# Read the csv file names into a list
filelist = []
os.chdir(my_dir)
for files in glob.glob("*.csv"):
    filelist.append(files)
# Open each csv file from the list and drop everything but the close price
for string in filelist:
    path = os.path.join(my_dir, string)
    frame = pd.read_csv(path, index_col=[1], parse_dates=True)
    # Keyword form: the positional axis argument of drop() was removed in pandas 2.0
    frame = frame.drop(columns=['<TIME>', '<OPEN>', '<HIGH>', '<LOW>', '<VOL>'])
    frame.index.names = ['Date']
    # .ix no longer exists in pandas; the ticker is assumed to be the same on every row
    ticker = frame['<TICKER>'].iloc[0]
    frame.rename(columns={'<CLOSE>': ticker}, inplace=True)
    # Drop the first remaining column (the ticker column)
    frame.drop(columns=frame.columns[0], inplace=True)
    frame.sort_index(ascending=False, inplace=True)
    # Save the file to the folder specified as my_dir2
    frame.to_csv(os.path.join(my_dir2, 'new %s' % string))
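# Read the saved files back in and combine them into one data frame
filelist = []
os.chdir(my_dir2)
# Match only the "new ..." files so that data.csv from an earlier run is not read back in
for files in glob.glob("new *.csv"):
    filelist.append(files)
df_list = [pd.read_csv(file, index_col='Date', parse_dates=True) for file in filelist]
big_df = pd.concat(df_list, axis=1)
big_df.sort_index(ascending=False, inplace=True)
big_df.to_csv('data.csv')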
As you can see, I do this in two steps, and I also have to save the intermediate result. There must be a simpler way to do this, and I hope someone can help me.
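
For reference, one possible way to skip the intermediate files is to keep each close-price column in memory and concatenate once at the end. This is only a minimal sketch under some assumptions: the function name load_close_prices is made up for illustration, the date column is assumed to be the second column of every file (as index_col=[1] in the code above implies), and each file is assumed to hold a single ticker.

```python
import glob
import os

import pandas as pd


def load_close_prices(csv_dir):
    """Return one DataFrame with a close-price column per ticker.

    Hypothetical helper: assumes every csv in csv_dir has '<TICKER>' and
    '<CLOSE>' columns and that the date is the second column in the file.
    """
    series_list = []
    for path in sorted(glob.glob(os.path.join(csv_dir, "*.csv"))):
        # Read only the header first to find the name of the date column.
        date_col = pd.read_csv(path, nrows=0).columns[1]
        # Load just the three columns that are needed instead of dropping the rest.
        frame = pd.read_csv(
            path,
            usecols=[date_col, "<TICKER>", "<CLOSE>"],
            index_col=date_col,
            parse_dates=True,
        )
        frame.index.name = "Date"
        # Assumes one ticker per file, so the first row is representative.
        ticker = frame["<TICKER>"].iloc[0]
        series_list.append(frame["<CLOSE>"].rename(ticker))
    # Align all tickers on the date index, newest dates first.
    big_df = pd.concat(series_list, axis=1)
    return big_df.sort_index(ascending=False)
```

With this, something like big_df = load_close_prices(my_dir) followed by big_df.to_csv(os.path.join(my_dir2, "data.csv")) would write only the final file. Writing the result to a different folder than the input keeps data.csv from being picked up as an input on the next run.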