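I have two years of daily data split into monthly files. I want to combine all of that data into a single file sorted by date and time. The code I am using combines all the files, but not in order.
The code I am using: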
import pandas as pd
import glob, os
import csv
inputdirectory = input('Enter the directory: ')
df_list = []
for filename in sorted(glob.glob(os.path.join(inputdirectory,"*.csv*"))):
    df_list.append(pd.read_csv(filename))
full_df = pd.concat(df_list)
full_df.to_csv('totalsum.csv', index=False)
Answer 0 (score: 1)
Preprocess the list of file names so that it can be sorted by date:
import datetime
import operator
fyles = ['CB02 May 2014.dailysum',
         'CB01 Apr 2015.dailysum',
         'CB01 Jul 2015.dailysum',
         'CB01 May 2015.dailysum',
         'CB01 Sep 2015.dailysum',
         'CB01 Oct 2015.dailysum',
         'CB13 May 2015.dailysum',
         'CB01 Jun 2017.dailysum',
         'CB01 Aug 2015.dailysum'
        ]
new_fyles = []
for entry in fyles:
    # 'CB02 May 2014.dailysum' -> 'CB02', 'May', '2014.dailysum'
    day, month, year = entry.split()
    year, _ = year.split('.')
    # the last two characters of the first token are used as the day
    day = day[-2:]
    ## print(entry, (month, year))
    dt = datetime.datetime.strptime(' '.join((day, month, year)), '%d %b %Y')
    ## print(entry, dt)
    new_fyles.append((entry, dt))
# each item in new_fyles is a (file name, datetime) pair
date = operator.itemgetter(1)
f_name = operator.itemgetter(0)
new_fyles.sort(key = date)
for entry in new_fyles:
    print(f_name(entry))
You can build the list of file names like this:
import os, os.path
fyles = [fn for fn in os.listdir(inputdirectory) if fn.endswith('.dailysum')]
Then, after sorting, write the contents of each file to the new file:
with open('totalsum.csv', 'w') as out:
    for entry in new_fyles:
        f_path = os.path.join(inputdirectory, f_name(entry))
        with open(f_path) as f:
            out.write(f.read())
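One thing to watch with verbatim copying: if each monthly file starts with a header row, that row is repeated once per file in the output. The following is a minimal sketch of a header-aware variant, assuming every file shares the same single-line header. The function name `concat_files` and the `has_header` parameter are hypothetical and not part of the original answer.

```python
def concat_files(paths, out_path, has_header=True):
    """Append the files in `paths` to `out_path`, in the order given.

    Assumption: when has_header is True, every file starts with the same
    one-line header, and only the header of the first file is kept.
    """
    with open(out_path, 'w') as out:
        for i, path in enumerate(paths):
            with open(path) as f:
                if has_header:
                    header = f.readline()
                    if i == 0:
                        out.write(header)
                last = ''
                for line in f:
                    out.write(line)
                    last = line
                # keep rows from consecutive files on separate lines
                if last and not last.endswith('\n'):
                    out.write('\n')
```

It would be called with the already sorted full paths, for example `concat_files(sorted_paths, 'totalsum.csv')`, where `sorted_paths` is whatever ordered list you built from the file names.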
You can also do the sorting in a function:
date = operator.itemgetter(1)
f_name = operator.itemgetter(0)
def f_name_sort(f_list):
    '''Return sorted list of file names'''
    new_fyles = []
    for entry in f_list:
        day, month, year = entry.split()
        year, _ = year.split('.')
        day = day[-2:]
        dt = datetime.datetime.strptime(' '.join((day, month, year)), '%d %b %Y')
        new_fyles.append((entry, dt))
    new_fyles.sort(key = date)
    # keep only the file name from each (file name, datetime) pair
    return [f_name(entry) for entry in new_fyles]
and use it like this:
for entry in f_name_sort(fyles):
    ...
Or write a function that converts a file name to a datetime object and use it as the sort key:
def key(f_name):
    day, month, year = f_name.split()
    year, _ = year.split('.')
    day = day[-2:]
    return datetime.datetime.strptime(' '.join((day, month, year)), '%d %b %Y')
fyles.sort(key = key)
for entry in fyles:
    ...
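For reference, here is a small self-contained sketch of the sort-key approach that can be run as is. It follows the same parsing as above (the last two characters of the first token are treated as the day) and assumes English month abbreviations, since `%b` depends on the locale. The function name `file_date` and the three sample names are illustrative only.

```python
import datetime

def file_date(f_name):
    """Parse a name like 'CB01 Apr 2015.dailysum' into a datetime."""
    prefix, month, rest = f_name.split()
    year = rest.split('.')[0]
    # as in the answer, the last two characters of the prefix act as the day
    return datetime.datetime.strptime(
        ' '.join((prefix[-2:], month, year)), '%d %b %Y')

names = ['CB01 Jul 2015.dailysum',
         'CB02 May 2014.dailysum',
         'CB01 Apr 2015.dailysum']

# sorted() returns a new list and leaves `names` untouched
ordered = sorted(names, key=file_date)
print(ordered)
# ['CB02 May 2014.dailysum', 'CB01 Apr 2015.dailysum', 'CB01 Jul 2015.dailysum']
```

Using `sorted` with a key function avoids building the intermediate list of (file name, datetime) pairs by hand.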
Answer 1 (score: 1)
After this line:
full_df = pd.concat(df_list)
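you need to convert the column 'datecolumn' to a datetime column. Note that to_datetime is a pandas function, not a Series method:
full_df['datecolumn'] = pd.to_datetime(full_df['datecolumn'], format=r'%d/%m/%y')
(Judging from your comment, that format should work.)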
Finally, you can use
full_df.sort_values(by='datecolumn').to_csv('totalsum.csv', index=False)
to sort it and write it out.
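The whole pandas route can be sketched end to end as follows. This is only an illustration: the two in-memory "files" and the `value` column are made-up stand-ins for the monthly CSVs, and the column name 'datecolumn' and the `%d/%m/%y` format are carried over from this answer as assumptions about the real data.

```python
import io
import pandas as pd

# Hypothetical stand-ins for two monthly files, deliberately out of order.
june = io.StringIO("datecolumn,value\n02/06/14,3\n01/06/14,2\n")
may = io.StringIO("datecolumn,value\n15/05/14,1\n")

# Combine the frames; at this point the row order is just the read order.
full_df = pd.concat([pd.read_csv(f) for f in (june, may)], ignore_index=True)

# Parse the strings as dates so that sorting is chronological, not textual.
full_df['datecolumn'] = pd.to_datetime(full_df['datecolumn'], format='%d/%m/%y')

full_df = full_df.sort_values(by='datecolumn').reset_index(drop=True)
print(full_df)
```

Because the sort happens on the parsed dates, the order in which the files were read no longer matters.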