I have several CSV files in one folder. I want to loop over every file ending in .csv and merge them into a single Excel file. Here are two example files:
first.csv
     date    a    b
0  2019.1  1.0  NaN
1  2019.2  NaN  2.0
2  2019.3  3.0  2.0
3  2019.4  3.0  NaN
second.csv
     date    c     d
0  2019.1  1.0   NaN
1  2019.2  5.0   2.0
2  2019.3  3.0   7.0
3  2019.4  6.0   NaN
4  2019.5  NaN  10.0
...
The output I want looks like this, with the files merged on date:
        date    a    b    c     d
0  2019/1/31  1.0  NaN  1.0   NaN
1  2019/2/28  NaN  2.0  5.0   2.0
2  2019/3/31  3.0  2.0  3.0   7.0
3  2019/4/30  3.0  NaN  6.0   NaN
4  2019/5/31  NaN  NaN  NaN  10.0
I have put together the following code, but the date conversion and the merging of the DataFrames are clearly wrong in places:
import numpy as np
import pandas as pd
import glob
dfs = pd.DataFrame()
for file_name in glob.glob("*.csv"):
    # print(file_name)
    df = pd.read_csv(file_name, engine='python', skiprows=2, encoding='utf-8')
    df = df.dropna()
    df = df.dropna(axis = 1)
    df['date'] = pd.to_datetime(df['date'], format='%Y.%m')
    ...
    dfs = pd.merge(df1, df2, on = 'date', how= "outer")
# save the data frame
writer = pd.ExcelWriter('output.xlsx')
dfs.to_excel(writer,'sheet1')
writer.save()
Please help me. Thank you.
Answer 0 (score: 1)
Try it like this:
import numpy as np
import pandas as pd
import glob
from pandas.tseries.offsets import MonthEnd
dfs = pd.DataFrame()
for file_name in glob.glob("*.csv"):
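    df = pd.read_csv(file_name, engine='python', skiprows=2, encoding='utf-8')
    # Normalise header names so every file exposes a 'date' column
    df.columns = df.columns.str.lower().str.replace('dates', 'date')
    df = df.dropna()
    df = df.dropna(axis = 1)
    # Parse 'YYYY.M' and move each date to the last day of its month
    df['date'] = pd.to_datetime(df['date'].astype(str), format='%Y.%m') + MonthEnd(1)
    # The first file seeds the result; later files are outer-merged on 'date'
    if dfs.empty:
        dfs = df.copy()
    else:
        dfs = dfs.merge(df, on='date', how="outer")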
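The merge loop above can be tried without any files on disk. The following is only a minimal sketch under stated assumptions: the two CSV strings are hypothetical stand-ins for first.csv and second.csv (comma-separated, no rows to skip), `load` is an illustrative helper name, and `functools.reduce` replaces the explicit `if dfs.empty` branch. It also reads `date` as text, on the assumption that a value such as 2019.10 should not be read as a float and collapse to 2019.1.

```python
import io
from functools import reduce

import pandas as pd
from pandas.tseries.offsets import MonthEnd

# Hypothetical in-memory stand-ins for first.csv and second.csv.
FIRST = "date,a,b\n2019.1,1.0,\n2019.2,,2.0\n2019.3,3.0,2.0\n2019.4,3.0,\n"
SECOND = (
    "date,c,d\n2019.1,1.0,\n2019.2,5.0,2.0\n"
    "2019.3,3.0,7.0\n2019.4,6.0,\n2019.5,,10.0\n"
)


def load(source):
    """Read one CSV and convert its 'date' column to month-end timestamps."""
    # Keep 'date' as text so that 2019.10 is not parsed as the float 2019.1.
    df = pd.read_csv(source, dtype={"date": str})
    # '%Y.%m' yields the first day of the month; MonthEnd(0) rolls it
    # forward to the last day of that same month.
    df["date"] = pd.to_datetime(df["date"], format="%Y.%m") + MonthEnd(0)
    return df


frames = [load(io.StringIO(text)) for text in (FIRST, SECOND)]

# Outer-merge all frames on 'date', one after another.
merged = reduce(
    lambda left, right: left.merge(right, on="date", how="outer"), frames
)
merged = merged.sort_values("date").reset_index(drop=True)
print(merged)
```

With real files, `load` would receive each path from `glob.glob("*.csv")` instead of a `StringIO` object, plus whatever `skiprows` and `encoding` arguments the files need.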
Answer 1 (score: 1)
Use concat on frames whose DatetimeIndex is created by read_csv through the index_col and parse_dates parameters:
import glob
import pandas as pd

dfs = []
for file_name in glob.glob("*.csv"):
    df = pd.read_csv(file_name,
                     engine='python',
                     skiprows=2,
                     encoding='utf-8',
                     index_col=0,
                     parse_dates=[0])
    # additional processing here if necessary
    dfs.append(df)
df = pd.concat(dfs, axis=1)
df.index = df.index + pd.offsets.MonthEnd()
print (df)
              a    b    c     d
date
2019-01-31  1.0  NaN  1.0   NaN
2019-02-28  NaN  2.0  5.0   2.0
2019-03-31  3.0  2.0  3.0   7.0
2019-04-30  3.0  NaN  6.0   NaN
2019-05-31  NaN  NaN  NaN  10.0
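index_col=0 and parse_dates=[0] together apply to the first column, and the month-end offset is added last so that every date falls on the last day of its month.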
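A self-contained variant of the concat approach, as a rough sketch only. The CSV strings and the `load_indexed` helper are hypothetical, and the dates are parsed with an explicit format rather than `parse_dates`, on the assumption that automatic parsing of values like 2019.1 may not be reliable across pandas versions.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-ins for first.csv and second.csv.
SOURCES = [
    "date,a,b\n2019.1,1.0,\n2019.2,,2.0\n2019.3,3.0,2.0\n2019.4,3.0,\n",
    "date,c,d\n2019.1,1.0,\n2019.2,5.0,2.0\n"
    "2019.3,3.0,7.0\n2019.4,6.0,\n2019.5,,10.0\n",
]


def load_indexed(source):
    """Read one CSV and return it indexed by a DatetimeIndex."""
    # Read 'date' as text, then parse it with an explicit format.
    df = pd.read_csv(source, dtype={"date": str})
    df["date"] = pd.to_datetime(df["date"], format="%Y.%m")
    return df.set_index("date")


frames = [load_indexed(io.StringIO(text)) for text in SOURCES]

# axis=1 aligns the frames on their index and keeps the union of dates.
combined = pd.concat(frames, axis=1).sort_index()

# Shift every first-of-month date to the last day of the same month.
combined.index = combined.index + pd.offsets.MonthEnd(0)
print(combined)
```

Compared with repeated merges, a single concat aligns all frames in one step. Calling `combined.to_excel("output.xlsx")` afterwards would write the result, provided an Excel engine such as openpyxl is installed.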