Using the date as the index in the output file

Asked: 2017-09-18 07:55:17

Tags: python excel python-3.x pandas

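I have several Excel files whose file names differ only by a date. I need to concatenate all of these files, with the date from each file name as the index column. I wrote the following code: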

path = r"C:\\Users\\atcs\\Desktop\\data science\\files\\1-Danny Jones KPI's\\Source\\"                     
fileName =  glob.glob(os.path.join(path, "*.xlsx"))
df = (pd.read_excel(f, header=None, sheetname = "YTD Summary_4") for f in fileName)
k = (re.search("([0-9]{1,2}\-[0-9]{1,2}\-[0-9]{4})", fileName))
concatenated_df   = pd.concat(df, index=k)
concatenated_df.to_csv('tableau7.csv')

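What I'm doing here is first defining a directory and then assigning all the xlsx files in it to fileName. I read the files into DataFrames, use a regular expression to get the date from fileName, and assign it to the variable k. Then I concatenate the files to produce the output CSV file. But the code raises an error: TypeError: expected string or bytes-like object. Can someone help me figure out what I'm doing wrong?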

2 Answers:

Answer 0: (score: 1)

You can use:

import glob

import pandas as pd

# the *.xlsx wildcard is part of the path, so os.path.join is not needed
path = r"C:\\Users\\atcs\\Desktop\\data science\\files\\1-Danny Jones KPI's\\Source\\*.xlsx"
fileName = glob.glob(path)
# create a list of DataFrames, one per file (sheet_name was spelled sheetname before pandas 0.21)
dfs = [pd.read_excel(f, header=None, sheet_name="YTD Summary_4") for f in fileName]
# keys= adds the file names as the outer index level; drop the inner level (row numbers)
concatenated_df = pd.concat(dfs, keys=fileName).reset_index(level=1, drop=True)
# extract the dates from the file names and convert them to a DatetimeIndex
pat = r'([0-9]{1,2}-[0-9]{1,2}-[0-9]{4})'
concatenated_df.index = pd.to_datetime(concatenated_df.index.str.extract(pat, expand=False))
print(concatenated_df)
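
To see what keys= and str.extract do without needing the Excel files, here is a minimal sketch on in-memory data. The file names and cell values are made up for illustration, and the day-month-year order passed to format= is an assumption, since the question does not say which order the file names use.

```python
import pandas as pd

# Hypothetical file names and contents standing in for the real workbooks.
frames = {
    "KPI 1-9-2017.xlsx": pd.DataFrame({0: ["a", "b"], 1: [1, 2]}),
    "KPI 15-9-2017.xlsx": pd.DataFrame({0: ["c"], 1: [3]}),
}

# keys= labels each block of rows with its file name (outer index level);
# the inner level holds the original row numbers and is dropped here.
combined = pd.concat(list(frames.values()), keys=list(frames.keys()))
combined = combined.reset_index(level=1, drop=True)

# Pull the date out of every index label, then parse it.
pat = r"(\d{1,2}-\d{1,2}-\d{4})"
dates = combined.index.str.extract(pat, expand=False)
combined.index = pd.to_datetime(dates, format="%d-%m-%Y")  # assumed day-month-year
combined.index.name = "date"

print(combined)
```

Every row coming from the same file ends up with the same date in the index, so the index is not unique; that is expected when whole sheets are stacked.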

Answer 1: (score: 0)

A small modification:

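import glob
import re

import pandas as pd

path = r"C:\\Users\\atcs\\Desktop\\data science\\files\\1-Danny Jones KPI's\\Source\\*.xlsx"
fileName = glob.glob(path)
pat = r"[0-9]{1,2}-[0-9]{1,2}-[0-9]{4}"
l = []
for f in fileName:
    df = pd.read_excel(f, header=None, sheet_name="YTD Summary_4")
    # store the date found in this file's name (not the whole path) on every row
    df['date'] = re.search(pat, f).group(0)
    l.append(df)
concatenated_df = pd.concat(l).set_index('date')
concatenated_df.to_csv('tableau7.csv')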
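
If only the date part of each file name should end up in the index, it can be pulled out one file at a time with re.search. re.search expects a single string, and passing it the whole list returned by glob.glob is what triggers the TypeError in the question. Below is a minimal sketch; date_from_name and the sample file names are hypothetical and used only for illustration.

```python
import re
from typing import Optional

# Same date pattern as in the question, written as a raw string.
DATE_PATTERN = re.compile(r"\d{1,2}-\d{1,2}-\d{4}")


def date_from_name(file_name: str) -> Optional[str]:
    """Return the first d-m-yyyy style date found in file_name, or None."""
    match = DATE_PATTERN.search(file_name)
    return match.group(0) if match else None


# Hypothetical file names, including one without a date.
names = ["KPI report 1-9-2017.xlsx", "KPI report 15-9-2017.xlsx", "notes.xlsx"]
extracted = [date_from_name(n) for n in names]
print(extracted)  # ['1-9-2017', '15-9-2017', None]
```

Inside the loop, the helper would be called once per file on f, and a None result signals a file whose name carries no date and may need to be skipped.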