Question

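I am trying to concatenate different Excel files (same columns, each file from a different category), using the dates from the first file as the index.

Each Excel file basically consists of a Date column (e.g. 19.09.2014) and other columns holding floats.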

import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import os
import seaborn as sns

country = ["Brazil", "Chile", "Colombia"]

PD = pd.read_excel("Brazil.xls", parse_dates = True , format = '%dd%.mm.%YYYY', index_col = [0], skiprows=1 )
PD = pd.DataFrame( PD[str("Ann.PD-" + rec)]) 
PD.columns = ['Brazil']
print(PD.head())

Since I am only interested in one column of the file, the output looks like this:

            Brazil
Date              
2014-09-19     2.2
2014-09-22     2.5
2014-09-23     2.4
2014-09-24     2.4
2014-09-25     2.5

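From here on, I want to use the dates from the first country's file (Brazil) as the reference dates for joining the other files. So I need to loop over the files for the remaining countries in the list. The loop looks like this: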

for ct in country[1:]:
    b = pd.read_excel(str(ct + ".xls"), parse_dates = True, format = '%dd%.mm.%YYYY', index_col = [0], skiprows=1,)
    b = pd.DataFrame(b[ str("Ann.PD-" + rec) ] )
    b.columns = [ct]
    PD = pd.concat([PD, b], axis = 1 )
print(PD.head(3))

            Brazil  Chile  Colombia  Mexico  Panama  Peru  Venezuela
Date                                                                
2014-01-10     2.7    1.3       1.6     1.4     1.6   1.7       15.3
2014-01-12     2.5    1.2       1.7     1.4     1.5   1.7       18.3
2014-02-10     2.7    1.3       1.6     1.4     1.5   1.7       15.4

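As you can see, although all the files contain the same dates, the dates in the result change. Does anyone know how to keep the date as both the index and the key for an inner join?

I would expect the following output: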

            Brazil  Chile  Colombia  Mexico  Panama  Peru  Venezuela
Date                                                                
2014-09-19     2.7    1.3       1.6     1.4     1.6   1.7       15.3
2014-09-22     2.5    1.2       1.7     1.4     1.5   1.7       18.3
2014-09-23     2.7    1.3       1.6     1.4     1.5   1.7       15.4

Answer 1

It is hard to pin down the problem without a sample DataFrame, but I think the pd.concat operation is overwriting the index.
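Something else that may be worth ruling out, since the question's dates are written day-first (19.09.2014), is a day/month swap during parsing. This is an assumption, not something the thread confirms. The sketch below uses hypothetical date strings and only shows how the same text yields different timestamps depending on the format it is read with.

```python
import pandas as pd

# Hypothetical day-first strings in the question's dd.mm.yyyy layout.
raw = pd.Series(["01.10.2014", "02.10.2014", "19.09.2014"])

# An explicit format leaves no room for a day/month swap.
explicit = pd.to_datetime(raw, format="%d.%m.%Y")

# Reading the same text month-first turns 1 October into 10 January.
month_first = pd.to_datetime("01.10.2014", format="%m.%d.%Y")

print(explicit.dt.strftime("%Y-%m-%d").tolist())
print(month_first.strftime("%Y-%m-%d"))
```

If the files store the dates as text, checking a few parsed values against the raw cells would show whether this applies.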

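When using pd.concat, you should generally read all the files first and then concatenate them in a single call; this is faster and more efficient (it is described in the pd.concat documentation).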
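To make the index alignment concrete, here is a minimal sketch with two hypothetical in-memory series standing in for the per-country Excel columns (the values and dates are made up for illustration). It contrasts the default outer alignment with join="inner", and with reindexing onto the first series' dates, which is one way to treat Brazil's dates as the reference.

```python
import pandas as pd

# Hypothetical stand-ins for two per-country columns read from Excel.
brazil = pd.Series(
    [2.2, 2.5, 2.4],
    index=pd.to_datetime(["2014-09-19", "2014-09-22", "2014-09-23"]),
    name="Brazil",
)
chile = pd.Series(
    [1.3, 1.2, 1.1],
    index=pd.to_datetime(["2014-09-22", "2014-09-23", "2014-09-24"]),
    name="Chile",
)

# Default join="outer": the result index is the union of all dates.
outer = pd.concat([brazil, chile], axis=1)

# join="inner": only dates present in every input survive.
inner = pd.concat([brazil, chile], axis=1, join="inner")

# Reindexing keeps exactly the first series' dates as the reference;
# dates missing from Chile become NaN.
on_brazil_dates = pd.concat([brazil, chile.reindex(brazil.index)], axis=1)

print(outer)
print(inner)
print(on_brazil_dates)
```

Which of the three is appropriate depends on whether rows missing from some files should be kept, dropped, or filled with NaN.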

Try this.

Read all the Excel files into a dictionary:

dat_dict = {}

for ct in country:
    # date_format requires pandas >= 2.0; '%d.%m.%Y' matches dates such as 19.09.2014.
    b = pd.read_excel(ct + ".xls", parse_dates=True, date_format='%d.%m.%Y', index_col=[0], skiprows=1)

    # I am not sure what b is, but if you can store it as a pd.Series in dat_dict
    # instead of a pd.DataFrame, then you won't need b.iloc[:, 0] later on.
    b = pd.DataFrame(b["Ann.PD-" + rec])
    dat_dict[ct] = b.iloc[:, 0]

Then concatenate them:

stacked_df = pd.concat(dat_dict)

stacked_df should be a multi-level (MultiIndex) Series with the country as level 0 and the date as level 1, so you can use the unstack operation to get the desired output:

df = stacked_df.unstack(level=0)

Consider this example:

df1 = pd.DataFrame({'Date': ['2014-09-19', '2014-09-22'], 'Brazil': [2.7, 2.2]})
df1 = df1.set_index(pd.to_datetime(df1.Date)).drop('Date', axis=1)

df2 = pd.DataFrame({'Date': ['2014-10-30', '2014-11-05'], 'Chile': [1.3, 1.2]})
df2 = df2.set_index(pd.to_datetime(df2.Date)).drop('Date', axis=1)

df_stacked = pd.concat({'brazil': df1.iloc[:, 0], 'chile': df2.iloc[:, 0]}, axis=0)

Unstacking the above with df_stacked.unstack(level=0) gives a frame indexed by date with one column per country, with NaN wherever a country has no value for a date.
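A runnable version of the two-country example, as a minimal sketch: the series are built directly rather than through intermediate DataFrames, and the names brazil and chile are only illustrative.

```python
import pandas as pd

# The same two single-country series as in the example, built directly.
brazil = pd.Series([2.7, 2.2], index=pd.to_datetime(["2014-09-19", "2014-09-22"]))
chile = pd.Series([1.3, 1.2], index=pd.to_datetime(["2014-10-30", "2014-11-05"]))

# Dict keys become level 0 of the MultiIndex; the dates become level 1.
df_stacked = pd.concat({"brazil": brazil, "chile": chile}, axis=0)

# Moving level 0 (country) into the columns leaves the dates as the index.
df = df_stacked.unstack(level=0)
print(df)
```

Because the two series share no dates here, every row has a value in one column and NaN in the other.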

Answer 2

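I would use pd.concat on a list comprehension, parsing each file through a function and passing the keys parameter to get the correct column headers.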

def get_df(ct):
    fn = str(ct) + ".xls"
    # date_format requires pandas >= 2.0; '%d.%m.%Y' matches dates such as 19.09.2014.
    kw = dict(parse_dates=True,
              date_format='%d.%m.%Y',
              index_col=[0], skiprows=1)
    b = pd.read_excel(fn, **kw)
    # I don't know what 'rec' is.
    # I left it in but you'll have to deal with it.
    return b["Ann.PD-" + rec]

pd.concat([get_df(ct) for ct in country], axis=1, keys=country)

I can't verify this at the moment.
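Since the Excel files are not available here, the following is only a sketch of the same pattern on hypothetical in-memory data: get_series stands in for get_df, and the data dictionary and its values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical in-memory stand-ins for the per-country Excel columns.
dates = pd.to_datetime(["2014-09-19", "2014-09-22", "2014-09-23"])
data = {
    "Brazil": pd.Series([2.7, 2.5, 2.7], index=dates),
    "Chile": pd.Series([1.3, 1.2, 1.3], index=dates),
    "Colombia": pd.Series([1.6, 1.7, 1.6], index=dates),
}
country = list(data)


def get_series(ct):
    # Stand-in for reading one file; returns a date-indexed Series.
    return data[ct]


# keys= supplies the column labels when concatenating Series along axis=1.
combined = pd.concat([get_series(ct) for ct in country], axis=1, keys=country)
print(combined)
```

When every series shares the same dates, the index passes through unchanged and each key becomes a column header.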

How to keep the date as the index and inner-join key across multiple Excel files
