Align pandas DataFrame columns by date

Time: 2020-10-06 12:02:23

Tags: python pandas indexing alignment

The DataFrame looks like this (3 rows shown ... the original has hundreds):

                    Log1     Date2      Log2    Date3       Log3
           Date1
          01.01.2000 1000     02.01.2000 2000    01.01.2000  3000
          02.01.2000 1050     03.01.2000 1950    02.01.2000  3020
          03.01.2000 1100     04.01.2000 2000    03.01.2000  3000

Desired output, aligned by date:

                   Log1  Log2  Log3  
          Date     
      01.01.2000 1,000   nan 3,000
      02.01.2000 1,050 2,000 3,020
      03.01.2000 1,100 1,950 3,000
      04.01.2000   nan 2,000   nan
  • Question: how can the columns be aligned by date?

A short sample of the DataFrame:

                BBAS3   Data.1      PETR4   Data.2     TRSD      Data.3   JKHD
    Data                            
    2020-10-05  30.15   2020-10-05  19.91   2020-10-05  30.15   2020-10-05  19.91
    2020-10-02  29.71   2020-10-02  19.02   2020-10-02  29.71   2020-10-01  19.85
    2020-10-01  29.79   2020-10-01  19.85   2020-10-01  29.79   2020-09-30  19.61
    2020-09-30  29.62   2020-09-30  19.61   2020-09-30  29.62   2020-09-29  19.31
    2020-09-29  29.76   2020-09-29  19.31   2020-09-29  29.76   2020-09-28  19.63

1 answer:

Answer 0 (score: 1)

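If the input data has a DatetimeIndex, the idea is to walk the columns in pairs (each date column with the value column next to it), create a Series from each pair, and join them with concat: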

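import pandas as pd

# move the DatetimeIndex into a regular column
df1 = df.reset_index()

# pair each date column (even positions) with the value column after it
zipped = zip(df1.columns[::2], df1.columns[1::2])
# build one Series per pair, indexed by its own dates, and align them side by side
df1 = pd.concat([df1.set_index(a)[b] for a, b in zipped], axis=1)
df1.index = pd.to_datetime(df1.index)
df1 = df1.sort_index()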

print (df1)
            BBAS3  PETR4   TRSD   JKHD
2020-09-28    NaN    NaN    NaN  19.63
2020-09-29  29.76  19.31  29.76  19.31
2020-09-30  29.62  19.61  29.62  19.61
2020-10-01  29.79  19.85  29.79  19.85
2020-10-02  29.71  19.02  29.71    NaN
2020-10-05  30.15  19.91  30.15  19.91
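
The same pairing idea can be tried on the first frame from the question, whose dates are written as dd.mm.yyyy. This is only a minimal sketch: the frame is rebuilt by hand from the question's table, and the explicit `format="%d.%m.%Y"` is an assumption about the input (without it, `pd.to_datetime` reads such strings month-first by default, so `02.01.2000` would become February 1).

```python
import pandas as pd

# Rebuilt by hand from the question's first table (Date1 is the index).
df = pd.DataFrame(
    {
        "Log1": [1000, 1050, 1100],
        "Date2": ["02.01.2000", "03.01.2000", "04.01.2000"],
        "Log2": [2000, 1950, 2000],
        "Date3": ["01.01.2000", "02.01.2000", "03.01.2000"],
        "Log3": [3000, 3020, 3000],
    },
    index=pd.Index(["01.01.2000", "02.01.2000", "03.01.2000"], name="Date1"),
)

# Move the index into a column so the columns alternate date, value, date, value.
flat = df.reset_index()

# Pair every date column (even positions) with the value column to its right.
pairs = zip(flat.columns[::2], flat.columns[1::2])

# One Series per pair, indexed by its own dates; concat aligns them on the union.
aligned = pd.concat([flat.set_index(d)[v] for d, v in pairs], axis=1)

# Assumption: the dates are day-first strings, so parse them with an explicit format.
aligned.index = pd.to_datetime(aligned.index, format="%d.%m.%Y")
aligned = aligned.sort_index().rename_axis("Date")

print(aligned)
```

Dates that are missing from a given pair show up as NaN in that column, which matches the desired output in the question.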

Edit:

# the sample data raises an error because some date columns contain duplicated dates, like Data here
print (df)
             BBAS3      Data.1  PETR4      Data.2   TRSD      Data.3   JKHD
Data                                                                       
2020-10-05  200.00  2020-10-05  19.91  2020-10-05  30.15  2020-10-05  19.91
2020-10-05  100.00  2020-10-02  19.02  2020-10-02  29.71  2020-10-01  19.85
2020-10-01   29.79  2020-10-01  19.85  2020-10-01  29.79  2020-09-30  19.61
2020-09-30   29.62  2020-09-30  19.61  2020-09-30  29.62  2020-09-29  19.31
2020-09-29   29.76  2020-09-29  19.31  2020-09-29  29.76  2020-09-28  19.63
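
To see which date columns repeat a date, a quick check along these lines may help. It is a sketch under assumptions: only the first two column pairs of the frame above are rebuilt by hand, and `dup_report` is a hypothetical name used for illustration.

```python
import pandas as pd

# First two column pairs of the frame above, rebuilt by hand for illustration.
df = pd.DataFrame(
    {
        "BBAS3": [200.00, 100.00, 29.79, 29.62, 29.76],
        "Data.1": ["2020-10-05", "2020-10-02", "2020-10-01", "2020-09-30", "2020-09-29"],
        "PETR4": [19.91, 19.02, 19.85, 19.61, 19.31],
    },
    index=pd.Index(
        ["2020-10-05", "2020-10-05", "2020-10-01", "2020-09-30", "2020-09-29"],
        name="Data",
    ),
)

flat = df.reset_index()

# For each date column (even positions), list the dates that occur more than once.
dup_report = {
    col: flat.loc[flat[col].duplicated(keep=False), col].unique().tolist()
    for col in flat.columns[::2]
}
print(dup_report)
```

An empty list for every column suggests the plain set_index version is enough; any non-empty list points to a column that needs aggregating.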

df1 = df.reset_index()

zipped = zip(df1.columns[::2], df1.columns[1::2])
# group by each date column and sum the values, so repeated dates collapse into one row
df1 = pd.concat([df1.groupby(a)[b].sum() for a, b in zipped], axis=1)
df1.index = pd.to_datetime(df1.index)
df1 = df1.sort_index()

print (df1)
             BBAS3  PETR4   TRSD   JKHD
2020-09-28     NaN    NaN    NaN  19.63
2020-09-29   29.76  19.31  29.76  19.31
2020-09-30   29.62  19.61  29.62  19.61
2020-10-01   29.79  19.85  29.79  19.85
2020-10-02     NaN  19.02  29.71    NaN
2020-10-05  300.00  19.91  30.15  19.91
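
Summing is only one way to collapse repeated dates; which aggregation fits depends on what the duplicates mean in the data. The sketch below wraps the groupby approach in a hypothetical helper, `align_pairs`, whose `agg` parameter is an addition for illustration and not part of the original answer. The small frame is a hand-built subset of the sample above.

```python
import pandas as pd


def align_pairs(df, agg="sum"):
    """Align alternating (date, value) column pairs on one DatetimeIndex.

    Hypothetical helper: `agg` names the groupby aggregation applied to
    rows that share the same date within a pair.
    """
    flat = df.reset_index()
    pairs = zip(flat.columns[::2], flat.columns[1::2])
    out = pd.concat([flat.groupby(d)[v].agg(agg) for d, v in pairs], axis=1)
    out.index = pd.to_datetime(out.index)
    return out.sort_index()


# Hand-built subset of the sample: the index repeats 2020-10-05.
df = pd.DataFrame(
    {
        "BBAS3": [200.0, 100.0, 29.79],
        "Data.1": ["2020-10-05", "2020-10-02", "2020-10-01"],
        "PETR4": [19.91, 19.02, 19.85],
    },
    index=pd.Index(["2020-10-05", "2020-10-05", "2020-10-01"], name="Data"),
)

summed = align_pairs(df)          # repeated dates are added together
latest = align_pairs(df, "last")  # keep the last row seen for each date
print(summed)
print(latest)
```

With `"sum"` the two BBAS3 rows for 2020-10-05 are added together, while `"last"` keeps only the second one; in both cases 2020-10-02 stays NaN for BBAS3 because that date never appears in its date column.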