Rearrange groups in a dataframe based on the day and month in another dataframe's index

Time: 2017-12-04 06:50:55

Tags: python pandas

I have 2 dataframes:

df_a

datetime      var
2016-10-15    110.232790
2016-10-16    111.020661
2016-10-17    112.193496
2016-10-18    113.638143
2016-10-19    115.241448
2017-01-01    113.638143
2017-01-02    115.241448

and df_b

datetime      var
2000-01-01    165.792185
2000-01-02    166.066959
2000-01-03    166.411669
2000-01-04    167.816046
2000-01-05    169.777814
2000-10-15    114.232790
2000-10-16    113.020661
2001-01-01    164.792185
2001-01-02    161.066959
2001-01-03    156.411669
2002-01-04    167.816046
2002-01-05    169.777814
2002-10-15    174.232790
2003-10-16    114.020661

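df_a contains data for 2016 and 2017, and df_b contains data for 2000 through 2015 (the years do not overlap).

Can I arrange each group in df_b so that it follows the same date order as df_a? A group is defined as the rows that share the same year, e.g. 2000.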

1 Answer:

Answer 0 (score: 1):

You can chain an additional condition to check the year:

df = df_b[df_b.index.month.isin(df_a.index.month) &
          df_b.index.day.isin(df_a.index.day) & 
          (df_b.index.year == 2000)]
print (df)
                   var
datetime              
2000-01-01  165.792185
2000-01-02  166.066959
2000-10-15  114.232790
2000-10-16  113.020661

EDIT:

df = df_b[df_b.index.month.isin(df_a.index.month) & df_b.index.day.isin(df_a.index.day)]
print (df)
                   var
datetime              
2000-01-01  165.792185
2000-01-02  166.066959
2000-10-15  114.232790
2000-10-16  113.020661
2001-01-01  164.792185
2001-01-02  161.066959
2002-10-15  174.232790
2003-10-16  114.020661
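
One thing to watch with the filter above: the month and day checks are independent, so a date can pass when its month and its day each occur somewhere in df_a without that exact month-day pair being present. Such a row would later get NaN from the dictionary lookup. A minimal sketch of a pairwise check, assuming the indexes are DatetimeIndex objects; the 2000-01-15 row is hypothetical and added only to show the difference:

```python
import pandas as pd

# Dates from the question's df_a (only the index matters for filtering).
idx_a = pd.DatetimeIndex(
    ["2016-10-15", "2016-10-16", "2016-10-17", "2016-10-18",
     "2016-10-19", "2017-01-01", "2017-01-02"], name="datetime")

# A few rows from the question's df_b, plus one HYPOTHETICAL row
# (2000-01-15, value 0.0) that is not in the original data.
df_b = pd.DataFrame(
    {"var": [165.792185, 166.066959, 0.0, 114.232790, 113.020661]},
    index=pd.DatetimeIndex(
        ["2000-01-01", "2000-01-02", "2000-01-15", "2000-10-15", "2000-10-16"],
        name="datetime"))

# Independent checks: month 1 and day 15 both occur somewhere in df_a,
# so 2000-01-15 passes even though df_a has no 01-15.
loose = df_b[df_b.index.month.isin(idx_a.month) & df_b.index.day.isin(idx_a.day)]

# Pairwise check: compare whole "MM-DD" strings instead.
strict = df_b[df_b.index.strftime("%m-%d").isin(idx_a.strftime("%m-%d"))]

print(loose.index.strftime("%Y-%m-%d").tolist())
print(strict.index.strftime("%Y-%m-%d").tolist())
```

On the question's own data both filters select the same rows, so this only matters when df_b has dates like the hypothetical one.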

import pandas as pd

# create a dictionary mapping each 'MM-DD' string in df_a to its position, using factorize
a = pd.factorize(df_a.index.strftime('%m-%d'))
d = dict(zip(a[1], a[0]))
print (d)
{'01-02': 6, '10-19': 4, '10-18': 3, '10-15': 0, '01-01': 5, '10-16': 1, '10-17': 2}

# ordering Series: multiply the year by 1000 because the within-year position can range from 1 to 366
order = pd.Series(df.index.strftime('%m-%d'), index=df.index).map(d) + df.index.year * 1000
print (order)
datetime
2000-01-01    2000005
2000-01-02    2000006
2000-10-15    2000000
2000-10-16    2000001
2001-01-01    2001005
2001-01-02    2001006
2002-10-15    2002000
2003-10-16    2003001
Name: datetime, dtype: int64

Finally, sort order and reindex df by the resulting index:

df = df.reindex(order.sort_values().index)
print (df)
                   var
datetime              
2000-10-15  114.232790
2000-10-16  113.020661
2000-01-01  165.792185
2000-01-02  166.066959
2001-01-01  164.792185
2001-01-02  161.066959
2002-10-15  174.232790
2003-10-16  114.020661
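
For reference, the whole procedure can also be sketched as one function that sorts on two explicit keys (year, then position of the month-day in df_a) instead of a combined integer. This is an alternative formulation, not the code from the answer; the helper name `reorder_like` is hypothetical, and it assumes both frames have a DatetimeIndex.

```python
import pandas as pd

def reorder_like(df_a, df_b):
    """Within each year of df_b, order rows by where their month-day occurs in df_a."""
    # Unique 'MM-DD' strings of df_a, in order of first appearance.
    md_a = pd.Index(df_a.index.strftime("%m-%d")).drop_duplicates()
    # Position of each df_b month-day in df_a; -1 means "not present in df_a".
    pos = md_a.get_indexer(df_b.index.strftime("%m-%d"))
    keys = pd.DataFrame({"year": df_b.index.year.to_numpy(), "pos": pos})
    # Drop unmatched rows, then sort by year first and df_a position second.
    keys = keys[keys["pos"] >= 0].sort_values(["year", "pos"])
    # keys keeps a default integer index, so its labels are row positions in df_b.
    return df_b.iloc[keys.index.to_numpy()]

df_a = pd.DataFrame(
    {"var": [110.232790, 111.020661, 112.193496, 113.638143,
             115.241448, 113.638143, 115.241448]},
    index=pd.DatetimeIndex(
        ["2016-10-15", "2016-10-16", "2016-10-17", "2016-10-18",
         "2016-10-19", "2017-01-01", "2017-01-02"], name="datetime"))

df_b = pd.DataFrame(
    {"var": [165.792185, 166.066959, 166.411669, 167.816046, 169.777814,
             114.232790, 113.020661, 164.792185, 161.066959, 156.411669,
             167.816046, 169.777814, 174.232790, 114.020661]},
    index=pd.DatetimeIndex(
        ["2000-01-01", "2000-01-02", "2000-01-03", "2000-01-04", "2000-01-05",
         "2000-10-15", "2000-10-16", "2001-01-01", "2001-01-02", "2001-01-03",
         "2002-01-04", "2002-01-05", "2002-10-15", "2003-10-16"], name="datetime"))

result = reorder_like(df_a, df_b)
print(result)
```

Sorting on two columns avoids having to pick a multiplier large enough to keep the year and the position from colliding, and the `pos >= 0` filter uses exact month-day pairs, so no unmatched rows reach the sort.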