Resampling with a MultiIndex and multiple columns

Time: 2018-04-20 04:25:06

Tags: python-3.x pandas

I have a pandas DataFrame with the following structure:

ID    date           m_1   m_2 
 1    2016-01-03     10    3.4
      2016-02-07     11    3.3
      2016-02-07     10.4  2.8
 2    2016-01-01     10.9  2.5
      2016-02-04     12    2.3
      2016-02-04     11    2.7
      2016-02-04     12.1  2.1

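ID and date together form a MultiIndex. The data represents measurements taken by a number of sensors (two in this example). The sensors sometimes produce several measurements on the same day, as the example shows.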
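For anyone who wants to reproduce the problem, the sample could be rebuilt roughly as follows. This is only a sketch: the values are copied from the table above, while the helper name `raw` and the float dtype of the measurement columns are assumptions.

```python
import pandas as pd

# Values copied from the table in the question; `raw` is a hypothetical helper name.
raw = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 2, 2],
    "date": pd.to_datetime([
        "2016-01-03", "2016-02-07", "2016-02-07",
        "2016-01-01", "2016-02-04", "2016-02-04", "2016-02-04",
    ]),
    "m_1": [10.0, 11.0, 10.4, 10.9, 12.0, 11.0, 12.1],
    "m_2": [3.4, 3.3, 2.8, 2.5, 2.3, 2.7, 2.1],
})

# ID and date become the two index levels; repeated (ID, date) pairs are kept as-is.
df = raw.set_index(["ID", "date"])

print(df.index.names)
print(df.index.is_unique)
```

The repeated (ID, date) pairs are what make a per-day aggregation necessary before the two series can be aligned.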

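My questions are:

  • How can I resample this so that there is one row per sensor per day, with one column containing the mean, another the max, another the min, and so on?
  • How can I "align" (maybe that is not the right word) the two time series so that both start and end at the same time (from 2016-01-01 to 2016-02-07), with the missing dates added as NAs?

1 Answer:

Answer 0 (score: 2)

You can use groupby together with DataFrameGroupBy.resample, aggregate with a dict that maps each column to a function, and then reindex with MultiIndex.from_product: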

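import pandas as pd

# Move ID out of the index so that date is the only index level, then resample each sensor by day
df = df.reset_index(level=0).groupby('ID').resample('D').agg({'m_1':'mean', 'm_2':'max'})
# Reindex on every (ID, date) combination so both sensors cover the same dates; gaps become NaN
df = df.reindex(pd.MultiIndex.from_product(df.index.levels, names=df.index.names))

# Alternative for adding the missing start and end dates:
# df = df.unstack().stack(dropna=False)
print(df.head())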
               m_2   m_1
ID date                 
1  2016-01-01  NaN   NaN
   2016-01-02  NaN   NaN
   2016-01-03  3.4  10.0
   2016-01-04  NaN   NaN
   2016-01-05  NaN   NaN
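The question also asks for the mean, max and min of each column, whereas the dict above applies one function per column. One possible variation is sketched below. It swaps in pd.Grouper on the two index levels instead of groupby(...).resample(...), and passes a list of function names to agg. The names `daily`, `days` and `full_index` are made up for this illustration, and the sample values are the ones from the question.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 2, 2],
    "date": pd.to_datetime([
        "2016-01-03", "2016-02-07", "2016-02-07",
        "2016-01-01", "2016-02-04", "2016-02-04", "2016-02-04",
    ]),
    "m_1": [10.0, 11.0, 10.4, 10.9, 12.0, 11.0, 12.1],
    "m_2": [3.4, 3.3, 2.8, 2.5, 2.3, 2.7, 2.1],
}).set_index(["ID", "date"])

# One row per sensor per day; every measurement column gets mean, max and min.
daily = df.groupby(
    [pd.Grouper(level="ID"), pd.Grouper(level="date", freq="D")]
).agg(["mean", "max", "min"])

# Build the common calendar from the earliest to the latest date seen by any sensor.
dates = df.index.get_level_values("date")
days = pd.date_range(dates.min(), dates.max(), freq="D")
full_index = pd.MultiIndex.from_product(
    [df.index.get_level_values("ID").unique(), days], names=["ID", "date"]
)

# Reindexing aligns both sensors on the same dates; days without data become NaN.
daily = daily.reindex(full_index)

print(daily.loc[(1, pd.Timestamp("2016-02-07"))])
```

The result has two-level columns such as ('m_1', 'mean'), so a single statistic can be picked with a tuple, for example daily[('m_1', 'mean')].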

To turn the second level into a PeriodIndex, use set_levels with to_period:

# set_levels replaces the unique level values, so convert df.index.levels[1] rather than the full-length values
df.index = df.index.set_levels(df.index.levels[1].to_period('D'), level=1)

print (df.index.get_level_values('date'))

PeriodIndex(['2016-01-01', '2016-01-02', '2016-01-03', '2016-01-04',
             '2016-01-05', '2016-01-06', '2016-01-07', '2016-01-08',
             '2016-01-09', '2016-01-10', '2016-01-11', '2016-01-12',
             '2016-01-13', '2016-01-14', '2016-01-15', '2016-01-16',
             '2016-01-17', '2016-01-18', '2016-01-19', '2016-01-20',
             '2016-01-21', '2016-01-22', '2016-01-23', '2016-01-24',
             '2016-01-25', '2016-01-26', '2016-01-27', '2016-01-28',
             '2016-01-29', '2016-01-30', '2016-01-31', '2016-02-01',
             '2016-02-02', '2016-02-03', '2016-02-04', '2016-02-05',
             '2016-02-06', '2016-02-07', '2016-01-01', '2016-01-02',
             '2016-01-03', '2016-01-04', '2016-01-05', '2016-01-06',
             '2016-01-07', '2016-01-08', '2016-01-09', '2016-01-10',
             '2016-01-11', '2016-01-12', '2016-01-13', '2016-01-14',
             '2016-01-15', '2016-01-16', '2016-01-17', '2016-01-18',
             '2016-01-19', '2016-01-20', '2016-01-21', '2016-01-22',
             '2016-01-23', '2016-01-24', '2016-01-25', '2016-01-26',
             '2016-01-27', '2016-01-28', '2016-01-29', '2016-01-30',
             '2016-01-31', '2016-02-01', '2016-02-02', '2016-02-03',
             '2016-02-04', '2016-02-05', '2016-02-06', '2016-02-07'],
            dtype='period[D]', name='date', freq='D')
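The conversion can also be checked on its own with a freshly built index. This is a sketch under assumptions: the `frame` and `days` names and the placeholder values are invented for the illustration. It converts `index.levels[1]` because set_levels expects the unique values of a level, not one value per row.

```python
import pandas as pd

# Hypothetical aligned frame: two sensors over the same daily range as in the question.
days = pd.date_range("2016-01-01", "2016-02-07", freq="D")
idx = pd.MultiIndex.from_product([[1, 2], days], names=["ID", "date"])
frame = pd.DataFrame({"m_1": range(len(idx))}, index=idx)

# Replace the unique datetime values of the 'date' level with daily periods.
frame.index = frame.index.set_levels(
    frame.index.levels[1].to_period("D"), level="date"
)

print(type(frame.index.levels[1]).__name__)
print(frame.index.get_level_values("date")[:3])
```

After the conversion, rows can be selected with a pd.Period in place of a Timestamp, for example frame.loc[(1, pd.Period("2016-01-03", freq="D"))].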