Pandas DataFrame: slicing data on non-unique timestamps

Time: 2016-04-12 05:16:58

Tags: python pandas

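I have a DataFrame of data packets containing measurements, indexed by timestamp. Groups of flag packets that mark the start and end of a measurement section are interspersed among the messages. Here is an example: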

                         dev    node   meas 0   meas 1  ...
tstp
2016-04-12 03:42:16.238  instr  None  [val]    [val]
2016-04-12 03:42:16.338  cntrl  101   [val]    [val]
2016-04-12 03:42:16.442  instr  None  [val]    [val]
2016-04-12 03:42:16.445  instr  None  [val]    [val]
2016-04-12 03:42:16.445  cntrl  101   [val]    [val]
2016-04-12 03:42:16.448  instr  None  [val]    [val]
2016-04-12 03:42:16.540  instr  None  [val]    [val]
2016-04-12 03:42:16.600  cntrl  101   [val]    [val]
2016-04-12 03:42:16.639  instr  None  [val]    [val]
2016-04-12 03:42:16.741  instr  None  [val]    [val]
2016-04-12 03:42:17.238  instr  None  [val]    [val]
2016-04-12 03:42:17.338  cntrl  102   [val]    [val]
2016-04-12 03:42:17.442  instr  None  [val]    [val]
2016-04-12 03:42:17.445  instr  None  [val]    [val]
2016-04-12 03:42:17.445  cntrl  102   [val]    [val]
2016-04-12 03:42:17.448  instr  None  [val]    [val]
2016-04-12 03:42:17.540  instr  None  [val]    [val]
2016-04-12 03:42:17.600  cntrl  102   [val]    [val]
2016-04-12 03:42:17.639  instr  None  [val]    [val]
2016-04-12 03:42:17.741  instr  None  [val]    [val]

What I want to do is:

for name, group in pkts.groupby('node') :
    beg = group.index[0]
    end = group.index[-1]

    # pseudocode
    pkts[ beg:end & pkts.dev=='instr' , 'node' ] = name

Slicing directly with beg:end does not work because the index values are not unique. Can anyone offer some insight or a better approach?

Update (clarification):

Goal: easy access to the measurements of the "instr" devices by node number. The "instr" devices cannot transmit a node value themselves.

Desired output (as originally envisioned; open to suggestions):

                         dev    node   meas 0   meas 1  ...
tstp
2016-04-12 03:42:16.238  instr  None  [val]    [val]
2016-04-12 03:42:16.338  cntrl  101   [val]    [val]
2016-04-12 03:42:16.442  instr  101   [val]    [val]
2016-04-12 03:42:16.445  instr  101   [val]    [val]
2016-04-12 03:42:16.445  cntrl  101   [val]    [val]
2016-04-12 03:42:16.448  instr  101   [val]    [val]
2016-04-12 03:42:16.540  instr  101   [val]    [val]
2016-04-12 03:42:16.600  cntrl  101   [val]    [val]
2016-04-12 03:42:16.639  instr  None  [val]    [val]
2016-04-12 03:42:16.741  instr  None  [val]    [val]
2016-04-12 03:42:17.238  instr  None  [val]    [val]
2016-04-12 03:42:17.338  cntrl  102   [val]    [val]
2016-04-12 03:42:17.442  instr  102   [val]    [val]
2016-04-12 03:42:17.445  instr  102   [val]    [val]
2016-04-12 03:42:17.445  cntrl  102   [val]    [val]
2016-04-12 03:42:17.448  instr  102   [val]    [val]
2016-04-12 03:42:17.540  instr  102   [val]    [val]
2016-04-12 03:42:17.600  cntrl  102   [val]    [val]
2016-04-12 03:42:17.639  instr  None  [val]    [val]
2016-04-12 03:42:17.741  instr  None  [val]    [val]

1 Answer:

Answer 0: (score: 1)

I think you can create a MultiIndex with reset_index and set_index, then replace None with NaN, and use fillna with the methods ffill and bfill:

pkts = pkts.reset_index().set_index('tstp', append=True)
print(pkts)
                              dev  node meas 0 meas 1
   tstp                                              
0  2016-04-12 03:42:16.238  instr  None  [val]  [val]
1  2016-04-12 03:42:16.338  cntrl   101  [val]  [val]
2  2016-04-12 03:42:16.442  instr  None  [val]  [val]
3  2016-04-12 03:42:16.445  instr  None  [val]  [val]
4  2016-04-12 03:42:16.445  cntrl   101  [val]  [val]
5  2016-04-12 03:42:16.448  instr  None  [val]  [val]
6  2016-04-12 03:42:16.540  instr  None  [val]  [val]
7  2016-04-12 03:42:16.600  cntrl   101  [val]  [val]
8  2016-04-12 03:42:16.639  instr  None  [val]  [val]
9  2016-04-12 03:42:16.741  instr  None  [val]  [val]
10 2016-04-12 03:42:16.238  instr  None  [val]  [val]
11 2016-04-12 03:42:16.338  cntrl   102  [val]  [val]
12 2016-04-12 03:42:16.442  instr  None  [val]  [val]
13 2016-04-12 03:42:16.445  instr  None  [val]  [val]
14 2016-04-12 03:42:16.445  cntrl   102  [val]  [val]
15 2016-04-12 03:42:16.448  instr  None  [val]  [val]
16 2016-04-12 03:42:16.540  instr  None  [val]  [val]
17 2016-04-12 03:42:16.600  cntrl   102  [val]  [val]
18 2016-04-12 03:42:16.639  instr  None  [val]  [val]
19 2016-04-12 03:42:16.741  instr  None  [val]  [val]
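
As a small aside on why this step helps: appending tstp to the positional counter should give every row its own label, even where timestamps repeat. The check below is only a sketch on a made-up four-row frame (the name `toy` and its values are assumptions for illustration, not the original data).

```python
import pandas as pd

# Hypothetical toy frame with one duplicated timestamp (illustrative values only).
idx = pd.to_datetime([
    "2016-04-12 03:42:16.442",
    "2016-04-12 03:42:16.445",
    "2016-04-12 03:42:16.445",
    "2016-04-12 03:42:16.448",
])
toy = pd.DataFrame(
    {"dev": ["instr", "instr", "cntrl", "instr"]},
    index=pd.Index(idx, name="tstp"),
)

# The plain timestamp index contains a repeated label.
print(toy.index.is_unique)

# Same reshaping as in the answer: positional counter first, timestamp second.
toy_mi = toy.reset_index().set_index("tstp", append=True)

# Each (position, timestamp) pair now occurs exactly once.
print(toy_mi.index.is_unique)
print(toy_mi.index.nlevels)
```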

import numpy as np

# replace the 'None' placeholder strings with NaN so they can be filled later
pkts['node'] = pkts['node'].replace('None', np.nan)

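for name, group in pkts.groupby('node'):
    # (position, timestamp) labels of the first and last flag row of this node
    beg = group.index[0]
    end = group.index[-1]
    # fill the gaps between those two rows;
    # .ffill().bfill() is the same as fillna(method='ffill').fillna(method='bfill')
    pkts.loc[beg:end, 'node'] = pkts.loc[beg:end, 'node'].ffill().bfill()
print(pkts)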
                              dev node meas 0 meas 1
   tstp                                             
0  2016-04-12 03:42:16.238  instr  NaN  [val]  [val]
1  2016-04-12 03:42:16.338  cntrl  101  [val]  [val]
2  2016-04-12 03:42:16.442  instr  101  [val]  [val]
3  2016-04-12 03:42:16.445  instr  101  [val]  [val]
4  2016-04-12 03:42:16.445  cntrl  101  [val]  [val]
5  2016-04-12 03:42:16.448  instr  101  [val]  [val]
6  2016-04-12 03:42:16.540  instr  101  [val]  [val]
7  2016-04-12 03:42:16.600  cntrl  101  [val]  [val]
8  2016-04-12 03:42:16.639  instr  NaN  [val]  [val]
9  2016-04-12 03:42:16.741  instr  NaN  [val]  [val]
10 2016-04-12 03:42:16.238  instr  NaN  [val]  [val]
11 2016-04-12 03:42:16.338  cntrl  102  [val]  [val]
12 2016-04-12 03:42:16.442  instr  102  [val]  [val]
13 2016-04-12 03:42:16.445  instr  102  [val]  [val]
14 2016-04-12 03:42:16.445  cntrl  102  [val]  [val]
15 2016-04-12 03:42:16.448  instr  102  [val]  [val]
16 2016-04-12 03:42:16.540  instr  102  [val]  [val]
17 2016-04-12 03:42:16.600  cntrl  102  [val]  [val]
18 2016-04-12 03:42:16.639  instr  NaN  [val]  [val]
19 2016-04-12 03:42:16.741  instr  NaN  [val]  [val]
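
For what it is worth, the same result can probably be reached without the loop and without changing the index. This is a different technique from the one above, sketched under some assumptions: a shortened, made-up version of the sample data, missing nodes already stored as NaN, and sections of different nodes that do not overlap. The helper names `prev_node` and `next_node` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the sample data (timestamps and values are illustrative).
tstp = pd.to_datetime([
    "2016-04-12 03:42:16.238", "2016-04-12 03:42:16.338",
    "2016-04-12 03:42:16.445", "2016-04-12 03:42:16.445",
    "2016-04-12 03:42:16.639", "2016-04-12 03:42:17.338",
    "2016-04-12 03:42:17.445", "2016-04-12 03:42:17.445",
    "2016-04-12 03:42:17.741",
])
pkts = pd.DataFrame(
    {
        "dev": ["instr", "cntrl", "instr", "cntrl", "instr",
                "cntrl", "instr", "cntrl", "instr"],
        "node": [np.nan, 101, np.nan, 101, np.nan,
                 102, np.nan, 102, np.nan],
    },
    index=pd.Index(tstp, name="tstp"),
)

prev_node = pkts["node"].ffill()  # node of the closest flag row at or above
next_node = pkts["node"].bfill()  # node of the closest flag row at or below

# A row lies inside a section only if the flags on both sides carry the
# same node; everywhere else the value is left as NaN.
pkts["node"] = prev_node.where(prev_node == next_node)
print(pkts)
```

With the column filled, something like `pkts[(pkts['dev'] == 'instr') & (pkts['node'] == 101)]` should return the "instr" measurements for node 101. Note that this sketch treats two separate sections of the same node as one if no other node's flag sits between them, so it may need adjusting for real data.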