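How do I replace specific values in a time-series DataFrame in pandas?

Time: 2018-02-02 05:34:49

Tags: python pandas replace time-series

I have the data below (date/time form a MultiIndex), and I want to replace the column values in the range 00:00:00 to 07:00:00 with this numpy array: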

[[ 21.63920663  21.62012822  20.9900515   21.23217008  21.19482458
   21.10839656  20.89631935  20.79977166  20.99176729  20.91567565
   20.87258765  20.76210464  20.50357827  20.55897631  20.38005033
   20.38227309  20.54460993  20.37707293  20.08279925  20.09955877
   20.02559575  20.12390737  20.2917257   20.20056711  20.1589065
   20.41302289  20.48000767  20.55604102  20.70255192]]

     date        time    
2018-01-26  00:00:00    21.65
            00:15:00      NaN
            00:30:00      NaN
            00:45:00      NaN
            01:00:00      NaN
            01:15:00      NaN
            01:30:00      NaN
            01:45:00      NaN
            02:00:00      NaN
            02:15:00      NaN
            02:30:00      NaN
            02:45:00      NaN
            03:00:00      NaN
            03:15:00      NaN
            03:30:00      NaN
            03:45:00      NaN
            04:00:00      NaN
            04:15:00      NaN
            04:30:00      NaN
            04:45:00      NaN
            05:00:00      NaN
            05:15:00      NaN
            05:30:00      NaN
            05:45:00      NaN
            06:00:00      NaN
            06:15:00      NaN
            06:30:00      NaN
            06:45:00      NaN
            07:00:00      NaN
            07:15:00      NaN
            07:30:00      NaN
            07:45:00      NaN
            08:00:00      NaN
            08:15:00      NaN
            08:30:00      NaN
            08:45:00      NaN
            09:00:00      NaN
            09:15:00      NaN
            09:30:00      NaN
            09:45:00      NaN
            10:00:00      NaN
            10:15:00      NaN
            10:30:00      NaN
            10:45:00      NaN
            11:00:00      NaN
Name: temp, dtype: float64
<class 'datetime.time'>

How can I do this?

1 answer:

Answer 0 (score: 1)

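You can use slicers (pd.IndexSlice). If the second level holds datetime.time objects, slice with time objects:

import datetime
import pandas as pd

idx = pd.IndexSlice
# select 00:00:00 through 02:00:00 (both ends inclusive) for every date
df1.loc[idx[:, datetime.time(0, 0, 0):datetime.time(2, 0, 0)], :] = 1

String labels can be sliced the same way, as in this sample: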

print (df1)
                       aaa
date       time           
2018-01-26 00:00:00  21.65
           00:15:00    NaN
           00:30:00    NaN
           00:45:00    NaN
           01:00:00    NaN
           01:15:00    NaN
           01:30:00    NaN
           01:45:00    NaN
           02:00:00    NaN
           02:15:00    NaN
           02:30:00    NaN
           02:45:00    NaN
           03:00:00    NaN
2018-01-27 00:00:00   2.00
           00:15:00    NaN
           00:30:00    NaN
           00:45:00    NaN
           01:00:00    NaN
           01:15:00    NaN
           01:30:00    NaN
           01:45:00    NaN
           02:00:00    NaN
           02:15:00    NaN
           02:30:00    NaN
           02:45:00    NaN
           03:00:00    NaN
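
The answer does not show how the sample frame was built. The following is a minimal sketch of one way to reconstruct it, assuming the second level stores the times as strings; the construction itself (from_product, the date strings) is an assumption, not taken from the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the sample frame printed above.
dates = ["2018-01-26", "2018-01-27"]
# 13 quarter-hour labels from 00:00:00 to 03:00:00, stored as strings
times = pd.date_range(
    "2018-01-26 00:00", "2018-01-26 03:00", freq="15min"
).strftime("%H:%M:%S")

# from_product yields a sorted MultiIndex, which label slicing requires
index = pd.MultiIndex.from_product([dates, times], names=["date", "time"])
df1 = pd.DataFrame({"aaa": np.nan}, index=index)

# Only the first row of each date holds a value in the sample
df1.loc[("2018-01-26", "00:00:00"), "aaa"] = 21.65
df1.loc[("2018-01-27", "00:00:00"), "aaa"] = 2.0
print(df1)
```

If your real index is not sorted, calling df1.sort_index() first is usually needed before slicing by label.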

Example:

idx = pd.IndexSlice
df1.loc[idx[:, '00:00:00':'02:00:00'],:] = 1
print (df1)
                     aaa
date       time         
2018-01-26 00:00:00  1.0
           00:15:00  1.0
           00:30:00  1.0
           00:45:00  1.0
           01:00:00  1.0
           01:15:00  1.0
           01:30:00  1.0
           01:45:00  1.0
           02:00:00  1.0
           02:15:00  NaN
           02:30:00  NaN
           02:45:00  NaN
           03:00:00  NaN
2018-01-27 00:00:00  1.0
           00:15:00  1.0
           00:30:00  1.0
           00:45:00  1.0
           01:00:00  1.0
           01:15:00  1.0
           01:30:00  1.0
           01:45:00  1.0
           02:00:00  1.0
           02:15:00  NaN
           02:30:00  NaN
           02:45:00  NaN
           03:00:00  NaN
# Assign an array instead of a scalar: repeat 1..9 once per date (first level)
df1.loc[idx[:, '00:00:00':'02:00:00'], :] = np.tile(np.arange(1, 10), len(df1.index.levels[0]))
print (df1)
                     aaa
date       time         
2018-01-26 00:00:00  1.0
           00:15:00  2.0
           00:30:00  3.0
           00:45:00  4.0
           01:00:00  5.0
           01:15:00  6.0
           01:30:00  7.0
           01:45:00  8.0
           02:00:00  9.0
           02:15:00  NaN
           02:30:00  NaN
           02:45:00  NaN
           03:00:00  NaN
2018-01-27 00:00:00  1.0
           00:15:00  2.0
           00:30:00  3.0
           00:45:00  4.0
           01:00:00  5.0
           01:15:00  6.0
           01:30:00  7.0
           01:45:00  8.0
           02:00:00  9.0
           02:15:00  NaN
           02:30:00  NaN
           02:45:00  NaN
           03:00:00  NaN
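
The array assignment above depends on the values arriving in row order: all times of the first date, then all times of the second. A small sketch of why np.tile fits here and np.repeat does not (the sizes are made up for illustration):

```python
import numpy as np

block = np.arange(1, 4)  # values for one date: [1 2 3]
n_dates = 2              # hypothetical number of first-level values

tiled = np.tile(block, n_dates)       # whole block repeated per date
repeated = np.repeat(block, n_dates)  # each element repeated in place

print(tiled)     # [1 2 3 1 2 3]
print(repeated)  # [1 1 2 2 3 3]
```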

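EDIT:

To assign an array, it has to be repeated with numpy.tile once per unique value of the first level, as the np.tile assignment in the example above does.

A more general solution generates the array from the length of the slice:

idx = pd.IndexSlice
# number of rows the time slice selects within the first date
len0 = df1.loc[idx[df1.index.levels[0][0], '00:00:00':'02:00:00'],:].shape[0]
# number of unique dates in the first level
len1 = len(df1.index.levels[0])
df1.loc[idx[:, '00:00:00':'02:00:00'],:] = np.tile(np.arange(1, len0 + 1), len1)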
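
The general approach can be wrapped in a small helper. This is only a sketch under stated assumptions: fill_time_window is a hypothetical name, the index is sorted, every date contains the same time window, and the column is addressed by label rather than with a trailing `:`.

```python
import numpy as np
import pandas as pd


def fill_time_window(df, column, start, stop, values):
    """Hypothetical helper: write `values` into each date's start..stop rows."""
    idx = pd.IndexSlice
    # Count the dates actually present (index.levels may list unused ones)
    n_dates = df.index.get_level_values(0).nunique()
    n_rows = len(df.loc[idx[:, start:stop], :])
    values = np.asarray(values, dtype=float)
    if len(values) * n_dates != n_rows:
        raise ValueError("values do not match the size of the time window")
    # One copy of `values` per date, in row order
    df.loc[idx[:, start:stop], column] = np.tile(values, n_dates)
    return df


# Sample frame with string time labels (an assumption about the data)
times = pd.date_range(
    "2018-01-26 00:00", "2018-01-26 03:00", freq="15min"
).strftime("%H:%M:%S")
index = pd.MultiIndex.from_product(
    [["2018-01-26", "2018-01-27"], times], names=["date", "time"]
)
df1 = pd.DataFrame({"aaa": np.nan}, index=index)

fill_time_window(df1, "aaa", "00:00:00", "02:00:00", np.arange(1, 10))
print(df1)
```

The length check is a design choice: it fails early when the supplied values do not fit the window, instead of leaving pandas to raise a less specific shape error.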

Tested with datetime.time objects:

import datetime

idx = pd.IndexSlice
arr = np.tile(np.arange(1, 10), len(df1.index.levels[0]))
df1.loc[idx[:, datetime.time(0, 0, 0):datetime.time(2, 0, 0)],:] = arr
print (df1)
                     aaa
date       time         
2018-01-26 00:00:00  1.0
           00:15:00  2.0
           00:30:00  3.0
           00:45:00  4.0
           01:00:00  5.0
           01:15:00  6.0
           01:30:00  7.0
           01:45:00  8.0
           02:00:00  9.0
           02:15:00  NaN
           02:30:00  NaN
           02:45:00  NaN
           03:00:00  NaN
2018-01-27 00:00:00  1.0
           00:15:00  2.0
           00:30:00  3.0
           00:45:00  4.0
           01:00:00  5.0
           01:15:00  6.0
           01:30:00  7.0
           01:45:00  8.0
           02:00:00  9.0
           02:15:00  NaN
           02:30:00  NaN
           02:45:00  NaN
           03:00:00  NaN

EDIT:

The last problem has been found: my solution works with a one-column DataFrame, but when working with a Series, one `:` has to be removed:

arr = np.array([[ 21.63920663, 21.62012822, 20.9900515, 21.23217008, 21.19482458,
                  21.10839656, 20.89631935, 20.79977166, 20.99176729, 20.91567565,
                  20.87258765, 20.76210464, 20.50357827, 20.55897631, 20.38005033,
                  20.38227309, 20.54460993, 20.37707293, 20.08279925, 20.09955877,
                  20.02559575, 20.12390737, 20.2917257, 20.20056711, 20.1589065,
                  20.41302289, 20.48000767, 20.55604102, 20.70255192]])

import datetime

idx = pd.IndexSlice
# no trailing ",:" here, because df1 is a Series; arr[0] is the 1-D row of 29 values
df1.loc[idx[:, datetime.time(0, 0, 0): datetime.time(7, 0, 0)]] = arr[0]
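
For comparison, the same replacement on a Series can also be written with a boolean mask instead of slicers. This is a different technique from the answer's, shown as a sketch: the Series temp is a hypothetical stand-in for the one in the question, and the replacement values are placeholders for arr[0].

```python
import datetime

import numpy as np
import pandas as pd

# Hypothetical Series shaped like the question's data:
# one date, quarter-hour datetime.time labels from 00:00 to 11:00
times = [
    ts.time()
    for ts in pd.date_range("2018-01-26 00:00", "2018-01-26 11:00", freq="15min")
]
index = pd.MultiIndex.from_product(
    [["2018-01-26"], times], names=["date", "time"]
)
temp = pd.Series(np.nan, index=index, name="temp")

start, stop = datetime.time(0, 0), datetime.time(7, 0)
# True where the time label lies in the closed interval [start, stop]
mask = np.array(
    [start <= t <= stop for t in temp.index.get_level_values("time")]
)

# Placeholder values; in the question this would be arr[0]
new_values = np.linspace(21.6, 20.7, int(mask.sum()))
temp.loc[mask] = new_values
print(temp.head(32))
```

The mask does not require a sorted index, at the cost of a Python-level loop over the time labels. With several dates, the values would still need np.tile so that their length matches the number of selected rows.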