Query/replace an element in a DataFrame with the value directly below it

Asked: 2019-04-12 06:50:28

Tags: python dataframe

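I have a DataFrame in which I need to find 0.00 values and, if certain conditions are met, replace them with the value directly below. I have looked for documentation on this behavior but have not found an efficient, Pythonic solution.

The logic is as follows:

If [Symbol] = 'VIX' and [QuoteDateTime] contains '09:31:00' and [Close] = '0.00'

then I want to replace that [Close] value with the [Close] value in the row below it.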

+----+--------+---------------------+---------+
|    | Symbol |    QuoteDateTime    |  Close  |
+----+--------+---------------------+---------+
|  0 | VIX    | 2019-04-11 09:31:00 |    0.00 |
|  1 | VIX    | 2019-04-11 09:32:00 |   14.24 |
|  2 | VIX    | 2019-04-11 09:33:00 |   14.40 |
|  3 | SPX    | 2019-04-11 09:31:00 | 2911.09 |
|  4 | SPX    | 2019-04-11 09:32:00 | 2911.55 |
|  5 | SPX    | 2019-04-11 09:33:00 | 2915.22 |
|  6 | VIX    | 2019-04-12 09:31:00 |    0.00 |
|  7 | VIX    | 2019-04-12 09:32:00 |   15.64 |
|  8 | VIX    | 2019-04-12 09:33:00 |   15.80 |
|  9 | SPX    | 2019-04-12 09:31:00 | 2901.09 |
| 10 | SPX    | 2019-04-12 09:32:00 | 2901.55 |
| 11 | SPX    | 2019-04-12 09:33:00 | 2905.22 |
+----+--------+---------------------+---------+
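For anyone who wants to reproduce this, the sample frame above could be built roughly as follows. The dtypes are an assumption (datetime for QuoteDateTime, float for Close), since the table alone does not show them, and the helper names `days`, `times`, `symbols` and `stamps` are only for illustration.

```python
import pandas as pd

days = ("2019-04-11", "2019-04-12")
times = ("09:31:00", "09:32:00", "09:33:00")

# For each day: three VIX rows followed by three SPX rows, as in the table.
symbols = [s for _d in days for s in ("VIX", "SPX") for _t in times]
stamps = [f"{d} {t}" for d in days for _s in ("VIX", "SPX") for t in times]

df = pd.DataFrame({
    "Symbol": symbols,
    "QuoteDateTime": pd.to_datetime(stamps),  # assumed datetime dtype
    "Close": [0.00, 14.24, 14.40, 2911.09, 2911.55, 2915.22,
              0.00, 15.64, 15.80, 2901.09, 2901.55, 2905.22],  # assumed float
})
print(df)
```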

The expected output has [Close] equal to 14.24 at index 0 and 15.64 at index 6. Everything else stays unchanged.

+----+--------+---------------------+---------+
|    | Symbol |    QuoteDateTime    |  Close  |
+----+--------+---------------------+---------+
|  0 | VIX    | 2019-04-11 09:31:00 |   14.24 |
|  1 | VIX    | 2019-04-11 09:32:00 |   14.24 |
|  2 | VIX    | 2019-04-11 09:33:00 |   14.40 |
|  3 | SPX    | 2019-04-11 09:31:00 | 2911.09 |
|  4 | SPX    | 2019-04-11 09:32:00 | 2911.55 |
|  5 | SPX    | 2019-04-11 09:33:00 | 2915.22 |
|  6 | VIX    | 2019-04-12 09:31:00 |   15.64 |
|  7 | VIX    | 2019-04-12 09:32:00 |   15.64 |
|  8 | VIX    | 2019-04-12 09:33:00 |   15.80 |
|  9 | SPX    | 2019-04-12 09:31:00 | 2901.09 |
| 10 | SPX    | 2019-04-12 09:32:00 | 2901.55 |
| 11 | SPX    | 2019-04-12 09:33:00 | 2905.22 |
+----+--------+---------------------+---------+

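2 answers:

Answer 0 (score: 2)

Create a boolean mask with Series.eq (the method form of ==), using Series.dt.strftime to turn the datetimes into strings for the time comparison, then set the new values with Series.mask and Series.shift: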

import numpy as np  # only needed for alternative 2
import pandas as pd

# convert to datetimes if necessary
df['QuoteDateTime'] = pd.to_datetime(df['QuoteDateTime'])

# True only for VIX rows stamped 09:31:00 whose Close is 0
mask = (df['Symbol'].eq('VIX') &
        df['QuoteDateTime'].dt.strftime('%H:%M:%S').eq('09:31:00') &
        df['Close'].eq(0))

# where mask is True, take the Close from the next row
df['Close'] = df['Close'].mask(mask, df['Close'].shift(-1))
# alternative 1
# df.loc[mask, 'Close'] = df['Close'].shift(-1)
# alternative 2
# df['Close'] = np.where(mask, df['Close'].shift(-1), df['Close'])
print(df)

   Symbol       QuoteDateTime    Close
0     VIX 2019-04-11 09:31:00    14.24
1     VIX 2019-04-11 09:32:00    14.24
2     VIX 2019-04-11 09:33:00    14.40
3     SPX 2019-04-11 09:31:00  2911.09
4     SPX 2019-04-11 09:32:00  2911.55
5     SPX 2019-04-11 09:33:00  2915.22
6     VIX 2019-04-12 09:31:00    15.64
7     VIX 2019-04-12 09:32:00    15.64
8     VIX 2019-04-12 09:33:00    15.80
9     SPX 2019-04-12 09:31:00  2901.09
10    SPX 2019-04-12 09:32:00  2901.55
11    SPX 2019-04-12 09:33:00  2905.22
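Note that a plain shift(-1) takes whatever row comes next, even if it belongs to another symbol. If the rows of different symbols could be interleaved, one possible variation (not part of the approach above) is to shift within each symbol via groupby. The sketch below assumes Close is numeric; the function name `fill_zero_close` and the three-row `demo` frame are made up for illustration.

```python
import pandas as pd

def fill_zero_close(df, symbol="VIX", time_str="09:31:00"):
    """Replace a zero Close at the given time with the next Close of the same symbol."""
    out = df.copy()
    out["QuoteDateTime"] = pd.to_datetime(out["QuoteDateTime"])
    mask = (out["Symbol"].eq(symbol)
            & out["QuoteDateTime"].dt.strftime("%H:%M:%S").eq(time_str)
            & out["Close"].eq(0))
    # shift(-1) inside each Symbol group, so a value never crosses symbols
    next_close = out.groupby("Symbol")["Close"].shift(-1)
    out["Close"] = out["Close"].mask(mask, next_close)
    return out

# Hypothetical interleaved data: the row right below the zero is an SPX row.
demo = pd.DataFrame({
    "Symbol": ["VIX", "SPX", "VIX"],
    "QuoteDateTime": ["2019-04-11 09:31:00", "2019-04-11 09:31:00",
                      "2019-04-11 09:32:00"],
    "Close": [0.00, 2911.09, 14.24],
})
result = fill_zero_close(demo)
print(result)
```

On the sample data from the question both versions give the same result, because each zero is immediately followed by a row of the same symbol.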

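Answer 1 (score: 1)

Not an expert, but you could try using the index:

First, get the matching index labels with this short line:

# assumes Close is numeric; astype(str) lets str.contains work on string or datetime columns
idx = df.index[(df['Symbol'] == 'VIX') & (df['QuoteDateTime'].astype(str).str.contains("09:31:00")) & (df['Close'] == 0)]

Then use those labels to set each value to the one in the row below it:

# assumes a default integer index (0, 1, 2, ...) and that no matching row is the last one
df.loc[idx, 'Close'] = df.loc[idx+1, 'Close'].values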
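The idx+1 lookup relies on the index being the default 0, 1, 2, ... range. If that cannot be guaranteed, a positional variant along these lines might be safer. This is only a sketch: the function name `replace_with_next_row` and the `demo` frame with a non-default index are invented for illustration, and Close is again assumed to be numeric.

```python
import numpy as np
import pandas as pd

def replace_with_next_row(df, mask, column="Close"):
    """Copy the value from the next row (by position) into rows where mask is True."""
    out = df.copy()
    pos = np.flatnonzero(mask.to_numpy())  # integer positions of matching rows
    pos = pos[pos + 1 < len(out)]          # skip a match on the last row
    col = out.columns.get_loc(column)
    if len(pos):
        out.iloc[pos, col] = out.iloc[pos + 1, col].to_numpy()
    return out

# Hypothetical frame whose index is not 0, 1, 2, ...
demo = pd.DataFrame({
    "Symbol": ["VIX", "VIX", "SPX"],
    "QuoteDateTime": ["2019-04-11 09:31:00", "2019-04-11 09:32:00",
                      "2019-04-11 09:31:00"],
    "Close": [0.00, 14.24, 2911.09],
}, index=[10, 20, 30])

mask = ((demo["Symbol"] == "VIX")
        & demo["QuoteDateTime"].astype(str).str.contains("09:31:00")
        & (demo["Close"] == 0))
result = replace_with_next_row(demo, mask)
print(result)
```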