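Forward/backward fill NA by incrementing/decrementing from the last found value?

Time: 2017-03-12 13:31:31

Tags: python pandas

Given the following pandas DataFrame (a copy of it can be found here), how can I fill the NAs in separate columns with a row count that increments/decrements until the next signal value, counted both forward and backward and carrying the sign of the signal? The signal values are only 1, -1 or np.nan.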

+----+---------+--------+
|    | Values  | Signal |
+----+---------+--------+
|  0 | 1420.49 |        |
|  1 | 1421.12 |        |
|  2 | 1418.95 |        |
|  3 | 1419.04 |      1 |
|  4 | 1419.04 |        |
|  5 | 1417.51 |        |
|  6 | 1416.97 |        |
|  7 | 1413.21 |     -1 |
|  8 | 1411.49 |        |
|  9 | 1412.57 |        |
| 10 | 1408.55 |      1 |
| 11 | 1409.16 |        |
| 12 | 1413.38 |        |
| 13 | 1413.38 |      1 |
| 14 | 1402.35 |        |
| 15 |  1397.8 |        |
| 16 | 1398.36 |        |
| 17 | 1397.62 |        |
| 18 | 1394.58 |     -1 |
| 19 | 1399.05 |        |
| 20 |  1399.9 |        |
| 21 | 1398.96 |     -1 |
| 22 | 1398.96 |        |
| 23 | 1393.69 |        |
| 24 | 1398.13 |        |
| 25 | 1398.66 |        |
| 26 | 1398.02 |      1 |
| 27 | 1397.97 |        |
| 28 | 1396.05 |        |
| 29 | 1398.13 |        |
+----+---------+--------+

The result should look like this (here is a copy of it):

+----+---------+--------+------------------------+----------------------+-----------------+
|    | Values  | Signal | forward signal rows nr | backward signal rows | value at signal |
+----+---------+--------+------------------------+----------------------+-----------------+
|  0 | 1420.49 |        |                        |                      |                 |
|  1 | 1421.12 |        |                        |                      |                 |
|  2 | 1418.95 |        |                        |                      |                 |
|  3 | 1419.04 |      1 |                      1 |                    4 |         1416.97 |
|  4 | 1419.04 |        |                      2 |                    3 |         1416.97 |
|  5 | 1417.51 |        |                      3 |                    2 |         1416.97 |
|  6 | 1416.97 |        |                      4 |                    1 |         1416.97 |
|  7 | 1413.21 |     -1 |                     -1 |                   -3 |         1412.57 |
|  8 | 1411.49 |        |                     -2 |                   -2 |         1412.57 |
|  9 | 1412.57 |        |                     -3 |                   -1 |         1412.57 |
| 10 | 1408.55 |      1 |                      1 |                    3 |         1413.38 |
| 11 | 1409.16 |        |                      2 |                    2 |         1413.38 |
| 12 | 1413.38 |        |                      3 |                    1 |         1413.38 |
| 13 | 1413.38 |      1 |                      1 |                    5 |         1397.62 |
| 14 | 1402.35 |        |                      2 |                    4 |         1397.62 |
| 15 |  1397.8 |        |                      3 |                    3 |         1397.62 |
| 16 | 1398.36 |        |                      4 |                    2 |         1397.62 |
| 17 | 1397.62 |        |                      5 |                    1 |         1397.62 |
| 18 | 1394.58 |     -1 |                     -1 |                   -3 |          1399.9 |
| 19 | 1399.05 |        |                     -2 |                   -2 |          1399.9 |
| 20 |  1399.9 |        |                     -3 |                   -1 |          1399.9 |
| 21 | 1398.96 |     -1 |                     -1 |                   -5 |         1398.66 |
| 22 | 1398.96 |        |                     -2 |                   -4 |         1398.66 |
| 23 | 1393.69 |        |                     -3 |                   -3 |         1398.66 |
| 24 | 1398.13 |        |                     -4 |                   -2 |         1398.66 |
| 25 | 1398.66 |        |                     -5 |                   -1 |         1398.66 |
| 26 | 1398.02 |      1 |                      1 |                    4 |         1398.13 |
| 27 | 1397.97 |        |                      2 |                    3 |         1398.13 |
| 28 | 1396.05 |        |                      3 |                    2 |         1398.13 |
| 29 | 1398.13 |        |                      4 |                    1 |         1398.13 |
+----+---------+--------+------------------------+----------------------+-----------------+

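I got to the final result with several nested loops, but the problem is that they are very inefficient on larger DataFrames with a few million rows.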

1 Answer:

Answer 0: (score: 3)

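This is the usual approach to signal-based grouping, using the compare-cumsum-groupby pattern (for which we should have better native support, IMHO). The comparison here determines whether a Signal entry is null or not; after that we take the cumulative sum, so that each signal group gets its own id (group id, or gid). The rest is just arithmetic.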
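To make the pattern concrete, here is a minimal sketch on a made-up toy series (the values and the names `signal`, `is_signal` and `pos` are illustrative assumptions, not the asker's data). It shows how the comparison followed by the cumulative sum turns each signal row into the start of a new group:

```python
import numpy as np
import pandas as pd

# Hypothetical toy signal column: NaN everywhere except where a signal fires.
signal = pd.Series([np.nan, 1, np.nan, np.nan, -1, np.nan, 1])

# compare: True on the rows that carry a signal
is_signal = signal.notnull()

# cumsum: every True bumps the counter, so all rows up to the next
# signal share one group id; rows before the first signal get id 0
gid = is_signal.cumsum()
print(gid.tolist())  # [0, 1, 1, 1, 2, 2, 3]

# groupby: position of each row inside its group, starting at 0
pos = signal.groupby(gid).cumcount()
print(pos.tolist())  # [0, 0, 1, 2, 0, 1, 0]
```

Everything in the answer's arithmetic is then built from `gid`, the per-group position and the per-group size.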

There is some duplication in the code below that we could refactor away, but I'm feeling lazy, so:

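import numpy as np

# compare + cumsum: each non-null Signal starts a new group (rows before the first signal get 0)
gid = df["Signal"].notnull().cumsum()
dg = df.groupby(gid)
# sign of the signal that opens each group (NaN for group 0)
sign = dg["Signal"].transform("first")
df["forward signal rows"] = (dg.cumcount() + 1) * sign
df["backward signal rows"] = (dg["Signal"].transform("size") - dg.cumcount()) * sign
# last Values entry of each group, blanked out for the rows before the first signal
df["value at signal"] = dg["Values"].transform("last")
df.loc[gid == 0, "value at signal"] = np.nan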

which gives me a frame matching your target.
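For anyone who wants to check this without the linked copy of the data, the following is a self-contained sketch that rebuilds the first 13 rows of the question's table and applies the same steps. The wrapper name `add_signal_columns` is hypothetical, and using `.where(gid > 0)` in place of the final `df.loc[...]` assignment is a small stylistic swap, not part of the answer above:

```python
import numpy as np
import pandas as pd

def add_signal_columns(df):
    """Return a copy of df with the three signal-derived columns added."""
    out = df.copy()
    gid = out["Signal"].notnull().cumsum()   # group id, 0 before the first signal
    dg = out.groupby(gid)
    sign = dg["Signal"].transform("first")   # +1 / -1 per group, NaN for group 0
    pos = dg.cumcount()                      # 0-based position inside the group
    size = dg["Signal"].transform("size")    # number of rows in the group
    out["forward signal rows"] = (pos + 1) * sign
    out["backward signal rows"] = (size - pos) * sign
    # last value of each group, left as NaN for rows before the first signal
    out["value at signal"] = dg["Values"].transform("last").where(gid > 0)
    return out

# Rows 0..12 of the table in the question
df = pd.DataFrame({
    "Values": [1420.49, 1421.12, 1418.95, 1419.04, 1419.04, 1417.51, 1416.97,
               1413.21, 1411.49, 1412.57, 1408.55, 1409.16, 1413.38],
    "Signal": [np.nan, np.nan, np.nan, 1, np.nan, np.nan, np.nan,
               -1, np.nan, np.nan, 1, np.nan, np.nan],
})

result = add_signal_columns(df)
print(result)
```

On these rows the new columns agree with rows 0 to 12 of the expected table in the question. Because the work is done by vectorized groupby operations rather than Python-level loops, it should scale to millions of rows far better than nested loops, though I have not benchmarked it here.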