Iterating over rows to find a pattern in a DataFrame

Time: 2019-08-16 03:24:21

Tags: python pandas dataframe

I have a DataFrame:


  | type | val
-----------------
0 | low  | 0.5
1 | high | 1.2
2 | NaN  | NaN
3 | low  | 1
4 | NaN  | NaN
5 | high | 3
6 | NaN  | NaN
7 | low  | 2
8 | high | 4
9 | NaN  | NaN
10| low  | 3
..............
98| low  | 0.5
99| NaN  | NaN

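What I want to do is find a pattern such as low1 -> high1 -> low2 -> high2 -> low3 in the DataFrame above, while also checking that low2 > low1, high2 > high1, and so on, and extract the matching values into a new DataFrame.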

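If only part of the pattern is satisfied (for example, low1 -> high1 but not the rest), I also want to resume iterating from that point so that I do not miss any pattern in between.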

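I tried fetching five rows at a time with iloc and comparing them in one long if statement, but this does not seem like the most efficient way to write it: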

sh = 'high'
sl = 'low'

rows = []  # matched five-row windows are collected here
for idx in range(df.shape[0] - 4):
    w = df.iloc[idx:idx + 5]
    t = w['type'].astype(str).tolist()
    l1, h1, l2, h2, l3 = w['val'].tolist()
    if (sl in t[0] and sh in t[1] and sl in t[2] and sh in t[3] and sl in t[4]
            and l3 > l2 > l1 and h2 > h1
            and 0.3 * (h1 - l1) < (h1 - l2)
            and 0.3 * (h2 - l2) < (h2 - l3)):
        # get the 5 values here and append them to the new dataframe
        rows.append(w)
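
One possible way to avoid the row-by-row `iloc` lookups is to compare shifted copies of the columns, so that all five positions of every window are tested at once. The sketch below is only an illustration under some assumptions: the NaN rows are dropped first (the loop above looks at consecutive rows, whereas the expected result skips NaN rows), the types are matched by exact equality rather than substring, and the names `pattern_starts` and `min_retrace` are hypothetical. It returns the index labels at which a full pattern begins; overlapping matches are all reported.

```python
import numpy as np
import pandas as pd


def pattern_starts(df, min_retrace=0.3):
    """Return index labels where low -> high -> low -> high -> low begins."""
    # Assumption: NaN rows are gaps, so only rows with data are compared.
    valid = df.dropna(subset=["type", "val"])
    t, v = valid["type"], valid["val"]

    # The k-th following row must carry the k-th expected type.
    expected = ["low", "high", "low", "high", "low"]
    types_ok = pd.Series(True, index=valid.index)
    for k, name in enumerate(expected):
        types_ok &= t.shift(-k).eq(name)

    # v.shift(-k) aligns the value k rows ahead with the window's first row.
    l1, h1, l2, h2, l3 = (v.shift(-k) for k in range(5))
    rising = (l2 > l1) & (l3 > l2) & (h2 > h1)
    # Same two ratio checks as the long if statement in the loop above.
    retrace = (min_retrace * (h1 - l1) < (h1 - l2)) & (
        min_retrace * (h2 - l2) < (h2 - l3)
    )

    mask = types_ok & rising & retrace
    return valid.index[mask.to_numpy()]


# Small made-up sample, not the data from the question.
sample = pd.DataFrame(
    {
        "type": ["low", "high", np.nan, "low", "high", "low"],
        "val": [1.0, 2.0, np.nan, 1.5, 3.0, 2.0],
    }
)
starts = pattern_starts(sample)
```

Comparisons against NaN evaluate to False, so windows that run past the end of the data are rejected without any special handling.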

An example of the final result should be:

  | type | val | pattern
--------------------------
0 | low  | 0.5 | l1
1 | high | 1.2 | h1
2 | NaN  | NaN | NaN
3 | low  | 1   | l2
4 | NaN  | NaN | NaN
5 | high | 3   | h2
6 | NaN  | NaN | NaN
7 | low  | 2   | l3
8 | high | 4   | NaN #NaN since this doesn't form a pattern (Our pattern always starts with a low)
9 | NaN  | NaN | NaN 
10| low  | 3   | l1
..............
98| low  | 0.5
99| NaN  | NaN
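
A `pattern` column like the one in the table could be produced with a single forward scan. The following is a minimal sketch under stated assumptions, not a definitive answer: NaN rows are skipped, types are compared by exact equality, only the ordering checks described in the text (low1 < low2 < low3 and high1 < high2) are applied, and rows that were labeled by one match are not reused by the next. The function name `label_patterns` is hypothetical. When a window fails, the scan moves forward by one row, so no candidate start is skipped.

```python
import numpy as np
import pandas as pd

LABELS = ["l1", "h1", "l2", "h2", "l3"]
TYPES = ["low", "high", "low", "high", "low"]


def label_patterns(df):
    """Return a copy of df with a 'pattern' column marking matched rows."""
    out = df.copy()
    out["pattern"] = pd.Series(np.nan, index=df.index, dtype=object)

    # Assumption: NaN rows are gaps and do not break a pattern.
    valid = df.dropna(subset=["type", "val"])
    types = valid["type"].tolist()
    vals = valid["val"].tolist()
    index = valid.index

    i = 0
    while i + 5 <= len(valid):
        l1, h1, l2, h2, l3 = vals[i:i + 5]
        if types[i:i + 5] == TYPES and l1 < l2 < l3 and h1 < h2:
            out.loc[index[i:i + 5], "pattern"] = LABELS
            i += 5  # matched rows are not reused by the next pattern
        else:
            i += 1  # slide by one row and try again
    return out


# First eleven rows of the DataFrame shown in the question.
df = pd.DataFrame(
    {
        "type": ["low", "high", np.nan, "low", np.nan, "high",
                 np.nan, "low", "high", np.nan, "low"],
        "val": [0.5, 1.2, np.nan, 1, np.nan, 3, np.nan, 2, 4, np.nan, 3],
    }
)
result = label_patterns(df)
```

On these eleven rows the scan labels rows 0, 1, 3, 5 and 7 as l1, h1, l2, h2 and l3. Row 10 stays NaN here only because the sample is cut off before a second pattern could complete.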

0 answers:

No answers yet