Python - Iterating over a DataFrame with +/- oscillating values and labeling states by condition

Time: 2018-04-19 22:38:54

Tags: python pandas loops dataframe iterator

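I need help labeling different states in a new DataFrame column, based on conditions on two oscillating values: Column X and Column Y.

Column Y is used as the state interval. A state interval starts at 0 and ends at 0. Note that within an interval the values in Column Y always stay in either the positive or the negative range. The interval cycles alternate in sign, in sequence: all +, -, +, -, and so on.

Labeling starts when the Column Y value rises above 0 into the positive range and stops at 0 before it turns negative; that is the end of the cycle, and the next range, or cycle, then begins in the negative range.

There are 6 patterns in total, A, B, C, D, E, F, which serve as the cycle states. I am trying to work out the logic and how to add the label for each state to a new DataFrame column named state. Labeling happens within each cycle and starts over with each new cycle state.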

+-------+-------------+---------+  
| State |      X      |    Y    |  
+-------+-------------+---------+  
|   A   | from - to + |    +    |  
|   B   |      +      |    +    |  
|   C   |      -      |    +    |  
|   D   |      +      |    -    |  
|   E   |      -      |    -    |  
|   F   | from + to - |    -    |  
+-------+-------------+---------+  

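For states A and F, the values in Column X go from + to - or vice versa, crossing over 0. The values in Column Y always stay in either the positive or the negative range.

States B, C, D, and E have no crossover in Column X. Below are sample DataFrame values with a new column showing the resulting state.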

+----+---------+---------+-------+  
|  # |    X    |    Y    | State |  
+----+---------+---------+-------+  
|  1 | -0.0034 |  0.0056 |   A   | Cycle 1 (+)  
|  2 | -0.0001 |  0.0070 |   A   |  
|  3 |  0.0019 |  0.0073 |   A   |  
|  4 |  0.0039 |  0.0075 |   A   |  
|    |         |         |       |  
|  5 |  0.0273 | -0.0037 |   D   | Cycle 2 (-)  
|  6 |  0.0237 | -0.0059 |   D   |  
|    |         |         |       |  
|  7 |  0.0047 |  0.0028 |   B   | Cycle 3 (+)  
|  8 |  0.0044 |  0.0020 |   B   |  
|    |         |         |       |  
|  9 | -0.0034 | -0.0006 |   E   | Cycle 4 (-)    
| 10 | -0.0045 | -0.0014 |   E   |  
|    |         |         |       |  
| 11 | -0.0021 |  0.0006 |   C   | Cycle 5 (+)  
| 12 | -0.0019 |  0.0007 |   C   |  
|    |         |         |       |  
| 13 |  0.0041 | -0.0054 |   F   | Cycle 6 (-)  
| 14 |  0.0017 | -0.0060 |   F   |  
| 15 | -0.0021 | -0.0059 |   F   |  
| 16 | -0.0023 | -0.0057 |   F   |  
+----+---------+---------+-------+  
Cycles will continue 7, 8, 9, 10, etc. in the time series

Here is a DataFrame with 12 cycles, similar to the example above, in which each of the patterns A, B, C, D, E, F appears twice in the result.

import pandas as pd

df = pd.DataFrame({
    'x': [-0.0034, -0.0001, 0.0019, 0.0039, 0.0273, 0.0237, 0.0047, 0.0044, -0.0034, -0.0045, -0.0021, -0.0019, 0.0041, 0.0017, -0.0021, -0.0023, -0.0014, -0.0002, 0.0018, 0.0031, 0.0171, 0.0230, 0.0035, 0.0040, -0.0030, -0.0040, -0.0020, -0.0015, 0.0030, 0.0010, -0.0030, -0.0020, ],
    'y': [0.0056, 0.007, 0.0073, 0.0075, -0.0037, -0.0059, 0.0028, 0.002, -0.0006, -0.0014, 0.0006, 0.0007, -0.0054, -0.006, -0.0059, -0.0057, 0.0040, 0.005, 0.0065, 0.0070, -0.0022, -0.0045, 0.0020, 0.001, -0.0005, -0.0010, 0.0003, 0.0005, -0.0050, -0.005, -0.0060, -0.0040, ],
})

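Next is a sample of the code I started with for iterating over the DataFrame. I need help building the logic to incorporate states A and F, going through each +/- cycle, and guidance on how to iterate over Column Y to look for crossover values in Column X.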

State = []

for i, row in df.iterrows():  # i: DataFrame index; row: each row as a Series
    # The column names in df are lowercase ('x', 'y')
    if row['x'] > 0 and row['y'] > 0:
        State.append('B')
    elif row['x'] < 0 and row['y'] > 0:
        State.append('C')
    elif row['x'] > 0 and row['y'] < 0:
        State.append('D')
    elif row['x'] < 0 and row['y'] < 0:
        State.append('E')
    else:  
        State.append('err')  

df['State'] = State  
print(df)  

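Again, the code above does not cover states A and F.

Update

I still need help. Below is the updated code with comments, followed by an explanation of what is not working.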

import numpy as np

# Creating new column as + or - based on Column Y value
df['y_pos'] = np.where((df.y > 0), True, False)

# Creating new column to label the cycle as they are increasing order 1,2,3, etc.
df['cycle_n'] = (df.y_pos != df.y_pos.shift(1)).cumsum()

# returns dictionary whose keys and values are from DataFrames
# to be able to loop through the cycles
gb = df.groupby('cycle_n')
groups = dict(list(gb))

State = []

for name, group in gb:
    # Information to help compare our final results
    print("Group:" + str(name) )
    print("=====================")
    print("Min:" + str(group.min()) )
    print("Max:" + str(group.max()) )
    print("--- Group Data -----")
    print(group)
    print("--------------------")
    print("--- Column X Row Data -----")

    for index, row in group.iterrows(): # loop each row

        if row['y_pos'] == True: # Column Y is (+)

            print( row['x'] ) # row data value for Column X

        # trying to use min and max in each cycle to figure out
        # if there is a crossover 

        # ISSUE: min and max is holding data values for each of the
        # columns, not only Column X which maybe the reason why 
        # it's not working correctly

            if [ (group.min() <= 0) & (group.max() >= 0) ]:
                State.append('A')
            elif row['x'] >= 0:
                State.append('B')
            elif row['x'] < 0:
                State.append('C')
            else:  
                State.append('err')

        elif row['y_pos'] == False: # Column Y is (-)

            print( row['x'] )

        # ISSUE: again min and max is holding data values for each of the
        # columns, maybe the reason why it's not working correctly

            if [ (group.max() >= 0) & (group.min() <= 0) ]:
                State.append('F')
            elif row['x'] >= 0:
                State.append('D')
            elif row['x'] < 0:
                State.append('E')
            else:  
                State.append('err')
        else:
            print("err")

df['State'] = State  

# Combining y_pos & cycle_n to be printed out.
df['Label'] = 'Cycle ' + df.cycle_n.astype(str) + ' ' + df.y_pos.map({True: '(+)', False: '(-)'})

del df['y_pos']
del df['cycle_n']

print(df)

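There is a problem with this code. It now labels only states A and F, and mislabels the other states as A or F. The if statements using min and max return true, which is not right; I suspect this is because group.min() and group.max() hold the minimum and maximum of every column, not just Column X. For example,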

print("Min:" + str(group.min()) )

Min:
x         -0.0034
y          0.0056
y_pos      1.0000
cycle_n    1.0000
dtype: float64

I don't know whether this is the best approach; I'm just trying to get closer to having it work correctly.

1 answer:

Answer 0 (score: 0)

Here's one way to do what you're trying to do:

import pandas as pd
import numpy as np

# Define the cycles
df['y_pos'] = np.where((df.y > 0), True, False)
df['cycle_n'] = (df.y_pos != df.y_pos.shift(1)).cumsum()

# Function to classify states based on x and y
def classify_state(df):
    x_pos = df.x.max() >= 0
    x_neg = df.x.min() < 0
    y_pos = df.y_pos.any()

    if y_pos:
        if x_pos and x_neg:
            state = 'A'
        elif x_pos:
            state = 'B'
        else:
            state = 'C'
    else:
        if x_pos and x_neg:
            state = 'F'
        elif x_pos:
            state = 'D'
        else:
            state = 'E'

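    # Return one label per cycle instead of mutating the group in place
    return state

# Compute one state per cycle, then map it back onto every row by cycle number
cycle_states = df.groupby('cycle_n').apply(classify_state)
df['state'] = df['cycle_n'].map(cycle_states)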

# Make the labels and clean up the temporary columns
df['label'] = 'Cycle ' + df.cycle_n.astype(str) + ' ' + df.y_pos.map({True: '(+)', False: '(-)'})
del df['cycle_n']
del df['y_pos']

A few points:

  • As written, the logic works but is a little elaborate. You could almost certainly do it in fewer lines, but I've left it in long form to make it clear what's going on.
  • An x value of 0 is treated as positive in the code as written, but you can change this by altering the comparisons df.x.max() >= 0 and df.x.min() < 0 in classify_state.
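
As an illustration of the "fewer lines" remark, here is a minimal sketch of a vectorized variant. It is not part of the original answer: it assumes the same rules (0 in x counts as positive, a cycle "crosses" when its x values include both signs) and swaps the per-group function for groupby(...).transform plus np.select. The names has_pos, has_neg and crosses are made up for the example, and only the first six cycles of the question's data are used.

```python
import numpy as np
import pandas as pd

# First six cycles from the question's sample data.
df = pd.DataFrame({
    "x": [-0.0034, -0.0001, 0.0019, 0.0039, 0.0273, 0.0237, 0.0047, 0.0044,
          -0.0034, -0.0045, -0.0021, -0.0019, 0.0041, 0.0017, -0.0021, -0.0023],
    "y": [0.0056, 0.0070, 0.0073, 0.0075, -0.0037, -0.0059, 0.0028, 0.0020,
          -0.0006, -0.0014, 0.0006, 0.0007, -0.0054, -0.0060, -0.0059, -0.0057],
})

# A new cycle starts whenever the sign of y flips.
y_pos = df["y"] > 0
cycle_n = (y_pos != y_pos.shift()).cumsum()

# Per-cycle extremes of x only, broadcast back to every row of the cycle.
x_by_cycle = df["x"].groupby(cycle_n)
has_pos = x_by_cycle.transform("max") >= 0
has_neg = x_by_cycle.transform("min") < 0
crosses = has_pos & has_neg

# np.select picks the first matching condition, so order matters here.
conditions = [y_pos & crosses, y_pos & has_pos, y_pos, crosses, has_pos]
choices = ["A", "B", "C", "F", "D"]
df["state"] = np.select(conditions, choices, default="E")

print(df)
```

Because transform returns a result aligned with the original rows, no temporary columns need to be added to df and removed afterwards.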

Edit 1: Thanks for your thorough update to your question. It's much clearer what you're looking for now, and I've changed the answer accordingly.

Edit 2: I've altered the code to take into account all df.x values within a cycle, not just the first and last ones.
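
A side note on the `if [ ... ]:` checks in the question's update, offered as a small sketch rather than a full diagnosis: wrapping the comparison in square brackets builds a one-element list, and a non-empty list is always truthy, so the first branch is taken for every row regardless of the data. Restricting min and max to column x gives a single boolean per cycle. The two-row group below is a hypothetical stand-in for one cycle (cycle 2 of the sample), and the variable names are invented for the example.

```python
import pandas as pd

# Stand-in for one groupby group: cycle 2 from the question (x stays positive).
group = pd.DataFrame({"x": [0.0273, 0.0237], "y": [-0.0037, -0.0059]})

# A non-empty list is truthy no matter what it contains,
# so `if [ ... ]:` always takes the first branch.
always_true = bool([(group.min() <= 0) & (group.max() >= 0)])

# Checking column x alone yields one scalar answer for the whole cycle.
x_crosses_zero = bool((group["x"].min() < 0) and (group["x"].max() >= 0))

print(always_true, x_crosses_zero)
```

Here x never goes below 0, so the column-specific check reports no crossover, while the bracketed form would still have appended 'F'.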