Pandas: how to use iterrows(), itertuples(), indexing, and finding the top of a peak and trend changes

Time: 2018-05-04 13:14:52

Tags: python pandas loops dataframe

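Peaks

Use the states of X and Y to find an abnormality where the value of X has peaked.

Grab the subset of data around the abnormality from the dataframe, for example the 5 rows before and the 5 rows after it.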
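As a rough illustration of grabbing rows around a position, here is a minimal sketch. The helper name `window_around` and the small `demo` frame are hypothetical, and the sketch assumes the default integer index so that positions and labels coincide.

```python
import pandas as pd

def window_around(frame, pos, before=5, after=5):
    """Return the rows from `before` rows ahead of `pos` to `after` rows past it."""
    start = max(pos - before, 0)              # clamp so a negative start does not wrap around
    stop = min(pos + after + 1, len(frame))   # +1 because the iloc stop is exclusive
    return frame.iloc[start:stop]

# Hypothetical stand-in data, not the sample data from the question.
demo = pd.DataFrame({"X": [0.1, 0.4, 0.9, 1.5, 2.2, 2.6, 2.4, 2.0, 1.1, 0.7, 0.3, 0.2]})
subset = window_around(demo, pos=5)
print(subset)
```

Near either end of the frame the window simply gets shorter instead of raising or wrapping around.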

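The abnormality may also be the starting point of a local trend within the global trend. Basically, take a subsequence of the time series from the dataframe and look at this local trend for more information, in particular to confirm that the signal of the local trend has not reversed.

Identify and validate the local trend by confirming that the X value is at its highest point (i.e., an oscillating value). It is also like the center value of a histogram. We need to confirm that X has peaked by checking that the values before and after it are lesser than the peak value. Ideally, we want to confirm this against several values before and after.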
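One way to express "the values before and after are lesser" is to compare each value with its shifted neighbours. The sketch below is only one possible reading of that requirement. The function name `confirmed_peaks` and the parameter `k` (how many neighbours to check on each side) are assumptions, and the numbers are a slice of the X column from the sample data.

```python
import pandas as pd

def confirmed_peaks(values, k=2):
    """Boolean mask: True where a value is strictly greater than its k neighbours on each side."""
    mask = pd.Series(True, index=values.index)
    for step in range(1, k + 1):
        # shift(step) is the value `step` rows earlier, shift(-step) is `step` rows later.
        # Comparisons against NaN (at the edges) are False, so edge rows are never confirmed.
        mask &= (values > values.shift(step)) & (values > values.shift(-step))
    return mask

x = pd.Series([1.78, 1.98, 2.08, 2.42, 2.56, 2.51, 2.57, 2.53, 2.37, 2.24, 2.11])
peaks = confirmed_peaks(x, k=2)
print(x[peaks])
```

With `k=1` the value 2.56 also counts as a peak, because only its immediate neighbours are checked. A larger `k` asks for more confirmation on both sides.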

Sample data

import pandas as pd

df = pd.DataFrame({
    'X': [-0.27, -0.28, -0.33, -0.37, -0.60, -0.90, -0.99, -0.94, -0.85, -0.75, -0.64, -0.51, -0.35, -0.21, 1.78, 1.98, 2.08, 2.42, 2.56, 2.51, 2.57, 2.53, 2.37, 2.24, 2.11, 2.01, 1.82, 1.64, ],
    'X_State': ['3', '3', '3', '3', '5', '5', '5', '5', '5', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '6', '6', '6', '6', '6', ],
    'Y_State': ['23', '23', '23', '23', '24', '24', '24', '24', '24', '23', '23', '23', '22', '22', '18', '18', '18', '17', '17', '18', '17', '17', '18', '18', '18', '18', '18', '19', ],
})

df2 = pd.DataFrame() #create new empty dataframe

The second dataframe is used to store the subsets of data that we find.

Code

Label = []  

# Get Previous  
df['X_STATE_Previous_Value'] = df.X_State.shift(1)   
df['Y_STATE_Previous_Value'] = df.Y_State.shift(1)  
df['Y_STATE_Change'] = (df.Y_State.ne(df.Y_State.shift())).astype(int)  

for index, row in df.iterrows():   
    if (row['Y_State'] == '17' and row['Y_STATE_Previous_Value'] == '18'):  
        Label.append('Index Position: ' + str(index))  
        # Select 5 rows before and after  
        df2 = df2.append(df.iloc[index-5:index+5])  

        # Find where X peaked  
        for i, row2 in df2.iterrows():  
            # get index position of the first instance of the largest value  
            peak = df2.X.idxmax()  

        # Go back and label where X peaked 
        df.loc[peak, 'Label'] = 'Top of Peak'  

    else:  
        Label.append('...')  

df['Label'] = Label  
df2['Max_Label'] = peak  

print(df)  
print(df2)  
#del df2  
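A note for readers on recent pandas: `DataFrame.append` was deprecated in pandas 1.4 and removed in 2.0, so the `df2 = df2.append(...)` line fails there. A minimal sketch of the same accumulation with `pd.concat` follows. The `demo` frame and the `hits` positions are hypothetical stand-ins.

```python
import pandas as pd

demo = pd.DataFrame({"X": range(20)})
hits = [7, 12]  # hypothetical positions of interest

pieces = []
for pos in hits:
    # list.append (not DataFrame.append): collect the slices, concatenate once at the end
    pieces.append(demo.iloc[max(pos - 5, 0):pos + 5])

collected = pd.concat(pieces)
print(collected)
```

Overlapping windows produce repeated rows. If that is unwanted, `collected[~collected.index.duplicated()]` keeps the first copy of each index value.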

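Help needed

First, the 'Top of Peak' label does not update df, even though df is what the code references. It is updating df2, and df2 is only temporary in the end, there to help us find the peak.

Second, I am looking for a better way to determine the Top of Peak. Taking the max value of the subset does not actually confirm the values before and after it, i.e., that they are all lesser.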
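One possible explanation for the first point, offered as a reading of the code rather than a confirmed diagnosis: `df.loc[peak, 'Label'] = 'Top of Peak'` does write into df, but `df['Label'] = Label` after the loop replaces the whole column, so the mark is lost. The sketch below reproduces that effect on a small hypothetical frame and shows the assignment order reversed.

```python
import pandas as pd

demo = pd.DataFrame({"X": [1.0, 3.0, 2.0]})
labels = ["...", "...", "..."]

demo.loc[1, "Label"] = "Top of Peak"   # creates the column and sets row 1
demo["Label"] = labels                 # replaces the whole column, row 1 included
overwritten = demo["Label"].tolist()

demo["Label"] = labels                                  # assign the list first...
demo.loc[demo["X"].idxmax(), "Label"] = "Top of Peak"   # ...then mark the peak
fixed = demo["Label"].tolist()

print(overwritten)
print(fixed)
```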

1 Answer:

Answer 0 (score: 0)

If I understand correctly, here is how I would do what you want. Please let me know whether it works for what you are looking for.

Edit: for the max of the subset, you can do the following:

import pandas as pd
df = pd.DataFrame({
    'X': [-0.27, -0.28, -0.33, -0.37, -0.60, -0.90, -0.99, -0.94, -0.85, -0.75, -0.64, -0.51, -0.35, -0.21, 1.78, 1.98, 2.08, 2.42, 2.56, 2.51, 2.57, 2.53, 2.37, 2.24, 2.11, 2.01, 1.82, 1.64, ],
    'X_State': ['3', '3', '3', '3', '5', '5', '5', '5', '5', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '6', '6', '6', '6', '6', ],
    'Y_State': ['23', '23', '23', '23', '24', '24', '24', '24', '24', '23', '23', '23', '22', '22', '18', '18', '18', '17', '17', '18', '17', '17', '18', '18', '18', '18', '18', '19', ],
})

df['X_STATE_Previous_Value'] = df.X_State.shift(1)   
df['Y_STATE_Previous_Value'] = df.Y_State.shift(1)  
df['Y_STATE_Change'] = (df.Y_State.ne(df.Y_State.shift())).astype(int)  

df['Label'] = ''  # or '...' if you prefer

# get the index values of the rows where the abnormality occurs
abnormal_idx = df[(df['Y_State'] == '17') & (df['Y_STATE_Previous_Value'] == '18')].index
# mark those rows in the Label column
df.loc[abnormal_idx, 'Label'] = 'abnormality'
# get a subset of roughly +/- 5 rows around the abnormalities
# (positional slice: the start is clamped at 0, and the stop is exclusive)
df2 = df.iloc[max(abnormal_idx.min() - 5, 0):abnormal_idx.max() + 5]
# and the index of the max of X on this subset
peak_idx = df2.X.idxmax()
# you don't really need df2; you can do it directly:
# peak_idx = df.iloc[max(abnormal_idx.min() - 5, 0):abnormal_idx.max() + 5].X.idxmax()
# add this number in a column (not sure why it is needed)
df['Max_Label'] = peak_idx
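If the goal is to mark the peak row in df itself, a variation on the code above might look like the following sketch. It is not part of the original answer: it takes one window per abnormality instead of a single span, includes 5 rows on each side, and assumes the default integer index so that positions and labels coincide.

```python
import pandas as pd

df = pd.DataFrame({
    'X': [-0.27, -0.28, -0.33, -0.37, -0.60, -0.90, -0.99, -0.94, -0.85, -0.75,
          -0.64, -0.51, -0.35, -0.21, 1.78, 1.98, 2.08, 2.42, 2.56, 2.51,
          2.57, 2.53, 2.37, 2.24, 2.11, 2.01, 1.82, 1.64],
    'Y_State': ['23', '23', '23', '23', '24', '24', '24', '24', '24', '23',
                '23', '23', '22', '22', '18', '18', '18', '17', '17', '18',
                '17', '17', '18', '18', '18', '18', '18', '19'],
})
df['Label'] = ''

# rows where Y_State is '17' and the previous row was '18'
is_abnormal = (df['Y_State'] == '17') & (df['Y_State'].shift(1) == '18')

for pos in df.index[is_abnormal]:
    # 5 rows before and 5 rows after; the iloc stop is exclusive, hence pos + 6
    window = df['X'].iloc[max(pos - 5, 0):pos + 6]
    df.loc[window.idxmax(), 'Label'] = 'Top of Peak'

peak_rows = df.index[df['Label'] == 'Top of Peak'].tolist()
print(peak_rows)
```

On the sample data both abnormalities (rows 17 and 20) fall near the same maximum, so only one row ends up labeled.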