Finding a pattern in a pandas time series plot

Time: 2019-03-08 06:08:16

Tags: python pandas data-analysis

I want to find a pattern in a pandas DataFrame. The actual problem is shown in the figure below:

I would like to find the blue pattern in the graph.

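My idea is:

  1. Build a model of the pattern I am looking for
  2. Compare the pattern with the DataFrame and compute the Euclidean distance between them
  3. Slide the pattern along the graph step by step and compute the Euclidean distance at each position
  4. Plot all the Euclidean distances
  5. The position with the smallest Euclidean distance is the location of the pattern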
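
Written out, steps 2 to 5 amount to the following. The symbols are notation introduced here for illustration: x is the 'Value' column with n samples, p is the pattern with m samples, d(i) is the distance with the pattern placed at position i, and i* is the detected position.

```latex
d(i) = \sqrt{\sum_{j=0}^{m-1} \left( x_{i+j} - p_j \right)^2}, \qquad i = 0, \dots, n - m,
\qquad i^{*} = \arg\min_{i} \, d(i)
```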

Here is my code (I made a sample df just to try it out; the original df is too large):

import numpy as np
import pandas as pd
from pandas import Series
from pandas import DataFrame
from sklearn.metrics.pairwise import euclidean_distances
from sklearn.metrics.pairwise import paired_distances
from scipy.spatial.distance import cdist

d = {'Time': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 
18, 19, 20, 21, 22], 
'Value': [0, 1, 1, 1, 2, 1, 1, 2, 3, 2, 1, 1, 1, 1, 1, 2, 1, 1, 3, 4, 1, 2, 3]}
df = pd.DataFrame(data=d)
d2 = {'Time': [0, 1, 2] , 'Value': [1, 2, 3]}
patch = pd.DataFrame(data=d2)

def orig(df, patch):

    df['corr'] = np.nan

    for i in range(df.shape[0]):

        #select the df window with the same size of patch
        window = df[i : i+patch.shape[0]]

        #If window and patch have different shapes --> Break
        if window.shape[0] != patch.shape[0] :

            break

        else:
            patch.reset_index(inplace=True, drop=True)          
            window.reset_index(inplace=True, drop=True)

            df['corr'] = cdist(df[['Value']], patch[['Value']],'euclidean')

    return df

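Unfortunately, it does not work. To compute a Euclidean distance, cdist needs at least 2 dimensions, but I am only considering the difference between the pattern (patch) and the actual df. If I create just one column to make the code run, I also get wrong results.  Can anyone give me a hint on how to identify a pattern within another DataFrame? Maybe I am going about this the hard way.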
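
To make the cdist issue concrete, here is a small sketch using the sample data from the question. It shows that cdist returns one distance per pair of rows, so passing the whole column against the patch yields a matrix rather than one number per position. The variable names values, pattern, pairwise, window and single are chosen here for illustration.

```python
import numpy as np
from scipy.spatial.distance import cdist

values = np.array([0, 1, 1, 1, 2, 1, 1, 2, 3, 2, 1, 1, 1, 1, 1, 2,
                   1, 1, 3, 4, 1, 2, 3], dtype=float)
pattern = np.array([1, 2, 3], dtype=float)

# cdist expects 2-D inputs where each row is one observation.
# Reshaping each sample into its own row compares every sample
# with every pattern point, which is not a per-window distance.
pairwise = cdist(values.reshape(-1, 1), pattern.reshape(-1, 1), 'euclidean')
print(pairwise.shape)  # (23, 3)

# Treating one window and the pattern as single 3-dimensional
# observations gives the one number wanted for that position.
window = values[6:9]  # the samples at positions 6, 7, 8
single = cdist(window.reshape(1, -1), pattern.reshape(1, -1), 'euclidean')
print(single.shape, single[0, 0])  # (1, 1) 0.0
```

A (23, 3) result cannot be stored in a single DataFrame column, which is consistent with the failure described above.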

1 Answer:

Answer 0 (score: 0)

Well, I fixed your DataFrame creation and the function definition, but I am not sure the output is what you expect:

import numpy as np
import pandas as pd
from pandas import Series
from pandas import DataFrame
from sklearn.metrics.pairwise import euclidean_distances
from sklearn.metrics.pairwise import paired_distances
from scipy.spatial.distance import cdist

d = {'col1': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22], 
'col2': [0, 1, 1, 1, 2, 1, 1, 2, 3, 2, 1, 1, 1, 1, 1, 2, 1, 1, 3, 4, 1, 2, 3]}
df = pd.DataFrame(data=d)
d2 = {'col1': [0, 1, 2] , 'col2': [1, 2, 3]}
patch = pd.DataFrame(data=d2)

def orig(df, patch):

    df['corr'] = np.nan

    for i in range(df.shape[0]):

        #select the df window with the same size of patch
        window = df[i : i+patch.shape[0]]

        #If window and patch have different shapes --> Break
        if window.shape[0] != patch.shape[0] :

            break

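        else:
            # Euclidean distance between this window and the patch.
            # cdist on the full columns would return a 23x3 matrix,
            # which cannot be assigned to a single column.
            diff = window['col2'].to_numpy() - patch['col2'].to_numpy()
            df.loc[i, 'corr'] = np.sqrt((diff ** 2).sum())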

    return df

orig(df, patch)
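
For comparison, here is a minimal vectorized sketch of the sliding-window idea from the question. It assumes NumPy 1.20 or later for sliding_window_view; sliding_distance is a hypothetical helper name, and the column name 'dist' is also chosen here. It is one possible approach under those assumptions, not necessarily what the answer above intended.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

def sliding_distance(series, pattern):
    """Euclidean distance between pattern and every same-length window of series."""
    x = np.asarray(series, dtype=float)
    p = np.asarray(pattern, dtype=float)
    # One row per window position: shape (len(x) - len(p) + 1, len(p))
    windows = sliding_window_view(x, len(p))
    return np.sqrt(((windows - p) ** 2).sum(axis=1))

df = pd.DataFrame({
    'Time': range(23),
    'Value': [0, 1, 1, 1, 2, 1, 1, 2, 3, 2, 1, 1, 1, 1, 1, 2,
              1, 1, 3, 4, 1, 2, 3],
})
patch = pd.DataFrame({'Time': [0, 1, 2], 'Value': [1, 2, 3]})

dist = sliding_distance(df['Value'], patch['Value'])

# Positions near the end that cannot hold a full window stay NaN.
df['dist'] = pd.Series(dist)

best = int(np.argmin(dist))          # first position with the smallest distance
matches = np.flatnonzero(dist == 0)  # every exact occurrence of the pattern
print(best, matches)
```

On the sample data the pattern [1, 2, 3] occurs exactly at positions 6 and 20, and argmin reports the first of them. With real, noisy data an exact zero is unlikely, so a threshold on dist (or normalizing each window before comparing) may be more useful; plotting df['dist'] corresponds to step 4 of the question.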