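I want to find a pattern in a pandas DataFrame. The actual problem is shown in the graph below:
I would like to find the blue pattern in the graph.
My idea is:
Here is my code (I built a small sample df just to try it out; the original df is too large):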
import numpy as np
import pandas as pd
from pandas import Series
from pandas import DataFrame
from sklearn.metrics.pairwise import euclidean_distances
from sklearn.metrics.pairwise import paired_distances
from scipy.spatial.distance import cdist

d = {'Time': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
              18, 19, 20, 21, 22],
     'Value': [0, 1, 1, 1, 2, 1, 1, 2, 3, 2, 1, 1, 1, 1, 1, 2, 1, 1, 3, 4, 1, 2, 3]}
df = pd.DataFrame(data=d)
d2 = {'Time': [0, 1, 2], 'Value': [1, 2, 3]}
patch = pd.DataFrame(data=d2)

def orig(df, patch):
    df['corr'] = np.nan
    for i in range(df.shape[0]):
        # select the df window with the same size as the patch
        window = df[i : i+patch.shape[0]]
        # if window and patch have different shapes --> break
        if window.shape[0] != patch.shape[0]:
            break
        else:
            patch.reset_index(inplace=True, drop=True)
            window.reset_index(inplace=True, drop=True)
            df['corr'] = cdist(df[['Value']], patch[['Value']], 'euclidean')
    return df
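Unfortunately, it does not work properly. To compute the Euclidean distance, cdist needs 2-D inputs, but I only want to consider the difference between the pattern (patch) and the actual df. If I pass just one column to make the code run, I still get wrong results. Can anyone give me a hint on how to identify a pattern inside another DataFrame? Maybe I am doing this the hard way.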
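To illustrate the shape issue described above, here is a minimal sketch. It is not the original code: it assumes the same sample values as the question and uses plain NumPy arrays named `values` and `pattern` for brevity. `cdist` treats every row as one point, so two single-column inputs produce a full matrix of pairwise distances (one row per df entry, one column per patch entry) instead of one distance per window. Reshaping a window and the patch into single rows yields a single scalar distance.

```python
import numpy as np
from scipy.spatial.distance import cdist

values = np.array([0, 1, 1, 1, 2, 1, 1, 2, 3, 2, 1, 1, 1, 1, 1, 2,
                   1, 1, 3, 4, 1, 2, 3], dtype=float)
pattern = np.array([1, 2, 3], dtype=float)

# Column vectors: every value is compared with every pattern element,
# which gives a 23 x 3 matrix that cannot be stored in a single column.
pairwise = cdist(values.reshape(-1, 1), pattern.reshape(-1, 1), 'euclidean')
print(pairwise.shape)

# Row vectors: one window (positions 6 to 8) is compared with the whole
# pattern, which gives a 1 x 1 result holding a single distance.
window = values[6:9]
single = cdist(window.reshape(1, -1), pattern.reshape(1, -1), 'euclidean')
print(single.shape, single[0, 0])
```

The window starting at position 6 is exactly `[1, 2, 3]`, so its distance to the pattern is zero.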
Answer 0 (score: 0)
Well, I fixed your DataFrame creation and your function definition, but I am not sure whether the output is what you expect:
import numpy as np
import pandas as pd
from scipy.spatial.distance import cdist

d = {'col1': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22],
     'col2': [0, 1, 1, 1, 2, 1, 1, 2, 3, 2, 1, 1, 1, 1, 1, 2, 1, 1, 3, 4, 1, 2, 3]}
df = pd.DataFrame(data=d)
d2 = {'col1': [0, 1, 2], 'col2': [1, 2, 3]}
patch = pd.DataFrame(data=d2)

def orig(df, patch):
    df['corr'] = np.nan
    n = patch.shape[0]
    # cdist needs 2-D inputs, so treat the patch as one point with n coordinates
    target = patch[['col2']].to_numpy().reshape(1, -1)
    for i in range(df.shape[0]):
        # select the df window with the same size as the patch
        window = df['col2'].iloc[i : i + n]
        # if the window runs past the end of df, stop
        if window.shape[0] != n:
            break
        # Euclidean distance between this window and the patch (0 means an exact match)
        df.loc[i, 'corr'] = cdist(window.to_numpy().reshape(1, -1), target, 'euclidean')[0, 0]
    return df

orig(df, patch)
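
As an alternative to the explicit loop, the same per-window distance can be computed in one vectorized step. The sketch below is only one possible approach, not part of the original answer: the helper name `window_distances` is hypothetical, and it assumes NumPy 1.20 or newer, where `numpy.lib.stride_tricks.sliding_window_view` is available.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

def window_distances(series, pattern):
    """Euclidean distance between `pattern` and each same-length window of `series`."""
    values = series.to_numpy(dtype=float)
    target = np.asarray(pattern, dtype=float)
    # One row per window; shape is (len(values) - len(target) + 1, len(target)).
    windows = sliding_window_view(values, len(target))
    return np.sqrt(((windows - target) ** 2).sum(axis=1))

df = pd.DataFrame({
    'col1': range(23),
    'col2': [0, 1, 1, 1, 2, 1, 1, 2, 3, 2, 1, 1, 1, 1, 1, 2, 1, 1, 3, 4, 1, 2, 3],
})

dist = window_distances(df['col2'], [1, 2, 3])

# Start positions of windows that match the pattern exactly.
matches = np.flatnonzero(np.isclose(dist, 0.0))
print(matches.tolist())
```

With the sample data, exact matches start at positions 6 and 20. For noisy real data, comparing `dist` against a small threshold instead of zero would pick up approximate matches; a suitable threshold depends on the data and is not something the original post specifies.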