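I am working with "bursty" data, i.e. time series that contain bursts. The data can be very noisy. I am really only interested in the burst durations, but my burst-detection algorithm only works properly when the data has no slope. So my question is: how can I find the linear slope of such data without doing it by hand? My main problem is that bursts can run past either end of my time (x) axis. Otherwise I could simply take the mean of the first and last 20 data points and fit a linear function through them.

Basically, I want to find the red line in the plot below (the lower panel produced by the code) and subtract it. I suspect that a linear regression through the bursts, or through the sloped baseline, would solve the problem, but somehow I am stuck.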
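The fallback described above (average the first and last 20 points and draw a line through them) can be sketched as follows. The helper `endpoint_line` is hypothetical. The snippet also shows the failure mode the question worries about: a burst that covers the end window drags the line upward.

```python
import numpy as np


def endpoint_line(y, n=20):
    """Line through the means of the first n and the last n samples.

    Only valid when neither end of the series lies inside a burst.
    """
    y = np.asarray(y, dtype=float)
    x = np.arange(len(y))
    # Represent each end window by its mean time and its mean value
    x0, y0 = x[:n].mean(), y[:n].mean()
    x1, y1 = x[-n:].mean(), y[-n:].mean()
    slope = (y1 - y0) / (x1 - x0)
    intercept = y0 - slope * x0
    return slope, intercept


# On a noiseless ramp the line is recovered exactly
y = 5 + 0.02 * np.arange(4000)
slope, intercept = endpoint_line(y)

# A burst covering the last 200 samples biases the slope upward
y_burst = y.copy()
y_burst[-200:] += 50
biased_slope, _ = endpoint_line(y_burst)
```

With noisy data, `n` trades off noise averaging against how far the windows reach into the series, but no choice of `n` helps when a burst sits on an edge.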
Code:
```python
import random

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib import rcParams

# Set plot properties
sns.set_style("white")
rcParams['font.size'] = 14
FIG_SIZE = (12, 15)

# Simulate data: noisy baseline of target events
timepoints = 4000
r = pd.Series(np.floor(np.ones(timepoints) * 20
                       + np.random.normal(scale=10, size=timepoints)))
r[r < 0] = 0  # set negative values to 0

# Add some bursts to the data (the last one runs to the end of the axis)
heights = [35, 45, 50, 55, 40, 60, 70]
starts = [100, 300, 700, 950, 1200, 1800, 2100, 2550, 2800, 3100, 3500, 3800]
ends = [200, 400, 800, 1100, 1500, 1900, 2400, 2625, 2950, 3350, 3700, 4000]
for x, y in zip(starts, ends):
    # Positional slicing via .iloc; + np.random.normal(scale=10, size=200)
    r.iloc[x:y] = r.iloc[x:y] + random.choice(heights)

# Add a linear slope to the data
slope = 0.02
linear_slope = np.arange(timepoints) * slope
burst_data_with_slope = r + linear_slope

# Figure setup: identical axes for both panels
fig, (ax1, ax2) = plt.subplots(2, figsize=FIG_SIZE, sharey=False)
for ax in (ax1, ax2):
    ax.set_xlabel('time (sec)', size=14)
    ax.set_ylabel('proportion of target events', size=14)
    ax.set_xlim([0, timepoints])

ax1.plot(burst_data_with_slope, color='#00bbcc', linewidth=1)
ax1.set_title('Original Data', size=14)
ax2.plot(burst_data_with_slope, color='#00bbcc', linewidth=1)
ax2.plot(linear_slope, color='red', linewidth=1)  # the line to find and subtract
ax2.set_title('Original Data with Slope to Subtract (red)', size=14)

# Final plot
plt.subplots_adjust(hspace=0.5)
plt.show()
```
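One possible way to get the red line without picking points by hand is an iterative, one-sided clipped fit. Fit a line with `np.polyfit`, drop the samples that lie well above it (the bursts), refit, and repeat until the set of kept samples stops changing. This is a minimal sketch under two assumptions: bursts only ever add to the signal, and the baseline noise is roughly stationary. The function name `fit_baseline_line`, the clipping factor `k=2`, and the noise estimate from first differences are illustrative choices, not an established recipe. Because the method only discards points, a burst running off either end of the time axis is treated the same as one in the middle.

```python
import numpy as np


def fit_baseline_line(y, k=2.0, max_iter=100):
    """Fit a straight baseline under positive bursts by one-sided clipping.

    Assumes bursts only push the signal up and the data is noisy (sigma > 0).
    Returns (slope, intercept, mask); mask marks the samples used in the fit.
    """
    y = np.asarray(y, dtype=float)
    x = np.arange(len(y), dtype=float)

    # Noise scale from first differences: bursts are steps, so they corrupt
    # only a few differences, which the median absolute deviation ignores.
    d = np.diff(y)
    sigma = 1.4826 * np.median(np.abs(d - np.median(d))) / np.sqrt(2)

    mask = np.ones(len(y), dtype=bool)
    for _ in range(max_iter):
        slope, intercept = np.polyfit(x[mask], y[mask], 1)
        resid = y - (slope * x + intercept)
        # Drop only points far ABOVE the line (bursts); keep all points below
        new_mask = resid < k * sigma
        if np.array_equal(new_mask, mask):
            break
        mask = new_mask
    else:
        # No convergence: refit on the last mask so the line matches it
        slope, intercept = np.polyfit(x[mask], y[mask], 1)
    return slope, intercept, mask


# Usage on data simulated like the question's (seeded for reproducibility)
rng = np.random.default_rng(0)
n = 4000
base = np.clip(np.floor(20 + rng.normal(scale=10, size=n)), 0, None)
starts = [100, 300, 700, 950, 1200, 1800, 2100, 2550, 2800, 3100, 3500, 3800]
ends = [200, 400, 800, 1100, 1500, 1900, 2400, 2625, 2950, 3350, 3700, 4000]
for s, e in zip(starts, ends):
    base[s:e] += rng.choice([35, 45, 50, 55, 40, 60, 70])
y = base + 0.02 * np.arange(n)

slope, intercept, is_baseline = fit_baseline_line(y)
detrended = y - (slope * np.arange(n) + intercept)  # input for burst detection
```

A larger `k` keeps more of the baseline noise but lets low bursts leak into the fit. If bursts could also be negative, the clipping would have to be two-sided.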
Thanks!