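I'm using Python 2.7. The title provides the context. I phrased the title this specific way so that people can find this Stack Exchange question in the future. There is plenty of documentation for this kind of thing in MATLAB, but it is sorely lacking for SciPy, NumPy, pandas, matplotlib, etc.
Essentially, I have the following dataframe: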
time amplitude
0 1.0 0.1
1 2.0 -0.3
2 3.0 1.4
3 4.0 4.2
4 5.0 -5.7
5 6.0 2.3
6 7.0 -0.2
7 8.0 -0.3
8 9.0 1.0
9 10.0 0.1
What I want to do now is the following:
Append the values to the dataframe at the appropriate locations, i.e.
   time  amplitude  upper  lower
0   1.0        0.1
1   2.0       -0.3
2   3.0        1.4
3   4.0        4.2    4.2
4   5.0       -5.7          -5.7
5   6.0        2.3    2.3
6   7.0       -0.8          -0.8
7   8.0       -0.3
8   9.0        1.0
9  10.0        0.1
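Interpolate between the maximum values and between the minimum values to fill out the dataframe
Plot the amplitude column, the upper column, and the lower column
I'm familiar enough with python / pandas, and I imagine the code looking something like this: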
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import scipy as scipy
time = [0,1,2,3,4,5,6,7,8,9]
amplitude = [0.1,-0.3,1.4,4.2,-5.7,2.3,-0.2,-0.3,1.0,0.1]
df = pd.DataFrame({'time': time, 'amplitude': amplitude})
plt.plot(df['time'], df['amplitude'])
# Pseudocode from here on: the <...> placeholders are the parts I don't know how to write
for seconds in time:
    if <interval == 5>:
        max = []
        time_max = []
        min = []
        time_min = []
        max.append(df['amplitude'].max())
        min.append(df['amplitude'].min())
        time_max.append(<time value in interval>)
        time_min.append(<time value in interval>)
        <build another dataframe>
        <concat to existing dataframe df>
        <interpolate between values in column 'upper'>
        <interpolate between values in column 'lower'>
Any help is appreciated.
Thank you. ~Devin
Answer 0 (score: 0)
Pandas resample() and interpolate() will help here. To get the seconds as a DatetimeIndex, start from an arbitrary Datetime; you can always chop off the year when you're done:
df.set_index(pd.to_datetime("2017") + df.time * pd.offsets.Second(), inplace=True)
print(df)
time amplitude
time
2017-01-01 00:00:01 1.0 0.1
2017-01-01 00:00:02 2.0 -0.3
2017-01-01 00:00:03 3.0 1.4
2017-01-01 00:00:04 4.0 4.2
2017-01-01 00:00:05 5.0 -5.7
2017-01-01 00:00:06 6.0 2.3
2017-01-01 00:00:07 7.0 -0.2
2017-01-01 00:00:08 8.0 -0.3
2017-01-01 00:00:09 9.0 1.0
2017-01-01 00:00:10 10.0 0.1
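For anyone on a newer pandas, here is a minimal sketch of the same index-building step using pd.to_timedelta instead of multiplying by an offset object. The anchor date 2017-01-01 is arbitrary, as above, and the data is just the sample from the question; treat this as one possible way to do it, not the only one.

```python
import pandas as pd

# Sample data from the question (time in seconds).
df = pd.DataFrame({
    "time": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
    "amplitude": [0.1, -0.3, 1.4, 4.2, -5.7, 2.3, -0.2, -0.3, 1.0, 0.1],
})

# Arbitrary anchor date plus the elapsed seconds converted to timedeltas.
df.index = pd.Timestamp("2017-01-01") + pd.to_timedelta(df["time"].to_numpy(), unit="s")
df.index.name = "time"
print(df)
```

The result is a DatetimeIndex running from 00:00:01 to 00:00:10, which is what resample() needs.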
Resample every 5 seconds and get the summary statistics min and max:
summary = (df.resample('5S', label='right', closed='right')
.agg({"amplitude":{"lower":"min","upper":"max"}}))
summary.columns = summary.columns.droplevel(0)
print(summary)
upper lower
time
2017-01-01 00:00:05 4.2 -5.7
2017-01-01 00:00:10 2.3 -0.3
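As far as I know, newer pandas releases reject the nested-dict form of agg() used above. A sketch of an equivalent that avoids it: aggregate the single column with a list of functions, then rename the resulting columns. Passing a pd.Timedelta as the resample rule is my own choice here, to sidestep differences in frequency-string spelling between versions.

```python
import pandas as pd

# Sample data from the question, indexed by an arbitrary date plus the seconds.
df = pd.DataFrame({
    "time": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
    "amplitude": [0.1, -0.3, 1.4, 4.2, -5.7, 2.3, -0.2, -0.3, 1.0, 0.1],
})
df.index = pd.Timestamp("2017-01-01") + pd.to_timedelta(df["time"].to_numpy(), unit="s")

# 5-second bins, closed and labelled on the right edge, so 1..5 s and 6..10 s
# form the two bins.
summary = (
    df["amplitude"]
    .resample(pd.Timedelta(seconds=5), label="right", closed="right")
    .agg(["min", "max"])
    .rename(columns={"min": "lower", "max": "upper"})
)
print(summary)
```

This gives the same two rows as above: lower/upper of -5.7/4.2 for the first bin and -0.3/2.3 for the second.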
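Merge with the original df and interpolate the missing values. (Note that interpolation is only possible between two known values, so the first few entries will be NaN.)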
df2 = df.merge(summary, how='left', left_index=True, right_index=True)
df2.lower.interpolate(inplace=True)
df2.upper.interpolate(inplace=True)
print(df2)
time amplitude upper lower
time
2017-01-01 00:00:01 1.0 0.1 NaN NaN
2017-01-01 00:00:02 2.0 -0.3 NaN NaN
2017-01-01 00:00:03 3.0 1.4 NaN NaN
2017-01-01 00:00:04 4.0 4.2 NaN NaN
2017-01-01 00:00:05 5.0 -5.7 4.20 -5.70
2017-01-01 00:00:06 6.0 2.3 3.82 -4.62
2017-01-01 00:00:07 7.0 -0.2 3.44 -3.54
2017-01-01 00:00:08 8.0 -0.3 3.06 -2.46
2017-01-01 00:00:09 9.0 1.0 2.68 -1.38
2017-01-01 00:00:10 10.0 0.1 2.30 -0.30
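One caveat: calling interpolate(inplace=True) on a column accessed as an attribute may not write back to df2 in newer pandas versions. A hedged sketch of the same merge-and-interpolate step that assigns the result explicitly (join() on the index is used here in place of merge(); the data is again the sample from the question):

```python
import pandas as pd

df = pd.DataFrame({
    "time": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
    "amplitude": [0.1, -0.3, 1.4, 4.2, -5.7, 2.3, -0.2, -0.3, 1.0, 0.1],
})
df.index = pd.Timestamp("2017-01-01") + pd.to_timedelta(df["time"].to_numpy(), unit="s")

# Per-bin min and max, as in the resample step.
summary = (
    df["amplitude"]
    .resample(pd.Timedelta(seconds=5), label="right", closed="right")
    .agg(["min", "max"])
    .rename(columns={"min": "lower", "max": "upper"})
)

# Left join on the index: lower/upper are only filled at 00:00:05 and 00:00:10.
df2 = df.join(summary)

# Linear interpolation between the known points; assign the result back
# instead of relying on inplace=True.
df2[["lower", "upper"]] = df2[["lower", "upper"]].interpolate()
print(df2)
```

The rows before the first known value stay NaN, for the reason given in the note above.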
Finally, plot the output:
plot_cols = ['amplitude','lower','upper']
df2[plot_cols].plot()
Note: if you want the index to show only the seconds, use:
df2.index = df2.index.second
Answer 1 (score: 0)
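I also made use of this: Subsetting Data Frame into Multiple Data Frames in Pandas
Here is what I arrived at on my first pass at this problem:
I hope this helps people create arbitrary envelopes for noisy signal / time-series data, just as it helped me!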
import pandas as pd
import matplotlib.pyplot as plt

time_array = [0,1,2,3,4,5,6,7,8,9]
value_array = [0.1,-0.3,1.4,4.2,-5.7,2.3,-0.2,-0.3,1.0,0.1]
upper_time = []
upper_value = []
lower_time = []
lower_value = []
df = pd.DataFrame({'time': time_array, 'value': value_array})
# Group consecutive rows in pairs; // keeps this an integer division on Python 3 too
for element, df_k in df.groupby(lambda x: x // 2):
    df_temp = df_k.reset_index(drop=True)
    # Time and value of the maximum within this group
    upper_time.append(df_temp['time'].loc[df_temp['value'].idxmax()])
    upper_value_raw = df_temp['value'].loc[df_temp['value'].idxmax()]
    upper_value.append(round(upper_value_raw, 1))
    # Time and value of the minimum within this group
    lower_time.append(df_temp['time'].loc[df_temp['value'].idxmin()])
    lower_value_raw = df_temp['value'].loc[df_temp['value'].idxmin()]
    lower_value.append(round(lower_value_raw, 1))
# Plot the raw signal together with the upper and lower envelopes
plt.plot(df['time'], df['value'])
plt.plot(upper_time, upper_value)
plt.plot(lower_time, lower_value)
plt.show()
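The same envelope points can also be picked out without an explicit loop. This is only a sketch of one alternative, assuming the default integer index; window is a name I made up for the number of samples per group, and plotting is left out.

```python
import pandas as pd

df = pd.DataFrame({
    "time": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    "value": [0.1, -0.3, 1.4, 4.2, -5.7, 2.3, -0.2, -0.3, 1.0, 0.1],
})

window = 2  # samples per group (hypothetical name); matches the x // 2 grouping

# Label each row with its group number, then find the row label of the
# maximum and minimum value inside each group.
groups = df.groupby(df.index.to_numpy() // window)["value"]
upper = df.loc[groups.idxmax(), ["time", "value"]]
lower = df.loc[groups.idxmin(), ["time", "value"]]

print(upper)
print(lower)
```

upper and lower can then be passed to plt.plot() in place of the four lists, e.g. plt.plot(upper['time'], upper['value']).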