Question

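I have a pandas Series whose index is a datetime.

I can plot it with the step() function, which draws each point of the series against time (x is time).

I would like a less precise view of the evolution. So I need to reduce the number of steps and ignore the small increments. The only way I have found is to use the poly1d() function from numpy to fit a polynomial to the points and then step that function. Unfortunately, I lose the time index in the conversion, because the polynomial's index is x.

Is there a way to 'simplify' my function so that I get only the dates (x values) of the largest changes on the y axis, rather than every date on which anything changed? As I wrote above, I want only the largest increments, not the tiny changes.

Here is the exact data: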

2016-01-02    -5.418440
2016-01-09    -9.137942
2016-01-16    -9.137942
2016-01-23    -9.137942
2016-01-30    -9.137942
2016-02-06   -11.795107
2016-02-13   -11.795107
2016-02-20   -11.795107
2016-02-27   -11.795107
2016-03-05   -11.795107
2016-03-12   -13.106988
2016-03-19   -13.106988
2016-03-26   -13.106988
2016-04-02   -13.106988
2016-04-09   -13.106988
2016-04-16   -13.106988
2016-04-23   -13.106988
2016-04-30   -11.458878
2016-05-07     0.051123
2016-05-14     2.010179
2016-05-21    -3.210870
2016-05-28    -0.726291
2016-06-04     5.841818
2016-06-11     5.067061
2016-06-18     5.789375
2016-06-25    16.455159
2016-07-02    22.518294
2016-07-09    39.834977
2016-07-16    54.685965
2016-07-23    54.685965
2016-07-30    55.169290
2016-08-06    55.169290
2016-08-13    55.169290
2016-08-20    53.366569
2016-08-27    45.758675
2016-09-03    10.976592
2016-09-10    -0.554887
2016-09-17    -8.653451
2016-09-24   -18.198305
2016-10-01   -22.218711
2016-10-08   -21.158434
2016-10-15   -11.723798
2016-10-22    -9.928957
2016-10-29   -17.498315
2016-11-05   -22.850454
2016-11-12   -25.190656
2016-11-19   -27.250960
2016-11-26   -27.250960
2016-12-03   -27.250960
2016-12-10   -27.250960

Answer 1

So here is my idea:

import pandas as pd

# Load the data (two whitespace-separated columns: date and value)
a = pd.read_table('<your_data_file>', sep=r'\s+', names=['date', 'value'],
                  index_col='date', parse_dates=True)

# Create an additional column containing the difference
# between two consecutive values:
a['diff'] = a['value'].diff()

# Select only the rows whose 'diff' exceeds a certain threshold in absolute value
# and copy them to a new frame:
b = a[a['diff'].abs() > 0.5]  # pick whatever threshold (0.5 here) works best for you

# Plot your new graph
b['value'].plot()
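
One detail of the threshold filter above: diff() is NaN for the first row, so the first point is always dropped. Below is a minimal sketch of the same idea on a plain Series that keeps the starting point explicitly. The rows are the first seven from the question; the threshold of 0.5 and the names s, jump, keep and simplified are only illustrative assumptions.

```python
import pandas as pd

# First seven rows of the data from the question
s = pd.Series(
    [-5.418440, -9.137942, -9.137942, -9.137942, -9.137942, -11.795107, -11.795107],
    index=pd.to_datetime([
        "2016-01-02", "2016-01-09", "2016-01-16", "2016-01-23",
        "2016-01-30", "2016-02-06", "2016-02-13",
    ]),
    name="value",
)

threshold = 0.5  # assumed value, tune it to your data

# Absolute change with respect to the previous point (NaN for the first row)
jump = s.diff().abs()

# NaN > threshold evaluates to False, so the first row would be lost...
keep = jump > threshold
# ...unless it is kept explicitly as the starting level of the step plot
keep.iloc[0] = True

simplified = s[keep]
print(simplified)
```

If matplotlib is installed, simplified.plot(drawstyle="steps-post") should draw the reduced series as a step curve while keeping the datetime index on the x axis.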

Hope this helps...

Answer 2

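One approach is to build a mask from the original series by comparing the absolute difference between each value and the previous one against a sensitivity threshold. The mask is just a boolean selection array used to filter the original series.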

# my_series is your Series
threshold = 10.0
# diff() is a method and must be called before taking the absolute value
diff_series = my_series.diff().abs()
mask = diff_series > threshold
# now plot the masked values only, or create a new series from them, etc.
my_series[mask].plot()
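
If choosing a threshold is awkward, a possible variation (not part of the answer above) is to rank the absolute differences and keep a fixed number of the largest ones with Series.nlargest. The sketch below uses six rows from the question; n_largest and the other variable names are hypothetical.

```python
import pandas as pd

# Six consecutive rows of the data from the question
s = pd.Series(
    [55.169290, 53.366569, 45.758675, 10.976592, -0.554887, -8.653451],
    index=pd.to_datetime([
        "2016-08-13", "2016-08-20", "2016-08-27",
        "2016-09-03", "2016-09-10", "2016-09-17",
    ]),
)

n_largest = 2  # hypothetical: how many of the biggest jumps to keep

# Absolute change with respect to the previous point
jump = s.diff().abs()

# Dates of the n biggest jumps, put back in chronological order
top_dates = jump.nlargest(n_largest).index.sort_values()

# Values of the series on those dates, still indexed by datetime
largest_steps = s.loc[top_dates]
print(largest_steps)
```

This returns the dates directly, which is what the question asks for, but it ignores how large the remaining jumps are, so the threshold mask may still be the better fit when the number of significant changes is not known in advance.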

Answer 3

You can use the pandas resample function.

Import the data and name the columns 'Date' and 'Values'. The rest of the code parses the Date column as datetime.

import pandas as pd

# df is assumed to already hold the imported data in two columns
df.columns = ['Date', 'Values']
# Parse the date strings into datetime values
df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d')
df.set_index('Date', inplace=True)

You can now resample the time series. For example, by month:

# 'M' is the month-end alias; recent pandas versions spell it 'ME'
resampled_df = df.resample('M').mean()
resampled_df.head()

Finally, plot it.

resampled_df.plot()
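
As a self-contained illustration of the resampling step, here is a small sketch on the first seven rows of the question's data. It uses the month-start alias 'MS' purely as an assumption, to avoid depending on how a given pandas version spells the month-end alias, and it also shows last() as an alternative to mean() for readers who want actual observed levels rather than averages.

```python
import pandas as pd

# First seven weekly rows of the data from the question
s = pd.Series(
    [-5.418440, -9.137942, -9.137942, -9.137942, -9.137942, -11.795107, -11.795107],
    index=pd.to_datetime([
        "2016-01-02", "2016-01-09", "2016-01-16", "2016-01-23",
        "2016-01-30", "2016-02-06", "2016-02-13",
    ]),
)

# One bin per calendar month, labelled with the first day of the month
monthly_mean = s.resample("MS").mean()

# Alternative: keep the last observed value of each month instead of the average
monthly_last = s.resample("MS").last()

print(monthly_mean)
print(monthly_last)
```

Note that resampling smooths by time bucket rather than by size of change, so a large jump inside a month is averaged away with mean(); whether that is acceptable depends on what the plot is meant to show.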

Filter/smooth a step function to retrieve the largest increments
