Resampling a pandas Series

Time: 2020-07-30 20:14:43

Tags: python pandas pandas-resample

I have the following pandas Series:

    dummy_array = pd.Series(np.array(range(-10, 11)), index=(np.array(range(0, 21))/10))

This produces the following Series:

0.0   -10
0.1    -9
0.2    -8
0.3    -7
0.4    -6
0.5    -5
0.6    -4
0.7    -3
0.8    -2
0.9    -1
1.0     0
1.1     1
1.2     2
1.3     3
1.4     4
1.5     5
1.6     6
1.7     7
1.8     8
1.9     9
2.0    10

How can I resample it? I read the documentation and came up with the following:

    dummy_array.resample('20S').mean()

But it does not work. Any ideas?

Thanks.

Edit:

I want my final vector to have twice the frequency, like this:

0.00   -10.0
0.05    -9.5
0.10    -9.0
0.15    -8.5
0.20    -8.0
0.25    -7.5
etc.
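
For context on why the `resample` call fails: `Series.resample` only works on a `DatetimeIndex`, `TimedeltaIndex` or `PeriodIndex`, so calling it on a float index raises a `TypeError`. The sketch below is one possible workaround, assuming the float index can be read as seconds (that interpretation, and the names `ms`, `ts` and `upsampled`, are assumptions made for illustration). It converts the index to a `TimedeltaIndex`, upsamples to a 50 ms grid, and converts back to float seconds.

```python
import numpy as np
import pandas as pd

dummy_array = pd.Series(np.array(range(-10, 11)), index=(np.array(range(0, 21)) / 10))

# Assumption: the float index represents seconds. Convert via integer
# milliseconds to avoid float rounding when building the timedeltas.
ms = np.round(dummy_array.index.to_numpy() * 1000).astype("int64")
ts = pd.Series(dummy_array.to_numpy(), index=pd.to_timedelta(ms, unit="ms"))

# Upsample from a 100 ms step to a 50 ms step and fill the new points linearly.
upsampled = ts.resample("50ms").interpolate()

# Convert the index back to float seconds.
upsampled.index = upsampled.index.total_seconds()

print(upsampled.head())
```

This only fits data on a regular grid; for irregular float indexes, reindexing and interpolating is likely the simpler route.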

3 answers:

Answer 0 (score: 2)

Here is a solution using np.linspace(), .reindex() and .interpolate():

Create the Series dummy_array as above.

    # get properties of the original index
    start = dummy_array.index.min()
    end = dummy_array.index.max()
    num_gridpoints_orig = dummy_array.index.size

    # calculate the number of grid points in the new index
    num_gridpoints_new = (num_gridpoints_orig * 2) - 1

    # create the new index, with half the step size of the original
    idx_new = np.linspace(start, end, num_gridpoints_new)

    # reindex the Series. New grid points get the value NaN,
    # and we replace these NaNs with interpolated values
    df2 = dummy_array.reindex(index=idx_new).interpolate()

    print(df2.head())

0.00   -10.0
0.05    -9.5
0.10    -9.0
0.15    -8.5
0.20    -8.0
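
One caveat with the approach above: `reindex` matches float labels exactly, and values produced by `np.linspace` may not always compare equal to the original labels, in which case more points than expected start out as NaN. As a hedged alternative, the sketch below uses `np.interp`, which interpolates by position on the x-axis and does not depend on exact label matches. The helper name `upsample_linear` and its `factor` parameter are hypothetical, and the index is assumed to be sorted in increasing order.

```python
import numpy as np
import pandas as pd


def upsample_linear(series, factor=2):
    """Linearly upsample a Series with a sorted numeric index (illustrative helper)."""
    old_x = series.index.to_numpy(dtype=float)
    # Each original interval is split into `factor` equal parts.
    n_new = (len(old_x) - 1) * factor + 1
    new_x = np.linspace(old_x[0], old_x[-1], n_new)
    # np.interp evaluates the piecewise-linear curve through the old points.
    new_y = np.interp(new_x, old_x, series.to_numpy(dtype=float))
    return pd.Series(new_y, index=new_x)


dummy_array = pd.Series(np.array(range(-10, 11)), index=(np.array(range(0, 21)) / 10))
result = upsample_linear(dummy_array, factor=2)
print(result.head())
```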

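Answer 1 (score: 0)

Create a list of offset points based on the original Series. We then split it into index and values to create a new pd.Series, concatenate the new pd.Series with the original, and re-sort.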

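    # new list: shift every point by half a step in index and in value
    ups = [[x + 0.05, y + 0.5] for x, y in zip(dummy_array.index, dummy_array)]
    idx = [i[0] for i in ups]
    val = [i[1] for i in ups]
    d2 = pd.Series(val, index=idx)
    d3 = pd.concat([dummy_array, d2], axis=0)
    # sort by index so the rows are ordered by position, not by value
    d3.sort_index(inplace=True)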

d3
0.00   -10.0
0.05    -9.5
0.10    -9.0
0.15    -8.5
0.20    -8.0
0.25    -7.5
0.30    -7.0
0.35    -6.5
0.40    -6.0
0.45    -5.5
0.50    -5.0
0.55    -4.5
0.60    -4.0
0.65    -3.5
0.70    -3.0
0.75    -2.5
0.80    -2.0
0.85    -1.5
0.90    -1.0
0.95    -0.5
1.00     0.0
1.05     0.5
1.10     1.0
1.15     1.5
1.20     2.0
1.25     2.5
1.30     3.0
1.35     3.5
1.40     4.0
1.45     4.5
1.50     5.0
1.55     5.5
1.60     6.0
1.65     6.5
1.70     7.0
1.75     7.5
1.80     8.0
1.85     8.5
1.90     9.0
1.95     9.5
2.00    10.0
2.05    10.5
dtype: float64
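
Note that the output above ends with an extra point at 2.05, beyond the original range, and that the fixed offsets (0.05 and 0.5) only fit this particular data. A variant that may generalize better is to compute each new point as the midpoint of two neighbouring originals. This is a minimal sketch under that assumption; the names `mid_idx`, `mid_val`, `midpoints` and `doubled` are made up for illustration.

```python
import numpy as np
import pandas as pd

dummy_array = pd.Series(np.array(range(-10, 11)), index=(np.array(range(0, 21)) / 10))

idx = dummy_array.index.to_numpy()
val = dummy_array.to_numpy(dtype=float)

# Midpoint of every pair of neighbouring points, in index and in value.
mid_idx = (idx[:-1] + idx[1:]) / 2
mid_val = (val[:-1] + val[1:]) / 2

midpoints = pd.Series(mid_val, index=mid_idx)

# Combine originals and midpoints, then order by index.
doubled = pd.concat([dummy_array.astype(float), midpoints]).sort_index()
print(doubled.head())
```

Because the midpoints are taken between existing neighbours, nothing is added past the last original point.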

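Answer 2 (score: 0)

Thanks everyone for your contributions. After looking at the answers and thinking about it some more, I found a more general solution that handles all possible cases. Here, I want to upsample dummy_arrayA onto the same index as dummy_arrayB. To do this, I create a new index that contains the points of both A and B. Then I use the reindex and interpolate functions to compute the new values, and finally I drop the old index points so that the result has the same size as dummy_arrayB.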

    import pandas as pd
    import numpy as np

    # Create dummy arrays
    dummy_arrayA = pd.Series(np.array(range(0, 4)), index=[0, 0.5, 1.0, 1.5])
    dummy_arrayB = pd.Series(np.array(range(0, 5)), index=[0, 0.4, 0.8, 1.2, 1.6])

    # Create a new index based on array A and merge in the index of B
    new_ind = pd.Index(dummy_arrayA.index)
    new_ind = new_ind.union(dummy_arrayB.index)

    # reindex copies the existing values and fills the new points with NaN.
    # interpolate(method="index") then interpolates using the index values
    # rather than the row positions.
    df2 = dummy_arrayA.reindex(index=new_ind).interpolate(method="index")

    # Points shared by A and B must not be deleted.
    common_ind = dummy_arrayA.index.intersection(dummy_arrayB.index)
    # The points to delete are those that exist only in A.
    old_ind = dummy_arrayA.index.difference(common_ind)

    # Delete the old points so that the final Series matches the index of dummy_arrayB
    df2 = df2.drop(old_ind)

    print(df2)
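
The same steps can be condensed into a small reusable function. This is only a sketch: the name `interpolate_onto` is hypothetical, and it assumes both indexes are numeric and sorted. Instead of dropping the old points explicitly, it selects the target index with a second `reindex`, which keeps any points the two indexes share.

```python
import numpy as np
import pandas as pd


def interpolate_onto(source, target_index):
    """Interpolate `source` onto `target_index` (illustrative helper)."""
    # Union of both indexes, so the original points are available as anchors.
    combined = source.index.union(target_index)
    # New points start as NaN and are filled using the index values as x-axis.
    filled = source.reindex(combined).interpolate(method="index")
    # Keep only the points of the target index.
    return filled.reindex(target_index)


dummy_arrayA = pd.Series(np.array(range(0, 4)), index=[0, 0.5, 1.0, 1.5])
dummy_arrayB = pd.Series(np.array(range(0, 5)), index=[0, 0.4, 0.8, 1.2, 1.6])

result = interpolate_onto(dummy_arrayA, dummy_arrayB.index)
print(result)
```

Target points that lie outside the range of the source (here 1.6) are not extrapolated along the trend, so they may need separate handling.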