Does pandas concat interpolate missing values in a multivariate time series?

Time: 2015-06-16 09:32:28

Tags: python pandas concatenation time-series

Question (solved)

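Why does pandas.concat(dfList) (Python pandas) appear to interpolate some of my values? I build events with the code below, which returns a list of DataFrames split wherever a NaN value occurs.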

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# src: http://stackoverflow.com/questions/21402384/how-to-split-a-pandas-time-series-by-nan-values
events = np.split(df, np.where(np.isnan(df['variable13']))[0])
events = [ev[~np.isnan(ev.variable13)] for ev in events if not isinstance(ev, np.ndarray)] # removing NaN entries
events = [ev for ev in events if not ev.empty] # removing empty DataFrames
# 'events' is a list with the events separated by the NaN values.


trimmedDfList = events 
trimmedDf = pd.concat(trimmedDfList)

print("aaaaaa")
print(df.isnull().sum())       # NaN count per column in the original data
print("bbbbb")
print(trimmedDf.isnull().sum())  # NaN count per column after concatenation

# plot
df.plot(subplots=True)
trimmedDf.plot(subplots=True)
plt.show()

This produces the following output and plots:

aaaaaa
variable1   13780
variable2   13780
variable3   13780
variable4   13780
variable5   13780
variable6   13780
variable7   13780
variable8   13780
variable9   13780
variable10  13780
variable11  13780
variable12  13780
variable13  12969
variable14  12969
variable15  12969
variable16  12969
variable17  12969
variable18  12969
dtype: int64

bbbbb
variable1   811
variable2   811
variable3   811
variable4   811
variable5   811
variable6   811
variable7   811
variable8   811
variable9   811
variable10  811
variable11  811
variable12  811
variable13  0
variable14  0
variable15  0
variable16  0
variable17  0
variable18  0
dtype: int64

[Figure 1: df.plot(subplots=True)] [Figure 2: trimmedDf.plot(subplots=True), with the apparently interpolated gaps highlighted]

The highlighted interpolation is unwanted. How can I avoid it? I just want the gaps filled with NaN.

Solved:

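It turned out that the trimmedDf DataFrame (second figure) is not actually interpolated. The key point: df keeps index entries for the rows holding NaN values, whereas trimmedDf has no index entries at all in the highlighted gaps. With no rows there, DataFrame.plot() simply joins the points on either side of each gap with a straight line, which looks like interpolation.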
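The effect can be reproduced on toy data. The sketch below is only an illustration: the six-row frame, its timestamps, and the names `events` and `trimmed` are made up here and stand in for the real data. It shows that concatenating the NaN-free pieces adds no values and leaves a jump in the index where the NaN rows used to be.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one sample per second, with a two-row NaN gap.
idx = pd.date_range("2015-06-16 09:00:00", periods=6, freq="s")
df = pd.DataFrame({"variable13": [1.0, 2.0, np.nan, np.nan, 5.0, 6.0]}, index=idx)

# Two NaN-free pieces, standing in for the 'events' list from the question.
events = [df.iloc[:2], df.iloc[4:]]
trimmed = pd.concat(events)

# concat only stacks the existing rows; nothing is interpolated.
print(len(trimmed))                 # 4 rows, none of them NaN
print(trimmed.isnull().sum())

# The index jumps over the gap instead of holding NaN rows there.
steps = trimmed.index.to_series().diff()
print(steps.max())                  # a 3-second step across the gap
```

A line plot of `trimmed` would connect the sample at 09:00:01 directly to the one at 09:00:04, because no row in between tells the plot to break the line.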

I then used trimmedDf = trimmedDf.resample('1s').fillna(0), which puts the data back on a regular 1-second index.
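On more recent pandas versions, resample returns a Resampler object rather than a DataFrame, so the resample call above may need adapting. A minimal sketch of one possible equivalent follows, assuming a 1-second sampling interval as in the post. The toy frame and the names `with_gaps` and `zero_filled` are hypothetical. It uses asfreq() to reinsert the missing timestamps as NaN rows, which is what the question asked for, and shows the zero fill as an optional extra step.

```python
import pandas as pd

# Hypothetical toy data: six one-second samples, with the middle two rows dropped
# to mimic the gaps left after concatenating the NaN-free events.
idx = pd.date_range("2015-06-16 09:00:00", periods=6, freq="s")
full = pd.DataFrame({"variable13": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]}, index=idx)
trimmed = pd.concat([full.iloc[:2], full.iloc[4:]])

# Put the data back on a regular 1-second grid; the reinserted rows hold NaN,
# so a line plot breaks at the gap instead of bridging it.
with_gaps = trimmed.resample("1s").asfreq()
print(with_gaps)

# Optional: replace the NaN rows with 0, mirroring the fillna(0) in the post.
zero_filled = with_gaps.fillna(0)
print(zero_filled)
```

Stopping at `with_gaps` keeps the gaps as NaN. The zero fill draws the series down to 0 inside each gap, which may or may not be wanted in a plot.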

0 answers:

No answers