Why does pandas.concat(dfList) (Python pandas) interpolate some of my values?
I get events from the following code, which returns a list of DataFrames split wherever a NaN value occurs.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# src: http://stackoverflow.com/questions/21402384/how-to-split-a-pandas-time-series-by-nan-values
events = np.split(df, np.where(np.isnan(df['variable13']))[0])
events = [ev[~np.isnan(ev.variable13)] for ev in events if not isinstance(ev, np.ndarray)] # removing NaN entries
events = [ev for ev in events if not ev.empty] # removing empty DataFrames
# 'events' is a list with the events separated by the NaN values.
trimmedDfList = events
trimmedDf = pd.concat(trimmedDfList)
print("aaaaaa")
print(df.isnull().sum())
print("bbbbb")
print(trimmedDf.isnull().sum())
# plot
df.plot(subplots=True)
trimmedDf.plot(subplots=True)
plt.show()
This gives the following output and plots:
aaaaaa
variable1 13780
variable2 13780
variable3 13780
variable4 13780
variable5 13780
variable6 13780
variable7 13780
variable8 13780
variable9 13780
variable10 13780
variable11 13780
variable12 13780
variable13 12969
variable14 12969
variable15 12969
variable16 12969
variable17 12969
variable18 12969
dtype: int64
bbbbb
variable1 811
variable2 811
variable3 811
variable4 811
variable5 811
variable6 811
variable7 811
variable8 811
variable9 811
variable10 811
variable11 811
variable12 811
variable13 0
variable14 0
variable15 0
variable16 0
variable17 0
variable18 0
dtype: int64
The highlighted interpolation is unwanted. How can I avoid it? I just want the gaps filled with NaN.
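In the end, it turned out that the trimmedDf DataFrame (the second figure) is not actually interpolated. The point is that df has index entries whose values are NaN, while trimmedDf has no index entries at all in the highlighted gaps. That is why trimmedDf.plot() joins the points on either side of each gap with a straight line, which looks like interpolation.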
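To make this concrete, here is a minimal sketch on made-up data (the values, the dates, and the names `is_nan`, `event_id`, and `trimmed` are assumptions for illustration, not from the original post). It splits a small frame into NaN-free events with a `groupby` instead of `np.split`, concatenates them, and checks that `pd.concat` only stacks the existing rows, so the gap remains as a jump in the index.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one sample per second, with three NaN samples in the middle.
index = pd.date_range("2024-01-01", periods=8, freq="s")
df = pd.DataFrame(
    {"variable13": [1.0, 2.0, np.nan, np.nan, np.nan, 6.0, 7.0, 8.0]},
    index=index,
)

# Label each run of valid rows: the cumulative NaN count changes at every NaN,
# so consecutive valid rows share one label.
is_nan = df["variable13"].isna()
event_id = is_nan.cumsum()
events = [ev for _, ev in df[~is_nan].groupby(event_id[~is_nan])]

# concat stacks the events back together without adding any rows or values.
trimmed = pd.concat(events)

print(len(df), len(trimmed))                   # rows before and after trimming
print(trimmed.index.to_series().diff().max())  # largest step between timestamps
```

The largest step between consecutive timestamps in `trimmed` is longer than one second, and that is the interval a line plot bridges with a straight segment.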
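I then used trimmedDf = trimmedDf.resample('1s').fillna(0), which restores the missing timestamps at 1-second spacing and fills the resulting gaps with 0.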
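In recent pandas versions `resample` returns a Resampler object rather than a DataFrame, so that line may not run unchanged. As a hedged alternative, assuming the index is a sorted, duplicate-free DatetimeIndex sampled once per second, `DataFrame.asfreq` puts the missing timestamps back as NaN rows, which is what the question asked for. The data and the names `trimmed`, `with_gaps`, and `filled` below are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for trimmedDf: samples at seconds 0, 1, 5 and 6.
index = pd.to_datetime([
    "2024-01-01 00:00:00",
    "2024-01-01 00:00:01",
    "2024-01-01 00:00:05",
    "2024-01-01 00:00:06",
])
trimmed = pd.DataFrame({"variable13": [1.0, 2.0, 6.0, 7.0]}, index=index)

# Reindex onto a regular 1-second grid; timestamps that were missing become NaN rows.
with_gaps = trimmed.asfreq("1s")
print(with_gaps)

# Optional: fill the restored rows with 0 instead of leaving them as NaN.
filled = with_gaps.fillna(0)
```

Plotting `with_gaps` leaves the line broken at the NaN rows, while plotting `filled` draws the gap at zero.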