Missing data points in Python

Time: 2017-01-25 15:12:29

Tags: python python-3.x pandas

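I'm very new to Python and to coding. I have some data at 1-minute intervals, and some timestamps may be missing. I want to fill in the missing timestamps and put NaN in the corresponding data points. This is what I have so far, but it sets every data point to NaN, not just the missing ones.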

import pandas as pd
df = pd.read_csv("data3.csv", index_col="DateTime")
df = df.reindex(pd.date_range("11-1-2014 12:00:00", "11-1-2014 12:10:00", freq="1min"), fill_value="NaN")
df.to_csv("test3.csv")

The file being read:

                NSERC_CB04_A0401
DateTime                        
11/1/2014 0:00          1.121889
11/1/2014 0:01          1.121889
11/1/2014 0:02          1.121889
11/1/2014 0:03          1.121889
11/1/2014 0:04          1.118503
11/1/2014 0:05          1.121889
11/1/2014 0:06          1.121889
11/1/2014 0:07          1.121889
11/1/2014 0:09          1.121889
11/1/2014 0:10          1.121889

The file being written:

                               NSERC_CB04_A0401
2014-11-01 12:00:00              NaN
2014-11-01 12:01:00              NaN
2014-11-01 12:02:00              NaN
2014-11-01 12:03:00              NaN
2014-11-01 12:04:00              NaN
2014-11-01 12:05:00              NaN
2014-11-01 12:06:00              NaN
2014-11-01 12:07:00              NaN
2014-11-01 12:08:00              NaN
2014-11-01 12:09:00              NaN
2014-11-01 12:10:00              NaN

What I want:

                    NSERC_CB04_A0401
    DateTime                        
    11/1/2014 0:00          1.121889
    11/1/2014 0:01          1.121889
    11/1/2014 0:02          1.121889
    11/1/2014 0:03          1.121889
    11/1/2014 0:04          1.118503
    11/1/2014 0:05          1.121889
    11/1/2014 0:06          1.121889
    11/1/2014 0:07          1.121889
2014-11-01 12:08:00              NaN
    11/1/2014 0:09          1.121889
    11/1/2014 0:10          1.121889

1 answer:

Answer 0 (score: 4)

No problem that you're new to coding and Python!

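You need to add the parameter parse_dates=True to read_csv so that the index is first converted to a DatetimeIndex, and only then call reindex. The start time also has to change from 11-1-2014 12:00:00 to 11-1-2014 00:00:00 to match the data, and the end time similarly from 12:10:00 to 00:10:00.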

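Also, the string "NaN" is not a missing value. You need np.nan, which is already the default fill value for missing data in reindex, so fill_value can simply be omitted.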
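To make the difference between the string "NaN" and a real missing value concrete, here is a minimal sketch. The series `s`, its values, and the range `full` are made up for illustration and are not taken from data3.csv.

```python
import pandas as pd

# Hypothetical two-point series with a one-minute gap at 00:01.
s = pd.Series(
    [1.0, 3.0],
    index=pd.to_datetime(["2014-11-01 00:00", "2014-11-01 00:02"]),
)
full = pd.date_range("2014-11-01 00:00", "2014-11-01 00:02", freq="1min")

# fill_value="NaN" inserts the text "NaN", which pandas does not treat as missing.
as_string = s.reindex(full, fill_value="NaN")
# Omitting fill_value uses the default, np.nan, which is a real missing value.
as_missing = s.reindex(full)

print(as_string.isna().sum(), type(as_string.iloc[1]).__name__)
print(as_missing.isna().sum(), as_missing.dtype)
```

With the string fill, isna() should report no missing values, so later calls such as dropna, fillna or interpolate would not see the gap.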

df = pd.read_csv("data3.csv", index_col="DateTime", parse_dates=True)

df = df.reindex(pd.date_range("11-1-2014 00:00:00", "11-1-2014 00:10:00", freq="1min"))
print (df)
                     NSERC_CB04_A0401
2014-11-01 00:00:00          1.121889
2014-11-01 00:01:00          1.121889
2014-11-01 00:02:00          1.121889
2014-11-01 00:03:00          1.121889
2014-11-01 00:04:00          1.118503
2014-11-01 00:05:00          1.121889
2014-11-01 00:06:00          1.121889
2014-11-01 00:07:00          1.121889
2014-11-01 00:08:00               NaN
2014-11-01 00:09:00          1.121889
2014-11-01 00:10:00          1.121889
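As a rough illustration of why parse_dates=True matters, the sketch below reads the same text twice, with and without date parsing. The inline `csv_text` is a hypothetical stand-in for data3.csv (its real layout is assumed to be a DateTime column plus one value column), and `raw`, `parsed` and `full` are names chosen only for this example.

```python
import io
import pandas as pd

# Hypothetical stand-in for data3.csv, with the 0:01 row missing.
csv_text = (
    "DateTime,NSERC_CB04_A0401\n"
    "11/1/2014 0:00,1.121889\n"
    "11/1/2014 0:02,1.121889\n"
)

# Without parse_dates the index labels stay plain strings.
raw = pd.read_csv(io.StringIO(csv_text), index_col="DateTime")
# With parse_dates=True the index becomes a DatetimeIndex.
parsed = pd.read_csv(io.StringIO(csv_text), index_col="DateTime", parse_dates=True)

full = pd.date_range("11-1-2014 00:00:00", "11-1-2014 00:02:00", freq="1min")

# String labels never equal timestamps, so every row comes back as NaN.
all_nan = raw.reindex(full)
# Parsed timestamps line up, so only the gap at 00:01 is NaN.
one_gap = parsed.reindex(full)

print(type(raw.index).__name__, all_nan["NSERC_CB04_A0401"].isna().sum())
print(type(parsed.index).__name__, one_gap["NSERC_CB04_A0401"].isna().sum())
```

The same all-NaN result appears when the index is parsed but the requested range (for example 12:00 to 12:10) does not overlap the data, which is the second problem in the original attempt.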

A more general solution is to reindex between the minimum and maximum datetime of the index, but whether that fits depends on your data:

df = df.reindex(pd.date_range(df.index.min(), df.index.max(), freq="1min"))
print (df)
                     NSERC_CB04_A0401
2014-11-01 00:00:00          1.121889
2014-11-01 00:01:00          1.121889
2014-11-01 00:02:00          1.121889
2014-11-01 00:03:00          1.121889
2014-11-01 00:04:00          1.118503
2014-11-01 00:05:00          1.121889
2014-11-01 00:06:00          1.121889
2014-11-01 00:07:00          1.121889
2014-11-01 00:08:00               NaN
2014-11-01 00:09:00          1.121889
2014-11-01 00:10:00          1.121889

If the index contains duplicates, the solution is resample combined with an aggregate function such as mean or sum. See also the resample docs.

print (df)
                     NSERC_CB04_A0401
DateTime                             
2014-11-01 00:00:00          1.121889
2014-11-01 00:01:00          1.121889
2014-11-01 00:02:00          1.121889
2014-11-01 00:03:00          1.121889
2014-11-01 00:04:00          1.118503
2014-11-01 00:05:00          1.121889
2014-11-01 00:06:00          1.121889
2014-11-01 00:07:00          1.121889 <- duplicate index
2014-11-01 00:07:00          1.121889 <- duplicate index
2014-11-01 00:09:00          1.121889
2014-11-01 00:10:00          1.121889

df = df.resample('1min').mean()
print (df)
                     NSERC_CB04_A0401
DateTime                             
2014-11-01 00:00:00          1.121889
2014-11-01 00:01:00          1.121889
2014-11-01 00:02:00          1.121889
2014-11-01 00:03:00          1.121889
2014-11-01 00:04:00          1.118503
2014-11-01 00:05:00          1.121889
2014-11-01 00:06:00          1.121889
2014-11-01 00:07:00          1.121889
2014-11-01 00:08:00               NaN
2014-11-01 00:09:00          1.121889
2014-11-01 00:10:00          1.121889
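The reason resample is suggested for duplicated timestamps can be sketched as follows: reindex is expected to raise a ValueError when the existing index has duplicate labels, while resample first aggregates each minute. The frame below, its `value` column, and its numbers are invented for this example.

```python
import pandas as pd

# Hypothetical frame: 00:07 appears twice and 00:08 is missing.
idx = pd.to_datetime([
    "2014-11-01 00:06",
    "2014-11-01 00:07",
    "2014-11-01 00:07",
    "2014-11-01 00:09",
])
df = pd.DataFrame({"value": [1.0, 2.0, 4.0, 5.0]}, index=idx)

full = pd.date_range(df.index.min(), df.index.max(), freq="1min")

# reindex cannot decide which duplicate row to keep, so it raises.
try:
    df.reindex(full)
    reindex_failed = False
except ValueError:
    reindex_failed = True

# resample collapses the duplicates with mean() and leaves empty minutes as NaN.
out = df.resample("1min").mean()

print(reindex_failed)
print(out)
```

One design note: mean() leaves an empty minute as NaN, whereas sum() returns 0 for an empty bin by default, so sum(min_count=1) is likely the better choice if the gaps should stay NaN.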