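I'm very new to Python and to coding. I have some data at 1-minute intervals, and some timestamps may be missing. I want to fill in the missing times, with NaN for the corresponding data points. This is what I have so far, but it turns all the data points into NaN, not just the missing ones.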
import pandas as pd
df = pd.read_csv("data3.csv", index_col="DateTime")
df = df.reindex(pd.date_range("11-1-2014 12:00:00", "11-1-2014 12:10:00", freq="1min"), fill_value="NaN")
df.to_csv("test3.csv")
The file being read:
NSERC_CB04_A0401
DateTime
11/1/2014 0:00 1.121889
11/1/2014 0:01 1.121889
11/1/2014 0:02 1.121889
11/1/2014 0:03 1.121889
11/1/2014 0:04 1.118503
11/1/2014 0:05 1.121889
11/1/2014 0:06 1.121889
11/1/2014 0:07 1.121889
11/1/2014 0:09 1.121889
11/1/2014 0:10 1.121889
The file being written:
NSERC_CB04_A0401
2014-11-01 12:00:00 NaN
2014-11-01 12:01:00 NaN
2014-11-01 12:02:00 NaN
2014-11-01 12:03:00 NaN
2014-11-01 12:04:00 NaN
2014-11-01 12:05:00 NaN
2014-11-01 12:06:00 NaN
2014-11-01 12:07:00 NaN
2014-11-01 12:08:00 NaN
2014-11-01 12:09:00 NaN
2014-11-01 12:10:00 NaN
What I want:
NSERC_CB04_A0401
DateTime
11/1/2014 0:00 1.121889
11/1/2014 0:01 1.121889
11/1/2014 0:02 1.121889
11/1/2014 0:03 1.121889
11/1/2014 0:04 1.118503
11/1/2014 0:05 1.121889
11/1/2014 0:06 1.121889
11/1/2014 0:07 1.121889
2014-11-01 12:08:00 NaN
11/1/2014 0:09 1.121889
11/1/2014 0:10 1.121889
Answer 0 (score: 4)
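No problem that you're new to coding and Python!
You need to add the parameter parse_dates=True to read_csv so that the index is converted to a DatetimeIndex first, and then call reindex. The start time also has to change from 11-1-2014 12:00:00 to 11-1-2014 00:00:00 so that it matches the data, and similarly the end time.
Also, the string "NaN" is not a missing value; you need np.nan, which is already the default fill value in reindex for missing data, so fill_value can simply be omitted.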
df = pd.read_csv("data3.csv", index_col="DateTime", parse_dates=True)
df = df.reindex(pd.date_range("11-1-2014 00:00:00", "11-1-2014 00:10:00", freq="1min"))
print (df)
NSERC_CB04_A0401
2014-11-01 00:00:00 1.121889
2014-11-01 00:01:00 1.121889
2014-11-01 00:02:00 1.121889
2014-11-01 00:03:00 1.121889
2014-11-01 00:04:00 1.118503
2014-11-01 00:05:00 1.121889
2014-11-01 00:06:00 1.121889
2014-11-01 00:07:00 1.121889
2014-11-01 00:08:00 NaN
2014-11-01 00:09:00 1.121889
2014-11-01 00:10:00 1.121889
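To see why the string "NaN" does not count as a missing value while the reindex default does, here is a minimal sketch. The inline CSV text is a hypothetical stand-in for data3.csv, and it assumes the file is comma-separated.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical in-memory stand-in for data3.csv (comma-separated is an assumption).
csv_text = """DateTime,NSERC_CB04_A0401
11/1/2014 0:00,1.121889
11/1/2014 0:01,1.121889
11/1/2014 0:02,1.121889
11/1/2014 0:03,1.121889
11/1/2014 0:04,1.118503
11/1/2014 0:05,1.121889
11/1/2014 0:06,1.121889
11/1/2014 0:07,1.121889
11/1/2014 0:09,1.121889
11/1/2014 0:10,1.121889
"""

# parse_dates=True turns the index into a DatetimeIndex instead of plain strings.
df = pd.read_csv(io.StringIO(csv_text), index_col="DateTime", parse_dates=True)

# The string "NaN" is ordinary text; only np.nan is treated as missing.
string_is_missing = pd.isna("NaN")
float_is_missing = pd.isna(np.nan)

# With no fill_value, reindex fills the new rows with np.nan.
full_range = pd.date_range("11-1-2014 00:00:00", "11-1-2014 00:10:00", freq="1min")
filled = df.reindex(full_range)
n_missing = int(filled["NSERC_CB04_A0401"].isna().sum())

print(string_is_missing, float_is_missing, n_missing)
```

Without parse_dates=True the index holds strings, so none of the Timestamp labels in the new range match and every row comes out as missing, which is what happened in the question.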
A more general solution is to reindex using the min and max datetime of the index, but whether this fits depends on your data:
df = df.reindex(pd.date_range(df.index.min(), df.index.max(), freq="1min"))
print (df)
NSERC_CB04_A0401
2014-11-01 00:00:00 1.121889
2014-11-01 00:01:00 1.121889
2014-11-01 00:02:00 1.121889
2014-11-01 00:03:00 1.121889
2014-11-01 00:04:00 1.118503
2014-11-01 00:05:00 1.121889
2014-11-01 00:06:00 1.121889
2014-11-01 00:07:00 1.121889
2014-11-01 00:08:00 NaN
2014-11-01 00:09:00 1.121889
2014-11-01 00:10:00 1.121889
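Before reindexing over the min and max range, it can help to check what the data looks like. The sketch below is one possible way to do that, using a small hypothetical frame rather than the real file: it checks that the index is unique (reindex raises an error when the existing index has duplicate labels) and lists the timestamps that are missing.

```python
import pandas as pd

# Hypothetical sample with minute 00:08 missing.
index = pd.to_datetime(
    ["2014-11-01 00:06", "2014-11-01 00:07", "2014-11-01 00:09", "2014-11-01 00:10"]
)
df = pd.DataFrame({"NSERC_CB04_A0401": [1.121889] * 4}, index=index)

# reindex needs unique labels in the existing index, so check first.
index_is_unique = df.index.is_unique

# Build the full 1-minute range and find the timestamps not present in the data.
full_range = pd.date_range(df.index.min(), df.index.max(), freq="1min")
missing = full_range.difference(df.index)

print(index_is_unique)
print(missing)
```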
If the index contains duplicates, a solution is resample with an aggregate function such as mean or sum (see also the resample docs):
print (df)
NSERC_CB04_A0401
DateTime
2014-11-01 00:00:00 1.121889
2014-11-01 00:01:00 1.121889
2014-11-01 00:02:00 1.121889
2014-11-01 00:03:00 1.121889
2014-11-01 00:04:00 1.118503
2014-11-01 00:05:00 1.121889
2014-11-01 00:06:00 1.121889
2014-11-01 00:07:00 1.121889 <- duplicate index
2014-11-01 00:07:00 1.121889 <- duplicate index
2014-11-01 00:09:00 1.121889
2014-11-01 00:10:00 1.121889
df = df.resample('1min').mean()
print (df)
NSERC_CB04_A0401
DateTime
2014-11-01 00:00:00 1.121889
2014-11-01 00:01:00 1.121889
2014-11-01 00:02:00 1.121889
2014-11-01 00:03:00 1.121889
2014-11-01 00:04:00 1.118503
2014-11-01 00:05:00 1.121889
2014-11-01 00:06:00 1.121889
2014-11-01 00:07:00 1.121889
2014-11-01 00:08:00 NaN
2014-11-01 00:09:00 1.121889
2014-11-01 00:10:00 1.121889
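As a self-contained sketch of the resample approach, the snippet below builds a small hypothetical frame with a duplicated timestamp and a missing minute, resamples it, and writes the result as CSV text. The na_rep="NaN" argument is an addition that is not in the original code: to_csv writes missing values as empty fields by default, so this is one way to get a literal NaN in the output file.

```python
import pandas as pd

# Hypothetical sample: 00:07 appears twice and 00:08 is missing.
index = pd.to_datetime(
    [
        "2014-11-01 00:06",
        "2014-11-01 00:07",
        "2014-11-01 00:07",
        "2014-11-01 00:09",
        "2014-11-01 00:10",
    ]
)
df = pd.DataFrame({"NSERC_CB04_A0401": [1.121889] * 5}, index=index)
df.index.name = "DateTime"

# Each 1-minute bin is averaged: duplicates collapse to one row, empty bins become NaN.
resampled = df.resample("1min").mean()

# Without a path, to_csv returns the CSV text; na_rep controls how NaN is written.
csv_out = resampled.to_csv(na_rep="NaN")
print(csv_out)
```

Whether mean, sum, or another aggregate is the right choice for duplicated rows depends on what the duplicates represent in your data.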