I'm seeing strange behavior with the pandas set_index function. I start with this data frame:
Unnamed: 0 Timestamps PM10
0 NaN NaT PM10
1 NaN NaT µg/m³
2 NaN 2018-12-31 23:00:00 10.76
3 NaN 2018-12-31 22:00:00 9.46
4 NaN 2018-12-31 21:00:00 8.67
... ... ... ...
8682 NaN 2018-01-01 04:00:00 25.14
8683 NaN 2018-01-01 03:00:00 31.34
8684 NaN 2018-01-01 02:00:00 36.28
8685 NaN 2018-01-01 01:00:00 21.78
8686 NaN 2018-01-01 00:00:00 20.59
I want to drop the first two rows and set Timestamps as the index, so I do this:
df_final = df.drop([0,1]).set_index('Timestamps', drop=True)
I get this data frame:
Unnamed: 0 PM10
Timestamps
2018-12-31 23:00:00 NaN 10.76
2018-12-31 22:00:00 NaN 9.46
2018-12-31 21:00:00 NaN 8.67
2018-12-31 20:00:00 NaN 10.42
2018-12-31 19:00:00 NaN 10.04
... ... ...
2018-01-01 04:00:00 NaN 25.14
2018-01-01 03:00:00 NaN 31.34
2018-01-01 02:00:00 NaN 36.28
2018-01-01 01:00:00 NaN 21.78
2018-01-01 00:00:00 NaN 20.59
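
For readers who want to reproduce this step, here is a minimal sketch on a toy frame. The frame and its values are made up to mimic the layout shown above (two header-like rows, then descending hourly readings); it is not the original data. It illustrates that after drop and set_index the index is a DatetimeIndex, while PM10 is likely still an object column because it once held the strings "PM10" and "µg/m³".

```python
import pandas as pd

# Toy stand-in for the real data (assumption: values are illustrative only).
df = pd.DataFrame({
    "Unnamed: 0": [float("nan")] * 5,
    "Timestamps": pd.to_datetime(
        [None, None,
         "2018-01-01 02:00:00", "2018-01-01 01:00:00", "2018-01-01 00:00:00"]
    ),
    "PM10": ["PM10", "µg/m³", 36.28, 21.78, 20.59],
})

# Same call as in the question: drop the two header rows, index by timestamp.
df_final = df.drop([0, 1]).set_index("Timestamps", drop=True)

print(type(df_final.index).__name__)  # type of the new index
print(df_final["PM10"].dtype)         # dtype is unchanged by dropping rows
```

If the dtype printed for PM10 is object, converting it with pd.to_numeric before any resampling is probably a sensible first step.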
So far so good, but in the end I want to reindex the PM10 column with a new time index I created, t_index, so I do this:
data_write = df_final.PM10[-1::-1].reindex(t_index)
That is where I get the error:
TypeError: 'NoneType' object is not iterable
After some debugging I concluded that set_index is causing the problem, but I don't know why. Any help would be appreciated!
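
The thread does not establish what actually raised the TypeError, so the following is only a diagnostic sketch, not an explanation. It uses hypothetical stand-ins for df_final.PM10 and t_index and prints a few properties that commonly matter when reindexing a time series: whether both indexes are DatetimeIndex objects, whether the source index is unique and sorted, and what dtype the values have.

```python
import pandas as pd

# Hypothetical stand-ins for df_final.PM10 and t_index from the question.
idx = pd.to_datetime(
    ["2018-01-01 02:00:00", "2018-01-01 01:00:00", "2018-01-01 00:00:00"]
)
pm10 = pd.Series([36.28, 21.78, 20.59], index=idx, name="PM10")
t_index = pd.date_range("2018-01-01 00:00:00", "2018-01-01 03:00:00", freq="h")

# Properties worth inspecting before calling reindex.
checks = {
    "index_is_datetime": isinstance(pm10.index, pd.DatetimeIndex),
    "index_is_unique": pm10.index.is_unique,
    "index_is_increasing": pm10.index.is_monotonic_increasing,
    "target_is_datetime": isinstance(t_index, pd.DatetimeIndex),
    "values_dtype": str(pm10.dtype),
}
for name, value in checks.items():
    print(f"{name}: {value}")

# Sorting first gives an ascending index; timestamps missing from the
# source come back as NaN after reindex.
reindexed = pm10.sort_index().reindex(t_index)
print(reindexed)
```

Running the same checks on the real df_final.PM10 and t_index may help narrow down where the two indexes differ.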
Answer 0 (score: 0)
After some trial and error I managed to get it working. Here is the code that does it:
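import pandas as pd
# Drop the two header rows and the empty column, then index by timestamp
df = df.drop([0,1]).drop("Unnamed: 0", axis=1).set_index('Timestamps', drop=True)
df = df.sort_values(by="Timestamps", ascending=True)
year = 2018
start_index = '{}-01-01 00:00:00'.format(year) # define start of the year
end_index = '{}-12-31 23:00:00'.format(year) # define end of the year
# pd.DatetimeIndex(start=..., end=...) was removed in pandas 1.0; pd.date_range replaces it.
# t_index is kept as datetimes (no strftime) so it matches the resampled DatetimeIndex.
t_index = pd.date_range(start=start_index, end=end_index, freq='h')
df_final = pd.to_numeric(df.PM10).resample('h').mean().reindex(t_index)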
I'm still not sure what caused the error, or why the .asfreq method didn't work.
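
As a rough comparison of the two approaches mentioned here, the sketch below runs resample('h').mean() and asfreq('h') on a small made-up series with one missing hour. The data and names are assumptions for illustration. On a sorted index with unique timestamps the two should give the same result. Note that asfreq only conforms the series to the new frequency and does not aggregate, so it may behave differently on data with repeated timestamps.

```python
import pandas as pd

# Made-up readings stored as strings, in descending order, with 02:00 missing.
idx = pd.to_datetime(
    ["2018-01-01 03:00:00", "2018-01-01 01:00:00", "2018-01-01 00:00:00"]
)
pm10 = pd.Series(["31.34", "21.78", "20.59"], index=idx, name="PM10")

# Convert to numbers and sort ascending before any frequency conversion.
clean = pd.to_numeric(pm10).sort_index()

# Target index: one extra hour beyond the data to show the NaN padding.
t_index = pd.date_range("2018-01-01 00:00:00", "2018-01-01 04:00:00", freq="h")

via_resample = clean.resample("h").mean().reindex(t_index)
via_asfreq = clean.asfreq("h").reindex(t_index)

print(via_resample)
print(via_asfreq)
```

Both results carry NaN at the hours that have no reading, which is usually what you want when writing a gap-aware hourly file.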