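I have time series data containing the number of days for each month over several years, and I'm trying to create a new DataFrame with months as rows and years as columns.
I have this: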
DateTime Days Month Year
2004-11-30 3 November 2004
2004-12-31 16 December 2004
2005-01-31 12 January 2005
2005-02-28 11 February 2005
2005-03-31 11 March 2005
... ... ... ...
2019-06-30 0 June 2019
2019-07-31 2 July 2019
2019-08-31 5 August 2019
2019-09-30 5 September 2019
2019-10-31 3 October 2019
I'm trying to get this:
Month 2004 2005 ... 2019
January nan 12 7
February nan 11 9
...
November 17 17 nan
December 14 15 nan
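I created a new DataFrame whose first column holds the months, and tried to iterate over the first DataFrame to add the new columns (years) and fill in the cells. However, the condition that checks whether the month in the first DataFrame (days) matches the month in the new DataFrame (output) is never True, so the new DataFrame is never updated. I suspect this is because the month in days is never the same as the month in output within the same iteration.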
for index, row in days.iterrows():
    print(days.loc[index, 'Days'])  # this prints out as expected
    for month in output.items():
        print(index.month_name())  # this prints out as expected
        if index.month_name() == month:
            output.at[month, index.year] = days.loc[index, 'Days']  # I wanted to use this to fill up the cells, is this right?
            print(days.loc[index, 'Days'])  # this never gets printed out
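Could you tell me how to fix this? Or is there a better way to get this result than iterating? This is my first time trying to use libraries in Python, so any help would be appreciated.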
Answer 0 (score: 0)
If your input DataFrame has exactly one value per month and year, use pivot:
df.pivot(index='Month', columns='Year', values='Days')
Output:
Year 2004 2005 2019
Month
August NaN NaN 5
December 16 NaN NaN
February NaN 11 NaN
January NaN 12 NaN
July NaN NaN 2
June NaN NaN 0
March NaN 11 NaN
November 3 NaN NaN
October NaN NaN 3
September NaN NaN 5
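The pivoted result is sorted alphabetically by month name, while the layout asked for in the question runs from January to December. A minimal sketch of one way to get calendar order, assuming English month names and using only the ten rows shown in the question as sample data (the names `wide` and `month_order` are just illustrative):

```python
import pandas as pd

# Sample rows copied from the question's input table.
df = pd.DataFrame({
    "DateTime": pd.to_datetime([
        "2004-11-30", "2004-12-31", "2005-01-31", "2005-02-28", "2005-03-31",
        "2019-06-30", "2019-07-31", "2019-08-31", "2019-09-30", "2019-10-31",
    ]),
    "Days": [3, 16, 12, 11, 11, 0, 2, 5, 5, 3],
})

# Derive the Month and Year columns from the dates.
df["Month"] = df["DateTime"].dt.month_name()
df["Year"] = df["DateTime"].dt.year

# One row per month, one column per year.
wide = df.pivot(index="Month", columns="Year", values="Days")

# Build January..December and reorder the rows accordingly.
# Months with no data at all (April and May in this sample) become all-NaN rows.
month_order = list(pd.date_range("2000-01-01", periods=12, freq="MS").month_name())
wide = wide.reindex(month_order)

print(wide)
```

Note that pivot raises a ValueError if any (Month, Year) pair occurs more than once; in that case pivot_table with an explicit aggfunc is the usual alternative. As for the loop in the question, output.items() iterates over (column label, Series) pairs, so index.month_name() == month compares a string with a tuple and is never True.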