Fill NA in all columns for missing rows in a DataFrame with a datetime index

Posted: 2020-10-06 10:23:25

Tags: python pandas

I have some code that joins two DataFrames and overwrites the values in df1 with the values from df:

import pandas as pd
import numpy as np

# create two dataframes (df1 needs 12 rows to match its 12 Location values and 12 index labels)
df = pd.DataFrame(np.random.randint(0, 30, size=(10, 4)), columns=['Temp', 'Precip', 'Wind', 'Pressure'])
df1 = pd.DataFrame(np.random.randint(0, 30, size=(12, 4)), columns=['Temp', 'Precip', 'Wind', 'Pressure'])

df['Location'] =[2,2,3,3,4,4,5,5,6,6]
df1['Location'] =[0,1,2,2,3,3,4,4,5,5,6,6]

# create two different indices which overlap
df.index = ["2020-05-18 12:00:00","2020-05-19 12:00:00","2020-05-18 12:00:00","2020-05-19 12:00:00","2020-05-18 12:00:00","2020-05-19 12:00:00","2020-05-18 12:00:00","2020-05-19 12:00:00","2020-05-18 12:00:00","2020-05-19 12:00:00"]
df1.index = ["2020-05-20 12:00:00","2020-05-20 12:00:00", "2020-05-20 12:00:00", "2020-05-19 12:00:00", "2020-05-20 12:00:00", "2020-05-19 12:00:00", "2020-05-20 12:00:00", "2020-05-19 12:00:00", "2020-05-20 12:00:00", "2020-05-19 12:00:00", "2020-05-20 12:00:00", "2020-05-19 12:00:00"]

#make the datetime the index
df.index = pd.to_datetime(df.index)
df1.index = pd.to_datetime(df1.index)
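A quick check on the index labels and Location lists from the setup above shows why a second key is needed at all: a timestamp is shared by several Locations, so it cannot identify a row on its own, while the (date, Location) pair can. Crossing every date with every Location and subtracting the pairs present in either frame yields exactly the rows that should come out as NaN. The level names `date` and `Location` and the variable names are assumptions chosen for illustration; the random values are irrelevant here, so only the labels are used.

```python
import pandas as pd

# Index labels and Location values copied from the question's setup.
dates_df = ["2020-05-18 12:00:00", "2020-05-19 12:00:00"] * 5
locs_df = [2, 2, 3, 3, 4, 4, 5, 5, 6, 6]
dates_df1 = (["2020-05-20 12:00:00"] * 2
             + ["2020-05-20 12:00:00", "2020-05-19 12:00:00"] * 5)
locs_df1 = [0, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6]

names = ["date", "Location"]  # hypothetical level names, chosen for readability
keys_df = pd.MultiIndex.from_arrays([pd.to_datetime(dates_df), locs_df], names=names)
keys_df1 = pd.MultiIndex.from_arrays([pd.to_datetime(dates_df1), locs_df1], names=names)

# A timestamp is shared by several Locations, so it cannot identify a row on its own...
date_is_unique = pd.to_datetime(dates_df1).is_unique
# ...but the (date, Location) pair can.
pair_is_unique = keys_df.is_unique and keys_df1.is_unique

# Cross every date with every Location and drop the pairs present in either frame:
# what is left are the rows that should appear filled with NaN.
all_dates = pd.to_datetime(sorted(set(dates_df + dates_df1)))
all_locs = sorted(set(locs_df + locs_df1))
full_grid = pd.MultiIndex.from_product([all_dates, all_locs], names=names)
missing = full_grid.difference(keys_df.union(keys_df1))

print(date_is_unique, pair_is_unique)  # False True
print(len(full_grid), len(missing))    # 21 4
```

The four missing pairs are Locations 0 and 1 on the 18th and 19th, which matches the NA rows the question asks for.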

The data frames look like this:

[Image: df]

[Image: df1]

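df has values for Locations 2, 3, 4, 5 and 6, while df1 has values for Locations 0, 1, 2, 3, 4, 5 and 6. When I combine them using cumcount and combine_first, the values for Locations 0 and 1 are dropped, because those Locations have fewer rows than the others.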

#merge and overwrite
df = df.set_index(df.groupby(level=0).cumcount(), append=True)
df1 = df1.set_index(df1.groupby(level=0).cumcount(), append=True)

df = df.combine_first(df1).sort_index(level=[1,0]).reset_index(level=1, drop=True)
df

[Image: resulting df]

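This scrambles the index: the dates are no longer in order within each Location, and Locations 0 and 1 end up in a separate part of the data frame.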
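A small toy example illustrates what goes wrong with the cumcount key: `combine_first` aligns rows by their full index, so a key built from each row's position within its date pairs rows by position rather than by Location. As soon as one frame lacks a Location, every later row is matched with the wrong partner. The frames `a` and `b`, the label `"d1"` and the column `v` are hypothetical stand-ins for df, df1 and the weather columns.

```python
import pandas as pd

# Hypothetical toy frames sharing one timestamp label, "d1".
# `a` plays the role of df (no Location 0); `b` plays df1 (has Location 0).
a = pd.DataFrame({"Location": [1, 2], "v": [10, 20]}, index=["d1", "d1"])
b = pd.DataFrame({"Location": [0, 1, 2], "v": [0, 1, 2]}, index=["d1", "d1", "d1"])

# Question's approach: the second key is the position of the row within its date.
a_pos = a.set_index(a.groupby(level=0).cumcount(), append=True)
b_pos = b.set_index(b.groupby(level=0).cumcount(), append=True)
by_position = a_pos.combine_first(b_pos)
# Position 0 pairs a's Location 1 with b's Location 0, and so on, so the
# Location column becomes 1, 2, 2: Location 0 is gone and Location 2 appears twice.
print(by_position["Location"].tolist())

# Keying on Location itself matches each Location with itself instead;
# a's values win wherever both frames have the same (label, Location) pair.
by_location = a.set_index("Location", append=True).combine_first(
    b.set_index("Location", append=True))
print(by_location["v"].tolist())
```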

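What I want is a data frame in which Locations 0 and 1 each have rows for the 18th, 19th and 20th, with NA in every column where there is no data, like this:

[Image: desired output]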

Many thanks for your help.

Answer:

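One idea is to append only Location to the MultiIndex (starting from the original df and df1, without the cumcount level), then add the missing rows with unstack followed by stack(dropna=False):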

df = df.set_index(['Location'], append=True)
df1 = df1.set_index(['Location'], append=True)

df = (df.combine_first(df1)
        .unstack()
        .stack(dropna=False)
        .sort_index(level=[1,0])
        .reset_index(level=1))
        
print (df)
                     Location  Temp  Precip  Wind  Pressure
2020-05-18 12:00:00         0   NaN     NaN   NaN       NaN
2020-05-19 12:00:00         0   NaN     NaN   NaN       NaN
2020-05-20 12:00:00         0  18.0    10.0   2.0      29.0
2020-05-18 12:00:00         1   NaN     NaN   NaN       NaN
2020-05-19 12:00:00         1   NaN     NaN   NaN       NaN
2020-05-20 12:00:00         1  13.0    11.0  27.0      16.0
2020-05-18 12:00:00         2  17.0    13.0   6.0      23.0
2020-05-19 12:00:00         2  12.0     8.0  18.0      21.0
2020-05-20 12:00:00         2   5.0    29.0  15.0      22.0
2020-05-18 12:00:00         3  23.0     2.0  12.0       5.0
2020-05-19 12:00:00         3   1.0     2.0  18.0      20.0
2020-05-20 12:00:00         3  15.0    16.0   5.0      22.0
2020-05-18 12:00:00         4   3.0    23.0  10.0      29.0
2020-05-19 12:00:00         4  27.0     7.0   8.0       0.0
2020-05-20 12:00:00         4   4.0    18.0   6.0      27.0
2020-05-18 12:00:00         5  10.0    25.0   0.0      29.0
2020-05-19 12:00:00         5  26.0     9.0  24.0       2.0
2020-05-20 12:00:00         5   6.0    12.0   8.0       5.0
2020-05-18 12:00:00         6   9.0     1.0  17.0      17.0
2020-05-19 12:00:00         6  23.0     2.0  23.0      20.0
2020-05-20 12:00:00         6  10.0     0.0   6.0      10.0
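The answer relies on `stack(dropna=False)`. pandas 2.1 introduced a new `stack` implementation (`future_stack=True`) and deprecated the old one, so depending on your pandas version that call may emit a FutureWarning or need adjusting. A version-independent alternative (a swapped-in technique, not part of the original answer) is to reindex the combined frame against the full date × Location grid built with `pd.MultiIndex.from_product`. This sketch assumes the question's setup with df1 given 12 rows; the seeded generator is an assumption used only so the values are reproducible.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only so the sketch is reproducible
cols = ["Temp", "Precip", "Wind", "Pressure"]

# Same shape and labels as the question's setup (df1 has 12 rows).
df = pd.DataFrame(rng.integers(0, 30, size=(10, 4)), columns=cols)
df1 = pd.DataFrame(rng.integers(0, 30, size=(12, 4)), columns=cols)
df["Location"] = [2, 2, 3, 3, 4, 4, 5, 5, 6, 6]
df1["Location"] = [0, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6]
df.index = pd.to_datetime(["2020-05-18 12:00:00", "2020-05-19 12:00:00"] * 5)
df1.index = pd.to_datetime(["2020-05-20 12:00:00"] * 2
                           + ["2020-05-20 12:00:00", "2020-05-19 12:00:00"] * 5)

# Key each row by (date, Location); values from df win where both frames have the row.
combined = (df.set_index("Location", append=True)
              .combine_first(df1.set_index("Location", append=True)))

# Build every date x Location pair and reindex: absent pairs become all-NaN rows.
full = pd.MultiIndex.from_product(
    [combined.index.get_level_values(0).unique().sort_values(),
     combined.index.get_level_values(1).unique().sort_values()],
    names=combined.index.names,
)
result = (combined.reindex(full)
                  .sort_index(level=[1, 0])   # order by Location, then date
                  .reset_index(level=1))      # move Location back into a column
print(result)
```

Building the grid explicitly also makes the set of dates and Locations easy to control, for example if some date is missing from both frames and should still appear as NA rows.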