I have some code that joins two dataframes together, overwriting the values in df1 with the values from df:
import pandas as pd
import numpy as np
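#optional, not in the original code: seed NumPy's generator here so the random
#sample data (and therefore the printed output below) is reproducible; the seed
#value 0 is arbitrary
np.random.seed(0)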
#create two dataframes
df = pd.DataFrame(np.random.randint(0,30,size=(10, 4)), columns=['Temp', 'Precip', 'Wind', 'Pressure'])
df1 = pd.DataFrame(np.random.randint(0,30,size=(12, 4)), columns=['Temp', 'Precip', 'Wind', 'Pressure'])
df['Location'] =[2,2,3,3,4,4,5,5,6,6]
df1['Location'] =[0,1,2,2,3,3,4,4,5,5,6,6]
#create two different indices which overlap
df.index = ["2020-05-18 12:00:00","2020-05-19 12:00:00","2020-05-18 12:00:00","2020-05-19 12:00:00","2020-05-18 12:00:00","2020-05-19 12:00:00","2020-05-18 12:00:00","2020-05-19 12:00:00","2020-05-18 12:00:00","2020-05-19 12:00:00"]
df1.index = ["2020-05-20 12:00:00","2020-05-20 12:00:00", "2020-05-20 12:00:00", "2020-05-19 12:00:00", "2020-05-20 12:00:00", "2020-05-19 12:00:00", "2020-05-20 12:00:00", "2020-05-19 12:00:00", "2020-05-20 12:00:00", "2020-05-19 12:00:00", "2020-05-20 12:00:00", "2020-05-19 12:00:00"]
#convert the string index to a DatetimeIndex
df.index = pd.to_datetime(df.index)
df1.index = pd.to_datetime(df1.index)
The dataframes look like this:
df
df1
df has values for Locations 2, 3, 4, 5 and 6, while df1 has values for Locations 0, 1, 2, 3, 4, 5 and 6. When I combine them using cumcount and combine_first, the values for Locations 0 and 1 get dropped because those Locations have fewer rows than the others.
#merge and overwrite
df = df.set_index(df.groupby(level=0).cumcount(), append=True)
df1 = df1.set_index(df1.groupby(level=0).cumcount(), append=True)
df = df.combine_first(df1).sort_index(level=[1,0]).reset_index(level=1, drop=True)
df
This messes up the index: the dates are no longer in order within each Location, and Locations 0 and 1 end up off by themselves in the dataframe.
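One quick way to see the imbalance, as a sketch (run after the merge above): counting rows per Location shows that Locations 0 and 1 keep far fewer rows than the other Locations.
#sketch: count how many rows each Location keeps after the cumcount/combine_first merge
print(df.groupby('Location').size())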
What I want is a dataframe where Locations 0 and 1 also have rows for the 18th, 19th and 20th, with NA values filled in where there is no data.
Any help is much appreciated. Thanks.
Answer 0 (score: 0)
One idea is to append only Location to the MultiIndex and then add the missing rows with unstack followed by stack(dropna=False): unstack moves the Location level into the columns, which creates every (date, Location) combination, and stack(dropna=False) moves it back into rows while keeping the all-NaN combinations:
df = df.set_index(['Location'], append=True)
df1 = df1.set_index(['Location'], append=True)
df = (df.combine_first(df1)
        .unstack()
        .stack(dropna=False)
        .sort_index(level=[1,0])
        .reset_index(level=1))
print(df)
Location Temp Precip Wind Pressure
2020-05-18 12:00:00 0 NaN NaN NaN NaN
2020-05-19 12:00:00 0 NaN NaN NaN NaN
2020-05-20 12:00:00 0 18.0 10.0 2.0 29.0
2020-05-18 12:00:00 1 NaN NaN NaN NaN
2020-05-19 12:00:00 1 NaN NaN NaN NaN
2020-05-20 12:00:00 1 13.0 11.0 27.0 16.0
2020-05-18 12:00:00 2 17.0 13.0 6.0 23.0
2020-05-19 12:00:00 2 12.0 8.0 18.0 21.0
2020-05-20 12:00:00 2 5.0 29.0 15.0 22.0
2020-05-18 12:00:00 3 23.0 2.0 12.0 5.0
2020-05-19 12:00:00 3 1.0 2.0 18.0 20.0
2020-05-20 12:00:00 3 15.0 16.0 5.0 22.0
2020-05-18 12:00:00 4 3.0 23.0 10.0 29.0
2020-05-19 12:00:00 4 27.0 7.0 8.0 0.0
2020-05-20 12:00:00 4 4.0 18.0 6.0 27.0
2020-05-18 12:00:00 5 10.0 25.0 0.0 29.0
2020-05-19 12:00:00 5 26.0 9.0 24.0 2.0
2020-05-20 12:00:00 5 6.0 12.0 8.0 5.0
2020-05-18 12:00:00 6 9.0 1.0 17.0 17.0
2020-05-19 12:00:00 6 23.0 2.0 23.0 20.0
2020-05-20 12:00:00 6 10.0 0.0 6.0 10.0
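As a possible alternative to the unstack/stack step (a sketch, not part of the answer above): the same completed grid can be built by reindexing against the full set of (date, Location) combinations created with pd.MultiIndex.from_product. It assumes you start again from the original df and df1 (datetime index plus a Location column), and the names combined, full_grid and result are only for illustration.
#alternative sketch: start from the original df and df1, before any of the reshaping above
combined = pd.concat([df, df1]).set_index('Location', append=True)
#for duplicate (date, Location) pairs keep the first occurrence, so df's values win over df1's
combined = combined[~combined.index.duplicated(keep='first')]
#build every (date, Location) combination and reindex, which creates the all-NaN rows
dates = combined.index.get_level_values(0).unique().sort_values()
locations = combined.index.get_level_values(1).unique().sort_values()
full_grid = pd.MultiIndex.from_product([dates, locations], names=combined.index.names)
result = combined.reindex(full_grid).sort_index(level=[1, 0]).reset_index(level=1)
print(result)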