I have two DataFrames that look like this:
2001-01-03 00:00:00 NaN NaN NaN NaN NaN
2001-01-03 00:01:00 0.95110 0.95110 0.95110 0.95110 4.0
2001-01-03 00:02:00 0.95100 0.95110 0.95100 0.95110 4.0
2001-01-03 00:03:00 0.95100 0.95100 0.95100 0.95100 4.0
2001-01-03 00:04:00 0.95090 0.95090 0.95090 0.95090 4.0
2001-01-03 00:05:00 0.95100 0.95100 0.95100 0.95100 4.0
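What I want to do is replace any row containing NaN in one df with the row from the other df that has the same date index.
I tried something like this: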
df = df.apply(lambda x: df2.ix[x['row']] if x.isnull().any() else x)
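But it just throws a bunch of errors, and even if I could get it to work it probably wouldn't be the best approach. As far as I understand, it may be possible to do this with .update(), but I can't get my head around it, so I'd be very grateful if someone could offer some help.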
Answer 0 (score: 1)
You can use DataFrame.combine_first, fillna, or update:
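df = df1.combine_first(df2)   # fills NaN in df1 from df2; result has the union of both indexes
df = df1.fillna(df2)          # fills NaN in df1 from df2; keeps df1's index
df1.update(df2)               # in place; overwrites df1 with every non-NaN value in df2
print (df1)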
But both DataFrames need to have the same column names.
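To illustrate why the column names matter, here is a minimal sketch. The frames `left` and `right` and their column labels are made up for this example, and the relabeling step assumes both frames list their columns in the same order.

```python
import numpy as np
import pandas as pd

idx = pd.to_datetime(["2001-01-03 00:00:00", "2001-01-03 00:01:00"])

# Hypothetical frames: same shape, but different column labels.
left = pd.DataFrame({"open": [np.nan, 0.9511], "close": [np.nan, 0.9511]}, index=idx)
right = pd.DataFrame({"o": [0.9509, 0.9510], "c": [0.9509, 0.9510]}, index=idx)

# The labels do not match, so no NaN is filled and the result
# simply carries the columns of both frames.
mismatched = left.combine_first(right)
print(mismatched)

# Relabel by position (assumption: column order is the same in both frames).
aligned = right.copy()
aligned.columns = left.columns

filled = left.combine_first(aligned)
print(filled)
```

Once the labels agree, the NaN row in `left` is filled from `right` and the existing values in `left` are kept.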
Sample:
import numpy as np
import pandas as pd
df1 = pd.DataFrame({1: {pd.Timestamp('2001-01-03 00:01:00'): 0.95109999999999995, pd.Timestamp('2001-01-03 00:03:00'): 0.95099999999999996, pd.Timestamp('2001-01-03 00:02:00'): 0.95099999999999996, pd.Timestamp('2001-01-03 00:00:00'): np.nan, pd.Timestamp('2001-01-03 00:05:00'): 0.95099999999999996, pd.Timestamp('2001-01-03 00:04:00'): 0.95089999999999997}, 2: {pd.Timestamp('2001-01-03 00:01:00'): 0.95109999999999995, pd.Timestamp('2001-01-03 00:03:00'): 0.95099999999999996, pd.Timestamp('2001-01-03 00:02:00'): 0.95109999999999995, pd.Timestamp('2001-01-03 00:00:00'): np.nan, pd.Timestamp('2001-01-03 00:05:00'): 0.95099999999999996, pd.Timestamp('2001-01-03 00:04:00'): 0.95089999999999997}, 3: {pd.Timestamp('2001-01-03 00:01:00'): 0.95109999999999995, pd.Timestamp('2001-01-03 00:03:00'): 0.95099999999999996, pd.Timestamp('2001-01-03 00:02:00'): 0.95099999999999996, pd.Timestamp('2001-01-03 00:00:00'): np.nan, pd.Timestamp('2001-01-03 00:05:00'): 0.95099999999999996, pd.Timestamp('2001-01-03 00:04:00'): 0.95089999999999997}, 4: {pd.Timestamp('2001-01-03 00:01:00'): 0.95109999999999995, pd.Timestamp('2001-01-03 00:03:00'): 0.95099999999999996, pd.Timestamp('2001-01-03 00:02:00'): 0.95109999999999995, pd.Timestamp('2001-01-03 00:00:00'): np.nan, pd.Timestamp('2001-01-03 00:05:00'): 0.95099999999999996, pd.Timestamp('2001-01-03 00:04:00'): 0.95089999999999997}, 5: {pd.Timestamp('2001-01-03 00:01:00'): 4.0, pd.Timestamp('2001-01-03 00:03:00'): 4.0, pd.Timestamp('2001-01-03 00:02:00'): 4.0, pd.Timestamp('2001-01-03 00:00:00'): np.nan, pd.Timestamp('2001-01-03 00:05:00'): 4.0, pd.Timestamp('2001-01-03 00:04:00'): 4.0}})
df2 = pd.DataFrame({1: {pd.Timestamp('2001-01-03 00:01:00'): 0.95099999999999996, pd.Timestamp('2001-01-03 00:00:00'): 0.95089999999999997}, 2: {pd.Timestamp('2001-01-03 00:01:00'): 0.95099999999999996, pd.Timestamp('2001-01-03 00:00:00'): 0.95089999999999997}, 3: {pd.Timestamp('2001-01-03 00:01:00'): 0.95099999999999996, pd.Timestamp('2001-01-03 00:00:00'): 0.95089999999999997}, 4: {pd.Timestamp('2001-01-03 00:01:00'): 0.95099999999999996, pd.Timestamp('2001-01-03 00:00:00'): 0.95089999999999997}, 5: {pd.Timestamp('2001-01-03 00:01:00'): 4.0, pd.Timestamp('2001-01-03 00:00:00'): 4.0}})
print (df1)
                          1       2       3       4    5
2001-01-03 00:00:00     NaN     NaN     NaN     NaN  NaN
2001-01-03 00:01:00  0.9511  0.9511  0.9511  0.9511  4.0
2001-01-03 00:02:00  0.9510  0.9511  0.9510  0.9511  4.0
2001-01-03 00:03:00  0.9510  0.9510  0.9510  0.9510  4.0
2001-01-03 00:04:00  0.9509  0.9509  0.9509  0.9509  4.0
2001-01-03 00:05:00  0.9510  0.9510  0.9510  0.9510  4.0
print (df2)
                          1       2       3       4    5
2001-01-03 00:00:00  0.9509  0.9509  0.9509  0.9509  4.0
2001-01-03 00:01:00  0.9510  0.9510  0.9510  0.9510  4.0
df = df1.combine_first(df2)
print (df)
                          1       2       3       4    5
2001-01-03 00:00:00  0.9509  0.9509  0.9509  0.9509  4.0
2001-01-03 00:01:00  0.9511  0.9511  0.9511  0.9511  4.0
2001-01-03 00:02:00  0.9510  0.9511  0.9510  0.9511  4.0
2001-01-03 00:03:00  0.9510  0.9510  0.9510  0.9510  4.0
2001-01-03 00:04:00  0.9509  0.9509  0.9509  0.9509  4.0
2001-01-03 00:05:00  0.9510  0.9510  0.9510  0.9510  4.0
df = df1.fillna(df2)
print (df)
                          1       2       3       4    5
2001-01-03 00:00:00  0.9509  0.9509  0.9509  0.9509  4.0
2001-01-03 00:01:00  0.9511  0.9511  0.9511  0.9511  4.0
2001-01-03 00:02:00  0.9510  0.9511  0.9510  0.9511  4.0
2001-01-03 00:03:00  0.9510  0.9510  0.9510  0.9510  4.0
2001-01-03 00:04:00  0.9509  0.9509  0.9509  0.9509  4.0
2001-01-03 00:05:00  0.9510  0.9510  0.9510  0.9510  4.0
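combine_first and fillna give the same result on this sample because every index label in df2 also exists in df1. They can differ when the second frame has extra rows. A small sketch of that difference, using made-up frames `a` and `b` that are not part of the sample data:

```python
import numpy as np
import pandas as pd

# Hypothetical frames: `b` has a timestamp (00:02) that `a` does not have.
a = pd.DataFrame(
    {"x": [np.nan, 1.0]},
    index=pd.to_datetime(["2001-01-03 00:00", "2001-01-03 00:01"]),
)
b = pd.DataFrame(
    {"x": [9.0, 8.0]},
    index=pd.to_datetime(["2001-01-03 00:00", "2001-01-03 00:02"]),
)

via_fillna = a.fillna(b)          # keeps only the index of `a`
via_combine = a.combine_first(b)  # uses the union of both indexes

print(via_fillna)
print(via_combine)
```

So if the result must have exactly the rows of the first frame, fillna is probably the safer choice; if rows that exist only in the second frame should be added, combine_first does that.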
df1.update(df2)
print (df1)
                          1       2       3       4    5
2001-01-03 00:00:00  0.9509  0.9509  0.9509  0.9509  4.0
2001-01-03 00:01:00  0.9510  0.9510  0.9510  0.9510  4.0
2001-01-03 00:02:00  0.9510  0.9511  0.9510  0.9511  4.0
2001-01-03 00:03:00  0.9510  0.9510  0.9510  0.9510  4.0
2001-01-03 00:04:00  0.9509  0.9509  0.9509  0.9509  4.0
2001-01-03 00:05:00  0.9510  0.9510  0.9510  0.9510  4.0
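Note that in the update output the 00:01:00 row changed from 0.9511 to 0.9510, because update overwrites existing values by default. If only NaN should be touched, two possible variants are sketched below. The frames `base` and `other` are invented for the example; the row-wise variant is one reading of "replace any NaN row with the other frame's row" and is not part of the original answer.

```python
import numpy as np
import pandas as pd

idx = pd.to_datetime(["2001-01-03 00:00", "2001-01-03 00:01", "2001-01-03 00:02"])

# Hypothetical data: row 0 is all NaN, row 1 is partly NaN, row 2 is complete.
base = pd.DataFrame({"a": [np.nan, 0.9511, 0.9510], "b": [np.nan, np.nan, 4.0]}, index=idx)
other = pd.DataFrame({"a": [0.9509, 0.9510], "b": [4.0, 5.0]}, index=idx[:2])

# Variant 1: in-place update that only fills NaN cells and leaves
# existing values alone.
cellwise = base.copy()
cellwise.update(other, overwrite=False)

# Variant 2: replace the whole row wherever it contains any NaN,
# but only for rows that `other` actually has.
rowwise = base.copy()
rows = rowwise.index[rowwise.isnull().any(axis=1)].intersection(other.index)
rowwise.loc[rows] = other.loc[rows]

print(cellwise)
print(rowwise)
```

The two variants differ on the partly-NaN row: the cell-wise version keeps 0.9511 in column `a`, while the row-wise version takes the entire row from `other`.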