How do I merge two dataframes when the second has different column names and lengths?

Asked: 2021-04-15 10:57:48

Tags: python pandas dataframe merge

I have two dataframes. The first is just a column of daily datetimes, while the second contains both dates and data. Here is an example:

import pandas as pd

df1 = pd.DataFrame({'Date': pd.date_range(start="2002-01-01",end="2021-04-14")})

df2 = pd.DataFrame.from_dict({'Date1': {0: '2002-01-01 00:00:00',
  1: '2002-01-02 00:00:00',
  2: '2002-01-03 00:00:00',
  3: '2002-01-04 00:00:00',
  4: '2002-01-07 00:00:00',
  5: '2002-01-08 00:00:00',
  6: '2002-01-09 00:00:00',
  7: '2002-01-10 00:00:00',
  8: '2002-01-11 00:00:00',
  9: '2002-01-14 00:00:00',
  10: '2002-01-15 00:00:00',
  11: '2002-01-16 00:00:00',
  12: '2002-01-17 00:00:00',
  13: '2002-01-18 00:00:00',
  14: '2002-01-21 00:00:00'},
 'GSEAFCI': {0: 98.85,
  1: 98.96,
  2: 98.88,
  3: 98.85,
  4: 98.83,
  5: 98.86,
  6: 98.87,
  7: 98.9,
  8: 98.87,
  9: 98.91,
  10: 98.84,
  11: 98.87,
  12: 98.84,
  13: 98.84,
  14: 98.87},
 'Date2': {0: '2004-03-03 00:00:00',
  1: '2004-03-04 00:00:00',
  2: '2004-03-05 00:00:00',
  3: '2004-03-08 00:00:00',
  4: '2004-03-09 00:00:00',
  5: '2004-03-10 00:00:00',
  6: '2004-03-11 00:00:00',
  7: '2004-03-12 00:00:00',
  8: '2004-03-15 00:00:00',
  9: '2004-03-16 00:00:00',
  10: '2004-03-17 00:00:00',
  11: '2004-03-18 00:00:00',
  12: '2004-03-19 00:00:00',
  13: '2004-03-22 00:00:00',
  14: '2004-03-23 00:00:00'},
 '2Y2YINF': {0: 3.9,
  1: 3.832,
  2: 3.668,
  3: 3.63,
  4: 3.649,
  5: 3.636,
  6: 3.63,
  7: 3.633,
  8: 3.628,
  9: 3.665,
  10: 3.621,
  11: 3.62,
  12: 3.616,
  13: 3.588,
  14: 3.5949999999999998},
 'Date3': {0: '2002-01-02 00:00:00',
  1: '2002-01-03 00:00:00',
  2: '2002-01-04 00:00:00',
  3: '2002-01-07 00:00:00',
  4: '2002-01-08 00:00:00',
  5: '2002-01-09 00:00:00',
  6: '2002-01-10 00:00:00',
  7: '2002-01-11 00:00:00',
  8: '2002-01-14 00:00:00',
  9: '2002-01-15 00:00:00',
  10: '2002-01-16 00:00:00',
  11: '2002-01-17 00:00:00',
  12: '2002-01-18 00:00:00',
  13: '2002-01-21 00:00:00',
  14: '2002-01-22 00:00:00'},
 '5Y5YINF': {0: 5.9274000000000004,
  1: 5.8767,
  2: 5.8977,
  3: 5.763,
  4: 5.8056,
  5: 5.825,
  6: 5.7287,
  7: 5.6922,
  8: 5.6222,
  9: 5.6145,
  10: 5.6655999999999995,
  11: 5.76,
  12: 5.7455,
  13: 5.718,
  14: 5.7589}})

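What I want to do is merge df1 and df2 to get a dataframe (dataset) in which each value sits on the row of its own date when data exists, and the cell is NaN when it does not. This is what I tried: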

dataset = df1.merge(df2, how = 'left', left_on= ['Date', 'Date','Date'],
                  right_on=['Date1','Date2','Date3'])

Unfortunately, I get nothing but NaN or NaT everywhere.

Instead, I would like to get:

                  Date  GSEAFCI  2Y2YINF  5Y5YINF
0  2002-01-01 00:00:00    98.85      NaN      NaN
1  2002-01-02 00:00:00    98.96      NaN   5.9274
2  2002-01-03 00:00:00    98.88      NaN   5.8767
3  2002-01-04 00:00:00    98.85      NaN   5.8977
4  2002-01-05 00:00:00      NaN      NaN      NaN
5  2002-01-06 00:00:00      NaN      NaN      NaN
6  2002-01-07 00:00:00    98.83      NaN   5.7630
7  2002-01-08 00:00:00    98.86      NaN   5.8056
8  2002-01-09 00:00:00    98.87      NaN   5.8250
...
...


Can anyone help me solve this?

Thanks!

1 Answer:

Answer 0 (score: 1)

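If the columns always follow the same pattern (a datetime column followed by its data column), you can use a list comprehension to build one Series per pair, indexed by that pair's dates, and combine them with concat. Then convert the index values to datetimes and left-merge onto the first DataFrame with DataFrame.merge, which here matches the column named Date in df1 against the index named Date in the combined frame: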

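# Column labels alternate: date columns at even positions, value columns at odd ones
a = df2.columns[::2]
b = df2.columns[1::2]
# One Series per (date, value) pair, indexed by that pair's own date column
dfs = [df2.loc[:, x].set_index(x[0], drop=False)[x[1]] for x in zip(a, b)]
# Align the Series side by side on the union of all dates
df = pd.concat(dfs, axis=1, sort=False).rename_axis('Date')
# The dates in df2 are strings, so convert the index to datetimes
df.index = pd.to_datetime(df.index)

# 'Date' refers to the column in df1 and to the index name in df
df = df1.merge(df, on='Date', how='left')
print(df.head(10))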
        Date  GSEAFCI  2Y2YINF  5Y5YINF
0 2002-01-01    98.85      NaN      NaN
1 2002-01-02    98.96      NaN   5.9274
2 2002-01-03    98.88      NaN   5.8767
3 2002-01-04    98.85      NaN   5.8977
4 2002-01-05      NaN      NaN      NaN
5 2002-01-06      NaN      NaN      NaN
6 2002-01-07    98.83      NaN   5.7630
7 2002-01-08    98.86      NaN   5.8056
8 2002-01-09    98.87      NaN   5.8250
9 2002-01-10    98.90      NaN   5.7287
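
As an alternative sketch (not part of the answer above), the same alignment can be done by left-merging each date/value pair onto df1 in turn. The helper name align_pairs and the shortened sample data are made up for illustration, and the code assumes that df2's columns strictly alternate between a date column and its value column.

```python
import pandas as pd

# Daily calendar, like df1 in the question (shortened for the example).
df1 = pd.DataFrame({"Date": pd.date_range("2002-01-01", "2002-01-08")})

# Wide frame of (date, value) column pairs, like df2 (first rows only).
df2 = pd.DataFrame({
    "Date1": ["2002-01-01 00:00:00", "2002-01-02 00:00:00", "2002-01-03 00:00:00"],
    "GSEAFCI": [98.85, 98.96, 98.88],
    "Date2": ["2004-03-03 00:00:00", "2004-03-04 00:00:00", "2004-03-05 00:00:00"],
    "2Y2YINF": [3.9, 3.832, 3.668],
    "Date3": ["2002-01-02 00:00:00", "2002-01-03 00:00:00", "2002-01-04 00:00:00"],
    "5Y5YINF": [5.9274, 5.8767, 5.8977],
})


def align_pairs(calendar, wide, key="Date"):
    """Left-join every (date, value) column pair of `wide` onto `calendar`.

    Hypothetical helper: assumes columns alternate date, value, date, value...
    """
    out = calendar.copy()
    cols = list(wide.columns)
    for date_col, value_col in zip(cols[::2], cols[1::2]):
        # Keep only this pair and skip rows that have no date at all.
        pair = wide[[date_col, value_col]].dropna(subset=[date_col])
        # Parse the string dates so they compare equal to the calendar's datetimes.
        pair = pair.assign(**{date_col: pd.to_datetime(pair[date_col])})
        # Give the date column the same name as the calendar's key column.
        pair = pair.rename(columns={date_col: key})
        out = out.merge(pair, on=key, how="left")
    return out


dataset = align_pairs(df1, df2)
print(dataset)
```

Merging on three key pairs at once, as in the question, only matches a row of df2 whose Date1, Date2 and Date3 all equal the same Date, which is why that attempt finds no matches. Joining one pair at a time avoids that requirement. Note that if a date column contains duplicate dates, the left merge will repeat the corresponding calendar rows.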