Concatenate two pandas DataFrames with the same columns and merge rows that share an index

Time: 2020-08-07 12:19:51

Tags: python pandas dataframe

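I have two data frames, df1 and df2. Both have the same column names and are indexed by timestamp. I want to combine them into one data frame, merging rows that share an index and preferring the values stored in df2. That description is awkward, so please see the example tables below.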


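df3 is the result I want to achieve. It contains every timestamp that appears in the index of df1 or df2. For each shared index where the df2 value is not NaN, the df2 value is used; otherwise the value stored in df1 is kept.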

>>> df1 = TimeStamp  A_Output  B_Output  C_Output
          00:00:00   20        15        5
          00:00:06   20        NaN       3
          00:00:15   15        6         NaN
          00:00:20   20        NaN       5
          00:00:30   25        14        10

>>> df2 = TimeStamp  A_Output  B_Output  C_Output
          00:00:00   15        5         8
          00:00:04   16        NaN       NaN
          00:00:06   17        NaN       NaN
          00:00:15   NaN       NaN       2
          00:00:18   19        NaN       NaN
          00:00:21   14        NaN       NaN
          00:00:26   32        NaN       5

>>> df3 = TimeStamp  A_Output  B_Output  C_Output
          00:00:00   15        5         8
          00:00:04   16        NaN       NaN
          00:00:06   17        NaN       3
          00:00:15   15        6         2
          00:00:18   19        NaN       NaN
          00:00:21   14        NaN       NaN
          00:00:26   32        NaN       5
          00:00:30   25        14        10

See the example above for clarity. I really can't find a way to do this. For reference, each data frame has about 90 columns and 100k+ rows.

1 Answer:

Answer 0: (score: 2)

Try combine_first:

df3 = df2.combine_first(df1)

print(df3)

           A_Output  B_Output  C_Output
TimeStamp                              
00:00:00       15.0       5.0       8.0
00:00:04       16.0       NaN       NaN
00:00:06       17.0       NaN       3.0
00:00:15       15.0       6.0       2.0
00:00:18       19.0       NaN       NaN
00:00:20       20.0       NaN       5.0
00:00:21       14.0       NaN       NaN
00:00:26       32.0       NaN       5.0
00:00:30       25.0      14.0      10.0
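
For anyone who wants to reproduce this, here is a minimal self-contained sketch that rebuilds the two example frames from the question and applies combine_first. It assumes the timestamps are plain strings used as the index (the question does not say which dtype the real index has), and the helper variable cols is only there for illustration.

```python
import numpy as np
import pandas as pd

nan = np.nan
cols = ["A_Output", "B_Output", "C_Output"]  # illustrative helper, not from the question

# Example data copied from the question; the string index is an assumption.
df1 = pd.DataFrame(
    [[20, 15, 5], [20, nan, 3], [15, 6, nan], [20, nan, 5], [25, 14, 10]],
    index=pd.Index(
        ["00:00:00", "00:00:06", "00:00:15", "00:00:20", "00:00:30"],
        name="TimeStamp",
    ),
    columns=cols,
)
df2 = pd.DataFrame(
    [[15, 5, 8], [16, nan, nan], [17, nan, nan], [nan, nan, 2],
     [19, nan, nan], [14, nan, nan], [32, nan, 5]],
    index=pd.Index(
        ["00:00:00", "00:00:04", "00:00:06", "00:00:15",
         "00:00:18", "00:00:21", "00:00:26"],
        name="TimeStamp",
    ),
    columns=cols,
)

# Values from df2 take priority; wherever df2 is NaN or has no such row,
# the value from df1 fills the gap. The result index is the union of both.
df3 = df2.combine_first(df1)
print(df3)
```

Because the result uses the union of both indexes, the 00:00:20 row that exists only in df1 is kept, even though the hand-written df3 in the question leaves it out. If that row is unwanted, it would have to be dropped explicitly afterwards.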