How to merge two irregular dataframes in Python

Time: 2018-06-04 10:35:20

Tags: python python-3.x pandas dataframe merge

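So I have two dataframes.

The first dataframe shows 4 cars sold over different day ranges, along with the number of cars sold.

The second dataframe shows that some of the cars had repairs, and hence repair calls.

DF1:

     Car     Range(Days)    Sold
  0   A          1-3            5
  1              4-7            23
  2              8-15           2
  3   B          4-7            4
  4              8-15           1
  5   C          1-3            5
  6   D          1-3            2
  7   E          1-3            9

DF2:

     Car     Repair_Calls
0    A            2
1    C            45
2    D            32
4    E            1

I tried

DF1['Repair_Calls'] = DF2['Repair_Calls']

What I get

     Car     Range(Days)    Sold     Repair_Calls
  0   A          1-3            5           2
  1              4-7            23          45
  2              8-15           2           32
  3   B          4-7            4           1
  4              8-15           1
  5   C          1-3            5
  6   D          1-3            2
  7   E          1-3            9

Expected output

     Car     Range(Days)    Sold     Repair_Calls
  0   A          1-3            5           2
  1              4-7            23
  2              8-15           2
  3   B          4-7            4           0
  4              8-15           1
  5   C          1-3            5           45
  6   D          1-3            2           32
  7   E          1-3            9            1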
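
For context, the direct assignment in the question most likely misplaces the values because pandas aligns a column assignment on the row index, not on the car name. The sketch below uses small made-up frames (`sales` and `repairs` are hypothetical names, not from the question) to show that effect.

```python
import pandas as pd

# Hypothetical miniature versions of the two frames in the question.
sales = pd.DataFrame({"Car": ["A", None, "B", "C"], "Sold": [5, 23, 4, 5]})
repairs = pd.DataFrame({"Car": ["A", "C"], "Repair_Calls": [2, 45]})

# Assigning a Series to a column aligns on the row index (0, 1, 2, ...),
# so the "Car" values play no part in deciding where each number lands.
by_index = sales.copy()
by_index["Repair_Calls"] = repairs["Repair_Calls"]

# Row 1 has no car label, yet it receives C's value because both sit at index 1.
# Rows 2 and 3 get NaN because repairs has no rows with those index labels.
print(by_index)
```

This is why a lookup keyed on the car name is needed instead of a plain assignment.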

2 answers:

Answer 0 (score: 2)

Use map with a Series created from df2 by set_index:

df1['Repair_Calls'] = df1['Cars'].map(df2.set_index('Car')['Repair_Calls'])
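
As a self-contained sketch of this map approach, the block below rebuilds the sample data by hand. It assumes the blank cars are stored as NaN and that the sales frame's key column is named `Cars`, as in this answer; `lookup` is a name introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Sample data mirroring the question; blank cars are stored as NaN (an assumption).
df1 = pd.DataFrame({
    "Cars": ["A", np.nan, np.nan, "B", np.nan, "C", "D", "E"],
    "Range(Days)": ["1-3", "4-7", "8-15", "4-7", "8-15", "1-3", "1-3", "1-3"],
    "Sold": [5, 23, 2, 4, 1, 5, 2, 9],
})
df2 = pd.DataFrame({"Car": ["A", "C", "D", "E"], "Repair_Calls": [2, 45, 32, 1]})

# Series indexed by car name, used as a lookup table.
lookup = df2.set_index("Car")["Repair_Calls"]

# map looks up each value of "Cars"; labels missing from the lookup (B)
# and NaN entries both come back as NaN.
df1["Repair_Calls"] = df1["Cars"].map(lookup)
print(df1)
```

The lookup Series is assumed to hold each car only once; if df2 could contain repeated cars, it would need to be aggregated first.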

Or use merge with a left join:

df1 = df1.merge(df2, left_on='Cars', right_on='Car', how='left').drop('Car', axis=1)

print (df1)
  Cars Range(Days)  Sold  Repair_Calls
0    A         1-3     5           2.0
1  NaN         4-7    23           NaN
2  NaN        8-15     2           NaN
3    B         4-7     4           NaN
4  NaN        8-15     1           NaN
5    C         1-3     5          45.0
6    D         1-3     2          32.0
7    E         1-3     9           1.0
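
A runnable sketch of the left join, under the same assumptions about the data (blank cars as NaN, key columns named `Cars` and `Car`). The `validate="many_to_one"` argument is an optional safeguard added here, not part of the answer: it makes merge raise if a car appears more than once in df2, which would otherwise duplicate rows silently.

```python
import numpy as np
import pandas as pd

# Sample data mirroring the question; blank cars are stored as NaN (an assumption).
df1 = pd.DataFrame({
    "Cars": ["A", np.nan, np.nan, "B", np.nan, "C", "D", "E"],
    "Range(Days)": ["1-3", "4-7", "8-15", "4-7", "8-15", "1-3", "1-3", "1-3"],
    "Sold": [5, 23, 2, 4, 1, 5, 2, 9],
})
df2 = pd.DataFrame({"Car": ["A", "C", "D", "E"], "Repair_Calls": [2, 45, 32, 1]})

# A left join keeps every row of df1 in its original order.
merged = df1.merge(
    df2,
    left_on="Cars",
    right_on="Car",
    how="left",
    validate="many_to_one",  # optional check: each car at most once in df2
)

# Both key columns survive because their names differ; drop the right-hand one.
merged = merged.drop(columns="Car")
print(merged)
```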

But if you also need to fill in the missing values (such as 0 for B), add reindex by the unique non-NaN values:

s = df2.set_index('Car')['Repair_Calls'].reindex(df1['Cars'].dropna().unique(), fill_value=0)
df1['Repair_Calls'] = df1['Cars'].map(s)
print (df1)
  Cars Range(Days)  Sold  Repair_Calls
0    A         1-3     5           2.0
1  NaN         4-7    23           NaN
2  NaN        8-15     2           NaN
3    B         4-7     4           0.0
4  NaN        8-15     1           NaN
5    C         1-3     5          45.0
6    D         1-3     2          32.0
7    E         1-3     9           1.0
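
The reindex step can be tried in isolation with the reduced, made-up data below. The final conversion to the nullable `Int64` dtype is an optional extra, not part of the answer, for readers who prefer 2 over 2.0 in a column that still holds missing values.

```python
import numpy as np
import pandas as pd

# Reduced sample data (hypothetical); blank cars are stored as NaN.
df1 = pd.DataFrame({
    "Cars": ["A", np.nan, "B", np.nan, "C"],
    "Sold": [5, 23, 4, 1, 5],
})
df2 = pd.DataFrame({"Car": ["A", "C"], "Repair_Calls": [2, 45]})

# Every car named in df1, each listed once, with NaN entries removed.
known_cars = df1["Cars"].dropna().unique()

# reindex adds the cars that df2 lacks (here B) and gives them 0.
s = df2.set_index("Car")["Repair_Calls"].reindex(known_cars, fill_value=0)

# Named cars now always find a value; rows without a car label stay missing.
df1["Repair_Calls"] = df1["Cars"].map(s)

# Optional: nullable integers display as 2 rather than 2.0.
df1["Repair_Calls"] = df1["Repair_Calls"].astype("Int64")
print(df1)
```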

Answer 1 (score: 1)

@san, you can try the code below.

If the solution does not meet your needs, please comment with more input and explanation.

» Code

import pandas as pd
import numpy as np 

data_arr1 = [
    ['A', '1-3', 5],
    ['',  '4-7', 23],
    ['',  '8-15', 2],
    ['B', '4-7', 4],
    ['', '8-15', 1],
    ['C', '1-3', 5],
    ['D', '1-3', 2],
    ['E', '1-3', 9]
]
columns1 = ["Car", "Range(Days)", "Sold"]

data_arr2 = [
    ['A', 2],
    ['C', 45],
    ['D', 32],
    ['E', 1]
]
columns2 = ["Car", "Repair_Calls"]

# Creating DataFrames
df = pd.DataFrame(data_arr1, columns=columns1)
df2 = pd.DataFrame(data_arr2, columns=columns2)

# Printing Dataframes
print(df)
print('\n')
print(df2)

# Merging df & df2 to get desired output
df3 = pd.merge(left=df, right=df2, left_on="Car", right_on="Car", how="outer").replace(np.nan, "", regex=True)
print('\n')
print(df3)

Output »

  Car Range(Days)  Sold
0   A         1-3     5
1             4-7    23
2            8-15     2
3   B         4-7     4
4            8-15     1
5   C         1-3     5
6   D         1-3     2
7   E         1-3     9


  Car  Repair_Calls
0   A             2
1   C            45
2   D            32
3   E             1


  Car Range(Days)  Sold Repair_Calls
0   A         1-3     5            2
1             4-7    23
2            8-15     2
3            8-15     1
4   B         4-7     4
5   C         1-3     5           45
6   D         1-3     2           32
7   E         1-3     9            1
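
Note that in the merged output above, the second B row (8-15, 1) has moved up next to A's continuation rows, because the outer join groups rows by key. If the original row order matters, a possible variant is to use a left join instead, sketched below on the same data. This is a suggested alternative, not the answer's own code.

```python
import pandas as pd

# Same sample data as in the answer; blank cars are empty strings here.
df = pd.DataFrame(
    [["A", "1-3", 5], ["", "4-7", 23], ["", "8-15", 2], ["B", "4-7", 4],
     ["", "8-15", 1], ["C", "1-3", 5], ["D", "1-3", 2], ["E", "1-3", 9]],
    columns=["Car", "Range(Days)", "Sold"],
)
df2 = pd.DataFrame(
    [["A", 2], ["C", 45], ["D", 32], ["E", 1]],
    columns=["Car", "Repair_Calls"],
)

# how="left" keeps the rows of df in their original order; cars without
# a match in df2 (the blank rows and B) get NaN in Repair_Calls.
df3 = df.merge(df2, on="Car", how="left")
print(df3)
```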