Combine two Pandas DataFrames with the same columns into a single string column

Posted: 2021-07-20 19:56:29

Tags: python pandas dataframe merge

I have two Pandas DataFrames:

Table 1:

+-------+-------------------+
| Name  |       Class       |
+-------+-------------------+
| Alice | Physics           |
| Bob   | "" (empty string) |
+-------+-------------------+

Table 2:

+-------+-----------+
| Name  |   Class   |
+-------+-----------+
| Alice | Chemistry |
| Bob   | Math      |
+-------+-----------+

Is there an easy way to combine them on the Class column so that the resulting table looks like this:

+-------+--------------------+
| Name  |       Class        |
+-------+--------------------+
| Alice | Physics, Chemistry |
| Bob   | Math               |
+-------+--------------------+

I also want to make sure that combining the columns doesn't leave any stray commas. Thanks!

2 answers:

Answer 0 (score: 2)

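import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'],
                   'Class': ['Physics', np.nan]})
df2 = pd.DataFrame({'Name': ['Alice', 'Bob'],
                    'Class': ['Chemistry', 'Math']})

# DataFrame.append was removed in pandas 2.0, so stack the rows with pd.concat.
# replace('', np.nan) also covers the question's empty-string case before dropna.
df3 = (pd.concat([df, df2])
       .replace('', np.nan)
       .dropna(subset=['Class'])
       .groupby('Name')['Class'].apply(list)
       .reset_index())

# to remove list: join each list into one comma-separated string
df3['Class'] = df3['Class'].apply(', '.join)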
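The same concat/dropna/groupby idea can be checked against the question's exact data, where Bob's first class is an empty string rather than NaN. This is a minimal sketch, not the answerer's code: the helper name `combine_classes` is hypothetical, and it assumes whitespace-only strings should also count as missing. It uses `agg(", ".join)` to build the string directly instead of going through a list.

```python
import numpy as np
import pandas as pd


def combine_classes(df1: pd.DataFrame, df2: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: stack both frames and join each Name's classes.

    Empty or whitespace-only strings are treated as missing so they
    never produce stray commas in the joined result.
    """
    stacked = pd.concat([df1, df2], ignore_index=True)
    # Turn "" (and whitespace-only strings) into NaN so dropna removes them
    stacked["Class"] = stacked["Class"].replace(r"^\s*$", np.nan, regex=True)
    return (stacked.dropna(subset=["Class"])
            # sort=False keeps names in order of first appearance
            .groupby("Name", sort=False)["Class"]
            .agg(", ".join)
            .reset_index())


df1 = pd.DataFrame({"Name": ["Alice", "Bob"], "Class": ["Physics", ""]})
df2 = pd.DataFrame({"Name": ["Alice", "Bob"], "Class": ["Chemistry", "Math"]})
result = combine_classes(df1, df2)
print(result)
```

Note that a Name whose classes are all empty or missing drops out of the result entirely, because dropna removes every one of its rows before the groupby.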

Answer 1 (score: 1)

Try using concat and groupby:

>>> pd.concat([df1, df2]).groupby("Name").agg(lambda x: ", ".join(i for i in x.tolist() if len(i.strip())>0)).reset_index()
    Name               Class
0  Alice  Physics, Chemistry
1    Bob                Math
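Both answers stack the rows with concat. Since the question is tagged merge, here is a hedged alternative sketch that merges on Name instead and joins the two Class columns row by row. The helper name `merge_classes` and the suffixes `_1`/`_2` are illustrative assumptions. It also skips NaN, which answer 1's `i.strip()` filter would fail on (NaN is a float and has no `.strip()`).

```python
import pandas as pd


def merge_classes(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: outer-merge on Name, then join both Class columns,
    skipping NaN and blank strings so no stray commas appear."""
    # Overlapping "Class" columns get the suffixes _1 and _2
    merged = left.merge(right, on="Name", how="outer", suffixes=("_1", "_2"))

    def join_row(row: pd.Series) -> str:
        # Keep only non-blank strings; NaN (a float) is filtered out here
        parts = [v for v in (row["Class_1"], row["Class_2"])
                 if isinstance(v, str) and v.strip()]
        return ", ".join(parts)

    merged["Class"] = merged.apply(join_row, axis=1)
    return merged[["Name", "Class"]]


df1 = pd.DataFrame({"Name": ["Alice", "Bob"], "Class": ["Physics", ""]})
df2 = pd.DataFrame({"Name": ["Alice", "Bob"], "Class": ["Chemistry", "Math"]})
result = merge_classes(df1, df2)
print(result)
```

This assumes each Name appears at most once per frame. With duplicate names, the merge would pair every matching row and repeat classes, so the concat + groupby approach from the answers is the safer choice in that case.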