Question

I have two DataFrames whose indexes do not match:

I want to combine these two DataFrames, df1 and df2, into df3, but the following code does not work:

df3 = pd.concat(df1,df2,axis=1)

Please help: how can I concatenate them?

This is the DataFrame I want to end up with:

Answer 1

You need the same index values in both DataFrames for them to align, so use reset_index with drop=True to give each one a default unique index:

new = pd.concat([df1.reset_index(drop=True),df2.reset_index(drop=True)],axis=1)

Another solution, if both indexes have the same length:

df2.index = df1.index
new = pd.concat([df1,df2],axis=1)
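
Assigning to df2.index changes df2 itself. If that side effect is unwanted, a non-mutating variant can be sketched with `DataFrame.set_axis`, which returns a relabeled object. The sample frames and the explicit length check below are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# Small illustrative frames (assumed data, mirroring the sample in this answer)
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3']})
df2 = pd.DataFrame({'E': ['E0', 'E1', 'E2', 'E3']}, index=[4, 5, 6, 7])

# Relabeling only makes sense when both frames have the same number of rows
if len(df1) != len(df2):
    raise ValueError('df1 and df2 must have the same number of rows')

# set_axis returns a relabeled object, so df2 keeps its original index
new = pd.concat([df1, df2.set_axis(df1.index, axis=0)], axis=1)
print(new)
```

After this runs, df2 still carries the index 4..7, while new is indexed 0..3 like df1.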

Sample:

df1 = pd.DataFrame({x: pd.Series(range(4)).astype(str).radd(x) for x in list('ABCD')})
print (df1)
    A   B   C   D
0  A0  B0  C0  D0
1  A1  B1  C1  D1
2  A2  B2  C2  D2
3  A3  B3  C3  D3

df2 = pd.DataFrame({'E':['E0','E1','E2','E3']}, index=[4,5,6,7])
print (df2)
    E
4  E0
5  E1
6  E2
7  E3

new = pd.concat([df1.reset_index(drop=True),df2.reset_index(drop=True)],axis=1) 
print (new)
    A   B   C   D   E
0  A0  B0  C0  D0  E0
1  A1  B1  C1  D1  E1
2  A2  B2  C2  D2  E2
3  A3  B3  C3  D3  E3
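
To see why the indexes have to match, it can help to look at what happens without reset_index. The sketch below rebuilds the sample frames in a slightly simpler way (an assumption made for brevity). It also shows that pd.concat expects its objects as a single list, which appears to be why the call in the question fails.

```python
import pandas as pd

df1 = pd.DataFrame({c: [f'{c}{i}' for i in range(4)] for c in 'ABCD'})
df2 = pd.DataFrame({'E': ['E0', 'E1', 'E2', 'E3']}, index=[4, 5, 6, 7])

# The frames must be passed as one list; separate positional arguments fail
try:
    pd.concat(df1, df2, axis=1)
except TypeError as exc:
    print('TypeError:', exc)

# Without reset_index, concat aligns on the union of both indexes (0..7),
# so no row of df1 lines up with a row of df2 and the gaps become NaN
misaligned = pd.concat([df1, df2], axis=1)
print(misaligned.shape)  # (8, 5)
print(misaligned)
```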

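This approach is more general: if the new columns may share names with existing ones, add the keys parameter and then flatten the resulting MultiIndex in the columns: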

df2 = pd.DataFrame({'A':['E0','E1','E2','E3']}, index=[4,5,6,7])
print (df2)
    A
4  E0
5  E1
6  E2
7  E3

new=pd.concat([df1.reset_index(drop=True),df2.reset_index(drop=True)],axis=1, keys=('a','b')) 
new.columns = new.columns.map('_'.join)
print (new)
  a_A a_B a_C a_D b_A
0  A0  B0  C0  D0  E0
1  A1  B1  C1  D1  E1
2  A2  B2  C2  D2  E2
3  A3  B3  C3  D3  E3
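
The intermediate step is easier to follow when the column labels are printed before and after flattening. This is a small sketch using the same keys ('a', 'b') as above; the way df1 is built here is an assumption made for brevity.

```python
import pandas as pd

df1 = pd.DataFrame({c: [f'{c}{i}' for i in range(4)] for c in 'ABCD'})
df2 = pd.DataFrame({'A': ['E0', 'E1', 'E2', 'E3']}, index=[4, 5, 6, 7])

new = pd.concat(
    [df1.reset_index(drop=True), df2.reset_index(drop=True)],
    axis=1,
    keys=('a', 'b'),
)

# Each column label is now a (key, original name) tuple
tuples = new.columns.tolist()
print(tuples)
# [('a', 'A'), ('a', 'B'), ('a', 'C'), ('a', 'D'), ('b', 'A')]

# Joining each tuple with '_' yields flat, unique string labels
new.columns = new.columns.map('_'.join)
print(new.columns.tolist())
# ['a_A', 'a_B', 'a_C', 'a_D', 'b_A']
```

Because every label carries its key as a prefix, the two columns originally named A no longer collide.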

Answer 2

`pd.DataFrame.join` + `pd.DataFrame.set_index`

df1.join(df2.set_index(df1.index))

    A   B   C   D   E
0  A0  B0  C0  D0  E0
1  A1  B1  C1  D1  E1
2  A2  B2  C2  D2  E2
3  A3  B3  C3  D3  E3

`pd.DataFrame.assign`

df1.assign(**df2.to_dict('list'))

    A   B   C   D   E
0  A0  B0  C0  D0  E0
1  A1  B1  C1  D1  E1
2  A2  B2  C2  D2  E2
3  A3  B3  C3  D3  E3

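How it works

assign takes keyword arguments, where each key is the name of a new column and each value is the data assigned to that column. We can use the double splat ** to unpack a dictionary into keyword arguments, and df2.to_dict conveniently produces such a dictionary from the other DataFrame. I just need to request the 'list' orientation. Older pandas versions also accepted the abbreviation 'l', but recent versions no longer do, so spell out 'list'.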
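
The two steps described above can be looked at separately. The following is a minimal sketch; the variable names columns_as_lists, unpacked, and by_hand are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({c: [f'{c}{i}' for i in range(4)] for c in 'ABCD'})
df2 = pd.DataFrame({'E': ['E0', 'E1', 'E2', 'E3']}, index=[4, 5, 6, 7])

# orient='list' maps each column name to a plain list of its values,
# which discards df2's index (4..7) entirely
columns_as_lists = df2.to_dict(orient='list')
print(columns_as_lists)
# {'E': ['E0', 'E1', 'E2', 'E3']}

# Unpacking with ** is the same as writing the keyword argument by hand
unpacked = df1.assign(**columns_as_lists)
by_hand = df1.assign(E=['E0', 'E1', 'E2', 'E3'])
print(unpacked.equals(by_hand))
# True
```

Since the lists carry no index, the values are placed by position, which is what sidesteps the index mismatch between df1 and df2.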

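Caveats
Clever as this solution is, it has limitations, as jezrael pointed out in the comments. I am restricted when bringing in columns that already exist: those columns are overwritten. It also does not work if the column names are numbers, because keyword argument names must be strings.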
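
Both limitations can be reproduced with a short sketch. The frames clash and numeric are hypothetical examples invented here to trigger each case.

```python
import pandas as pd

df1 = pd.DataFrame({c: [f'{c}{i}' for i in range(4)] for c in 'ABCD'})

# Caveat 1: a name that already exists in df1 is silently overwritten
clash = pd.DataFrame({'A': ['E0', 'E1', 'E2', 'E3']}, index=[4, 5, 6, 7])
overwritten = df1.assign(**clash.to_dict(orient='list'))
print(overwritten['A'].tolist())
# ['E0', 'E1', 'E2', 'E3']

# Caveat 2: keyword arguments must be strings, so integer column
# names cannot be unpacked with **
numeric = pd.DataFrame({0: ['E0', 'E1', 'E2', 'E3']}, index=[4, 5, 6, 7])
error_message = None
try:
    df1.assign(**numeric.to_dict(orient='list'))
except TypeError as exc:
    error_message = str(exc)
print('TypeError:', error_message)
```

When either case applies, the concat or join approaches shown elsewhere in this thread are probably the safer choice.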

Setup

df1 = pd.DataFrame([list('0123')] * 4, list('ABCD')).T.pipe(lambda d: d.radd(d.columns))
df2 = pd.DataFrame(dict(E='E0 E1 E2 E3'.split()), [4, 5, 6, 7])

print(df1, df2, sep='\n\n')

    A   B   C   D
0  A0  B0  C0  D0
1  A1  B1  C1  D1
2  A2  B2  C2  D2
3  A3  B3  C3  D3

    E
4  E0
5  E1
6  E2
7  E3
