My DataFrame df looks like this:
one three two
0 1.0 10.0 4.0
1 2.0 3.0 3.0
2 3.0 22.0 2.0
3 4.0 1.0 1.0
I have another, single-row DataFrame df2 that looks like this:
a b m u
0 1.0 2.0 1.0 4.0
I would like to join the two so that I end up with:
one three two a b m u
0 1.0 10.0 4.0 1.0 2.0 1.0 4.0
1 2.0 3.0 3.0 1.0 2.0 1.0 4.0
2 3.0 22.0 2.0 1.0 2.0 1.0 4.0
3 4.0 1.0 1.0 1.0 2.0 1.0 4.0
I tried:
df3 = pd.concat([df, df2], axis=1, ignore_index=True)
0 1 2 3 4 5 6
0 1.0 10.0 4.0 1.0 2.0 1.0 4.0
1 2.0 3.0 3.0 NaN NaN NaN NaN
2 3.0 22.0 2.0 NaN NaN NaN NaN
3 4.0 1.0 1.0 NaN NaN NaN NaN
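This is the wrong result: only the first row gets the values from df2, and the column names are lost.
How can I solve this?
Many thanks.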
Answer 0 (score: 3)
I think you can use numpy.tile to repeat the data:
df2 = pd.DataFrame(np.tile(df2.values, len(df.index)).reshape(-1,len(df2.columns)),
columns=df2.columns)
print (df2)
a b m u
0 1.0 2.0 1.0 4.0
1 1.0 2.0 1.0 4.0
2 1.0 2.0 1.0 4.0
3 1.0 2.0 1.0 4.0
df3 = df.join(df2)
print (df3)
one three two a b m u
0 1.0 10.0 4.0 1.0 2.0 1.0 4.0
1 2.0 3.0 3.0 1.0 2.0 1.0 4.0
2 3.0 22.0 2.0 1.0 2.0 1.0 4.0
3 4.0 1.0 1.0 1.0 2.0 1.0 4.0
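The same idea as a self-contained sketch. It rebuilds the question's data and, as a small variation that is not part of the answer above, passes a `(rows, 1)` repetition tuple to `np.tile` so that no `reshape` is needed. Reusing `df.index` for the repeated frame is an assumption added here so that `join` aligns even when `df` does not have a default index.

```python
import numpy as np
import pandas as pd

# Data from the question
df = pd.DataFrame({"one": [1.0, 2.0, 3.0, 4.0],
                   "three": [10.0, 3.0, 22.0, 1.0],
                   "two": [4.0, 3.0, 2.0, 1.0]})
df2 = pd.DataFrame({"a": [1.0], "b": [2.0], "m": [1.0], "u": [4.0]})

# Repeat the single row of df2 once per row of df: shape (1, 4) -> (4, 4)
repeated = pd.DataFrame(np.tile(df2.to_numpy(), (len(df), 1)),
                        columns=df2.columns,
                        index=df.index)

# join aligns on the index, which now matches row for row
df3 = df.join(repeated)
print(df3)
```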
Alternatively, an improved version of John Galt's solution replaces NaN only in the columns that come from df2:
df3 = df.join(df2)
df3[df2.columns] = df3[df2.columns].ffill()
print (df3)
one three two a b m u
0 1.0 10.0 4.0 1.0 2.0 1.0 4.0
1 2.0 3.0 3.0 1.0 2.0 1.0 4.0
2 3.0 22.0 2.0 1.0 2.0 1.0 4.0
3 4.0 1.0 1.0 1.0 2.0 1.0 4.0
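A minimal sketch of why restricting `ffill` to the columns of `df2` matters. The data here is hypothetical: `df` is given a missing value of its own, which should survive the fill. It also assumes `df` has a default index starting at 0, so that the single row of `df2` lands on the first row of the joined frame.

```python
import numpy as np
import pandas as pd

# Hypothetical data: df has a genuine NaN that must not be filled
df = pd.DataFrame({"one": [1.0, np.nan, 3.0], "two": [4.0, 3.0, 2.0]})
df2 = pd.DataFrame({"a": [1.0], "b": [2.0]})

# After the join, only row 0 has values in columns a and b
df3 = df.join(df2)

# Forward-fill only the columns that came from df2
df3[df2.columns] = df3[df2.columns].ffill()
print(df3)
```

Calling `ffill()` on the whole of `df3` would also overwrite the NaN in column `one` with the value from the row above.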
Another solution passes the Series created by iloc to assign, but the column names must be strings:
df3 = df.assign(**df2.iloc[0])
print (df3)
one three two a b m u
0 1.0 10.0 4.0 1.0 2.0 1.0 4.0
1 2.0 3.0 3.0 1.0 2.0 1.0 4.0
2 3.0 22.0 2.0 1.0 2.0 1.0 4.0
3 4.0 1.0 1.0 1.0 2.0 1.0 4.0
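The string requirement comes from keyword-argument unpacking: `**` only accepts string keys. A small sketch, using hypothetical data with integer column names, shows one way to satisfy it by prefixing the names with `add_prefix` first. The prefix `"a"` is an arbitrary choice, and `to_dict()` is used here only to make the unpacking explicit.

```python
import pandas as pd

# Hypothetical data: df2 has integer column names 0 and 1
df = pd.DataFrame({"one": [1.0, 2.0]})
df2 = pd.DataFrame([[5.0, 6.0]])

# Turn the integer names into strings "a0" and "a1", then take the only row
row = df2.add_prefix("a").iloc[0].to_dict()

# Each key becomes a new column; the scalar is broadcast to every row of df
df3 = df.assign(**row)
print(df3)
```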
Timings:
np.random.seed(44)
N = 1000000
df = pd.DataFrame(np.random.random((N,5)), columns=list('ABCDE'))
df2 = pd.DataFrame(np.random.random((1, 50)))
df2.columns = 'a' + df2.columns.astype(str)
In [369]: %timeit df.join(pd.DataFrame(np.tile(df2.values, len(df.index)).reshape(-1,len(df2.columns)), columns=df2.columns))
1 loop, best of 3: 897 ms per loop
In [370]: %timeit df.assign(**df2.iloc[0])
1 loop, best of 3: 467 ms per loop
In [371]: %timeit df.assign(key=1).merge(df2.assign(key=1), on='key').drop('key',axis=1)
1 loop, best of 3: 1.55 s per loop
In [372]: %%timeit
...: df3 = df.join(df2)
...: df3[df2.columns] = df3[df2.columns].ffill()
...:
1 loop, best of 3: 1.9 s per loop
Answer 1 (score: 1)
Use merge with a dummy key assigned to both frames.
df.assign(key=1).merge(df2.assign(key=1), on='key').drop('key',axis=1)
Output:
one three two a b m u
0 1.0 10.0 4.0 1.0 2.0 1.0 4.0
1 2.0 3.0 3.0 1.0 2.0 1.0 4.0
2 3.0 22.0 2.0 1.0 2.0 1.0 4.0
3 4.0 1.0 1.0 1.0 2.0 1.0 4.0
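The dummy key builds a Cartesian product of the two frames. Assuming pandas 1.2 or later, `merge` can produce the same product directly with `how="cross"`; this is offered as an alternative sketch, not as part of the answer above.

```python
import pandas as pd

# Data from the question
df = pd.DataFrame({"one": [1.0, 2.0, 3.0, 4.0],
                   "three": [10.0, 3.0, 22.0, 1.0],
                   "two": [4.0, 3.0, 2.0, 1.0]})
df2 = pd.DataFrame({"a": [1.0], "b": [2.0], "m": [1.0], "u": [4.0]})

# Cross join: every row of df is paired with the single row of df2,
# with no temporary key column to add and drop
df3 = df.merge(df2, how="cross")
print(df3)
```

As with the dummy-key version, the result gets a fresh default index rather than keeping the index of `df`.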