How to horizontally concatenate pandas DataFrames in Python

Asked: 2017-09-29 16:35:39

Tags: python python-3.x pandas dataframe pretty-print

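I have tried several different ways to horizontally concatenate DataFrame objects from the Python Data Analysis Library (pandas), but all of my attempts so far have failed.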

Desired output for the given input:

I have two DataFrames:
df_1:

      col2    col3
col1                
str1     1  1.5728
str2     2  2.4627
str3     3  3.6143

df_2:

      col2    col3
col1              
str1     4  4.5345
str2     5  5.1230
str3     6  6.1233

I would like the final DataFrame to show df_1 and df_2 side by side:

      col2    col3    col1  col2    col3
col1                                  
str1     1  1.5728    str1     4  4.5345
str2     2  2.4627    str2     5  5.1230
str3     3  3.6143    str3     6  6.1233

Creating the test input:

Here is some code that creates the DataFrames:

import pandas as pd

column_headers = ["col1", "col2", "col3"]
d_1 = dict.fromkeys(column_headers)
d_1["col1"] = ["str1", "str2", "str3"]
d_1["col2"] = [1, 2, 3]
d_1["col3"] = [1.5728, 2.4627, 3.6143]
df_1 = pd.DataFrame(d_1)
df_1 = df_1.set_index("col1")
print("df_1:")
print(df_1)
print()


d_2 = dict.fromkeys(column_headers)
d_2["col1"] = ["str1", "str2", "str3"]
d_2["col2"] = [4, 5, 6]
d_2["col3"] = [4.5345, 5.123, 6.1233]
df_2 = pd.DataFrame(d_2)
df_2 = df_2.set_index("col1")
print("df_2:")
print(df_2)
print()

Failed attempts:

Failed attempt 1:

An outer join fails to horizontally concatenate df_1 and df_2:

merged_df = df_1.join(df_2, how='outer')

We get the following error message:

ValueError: columns overlap but no suffix specified: Index(['col2', 'col3'], dtype='object')
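The message points at the cause: both frames have columns named col2 and col3, and join refuses to guess how to rename them. A minimal sketch of the same join with explicit suffixes follows; the suffix strings "_1" and "_2" are arbitrary choices for illustration, not something from the question.

```python
import pandas as pd

# Same test frames as in the question, built in one step.
df_1 = pd.DataFrame(
    {"col1": ["str1", "str2", "str3"], "col2": [1, 2, 3], "col3": [1.5728, 2.4627, 3.6143]}
).set_index("col1")
df_2 = pd.DataFrame(
    {"col1": ["str1", "str2", "str3"], "col2": [4, 5, 6], "col3": [4.5345, 5.123, 6.1233]}
).set_index("col1")

# lsuffix/rsuffix rename the overlapping columns so the join can proceed.
# "_1" and "_2" are illustrative suffixes only.
merged_df = df_1.join(df_2, how="outer", lsuffix="_1", rsuffix="_2")
print(merged_df)
```

This gives a side-by-side frame, but with renamed columns rather than the repeated col2 and col3 headers shown in the desired output.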

Failed attempt 2:

Making a dictionary of dictionaries does not work:

# Make a dictionary of dictionaries
merged_d = dict()
merged_d[1] = d_1
merged_d[2] = d_2
merged_df = pd.DataFrame(merged_d)
print(merged_df)

The resulting DataFrame looks like this:

                             1                        2
col1        [str1, str2, str3]       [str1, str2, str3]
col2                 [1, 2, 3]                [4, 5, 6]
col3  [1.5728, 2.4627, 3.6143]  [4.5345, 5.123, 6.1233]
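The output above follows from how the constructor reads a dict of dicts: outer keys become columns, inner keys become row labels, and each inner value (here a whole list) is stored as a single cell. The sketch below checks that, then shows one possible workaround that flattens both dicts into one with unique keys. The key pattern "name_number" is a hypothetical naming scheme chosen for this example.

```python
import pandas as pd

d_1 = {"col1": ["str1", "str2", "str3"], "col2": [1, 2, 3], "col3": [1.5728, 2.4627, 3.6143]}
d_2 = {"col1": ["str1", "str2", "str3"], "col2": [4, 5, 6], "col3": [4.5345, 5.123, 6.1233]}
merged_d = {1: d_1, 2: d_2}

# Outer keys (1, 2) become columns; inner keys become the row index.
nested = pd.DataFrame(merged_d)
cell = nested.loc["col2", 1]
print(type(cell), cell)  # a single cell holds an entire list

# Possible workaround: one flat dict whose keys are unique column names.
# The "<name>_<n>" pattern is an assumption made for this sketch.
flat = {
    f"{name}_{n}": values
    for n, d in merged_d.items()
    for name, values in d.items()
}
flat_df = pd.DataFrame(flat)
print(flat_df)
```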

Failed attempt 3:

Subattempt 3a:

Making a dictionary of DataFrames does not seem to work either:

merged_d = dict()
merged_d[1] = df_1
merged_d[2] = df_2
merged_df = pd.DataFrame(merged_d)
print(merged_df)

We get the following error message:

ValueError: If using all scalar values, you must pass an index

Subattempt 3b:

Passing an index to the DataFrame constructor does not help much:

merged_df = pd.DataFrame(data = merged_d, index = [1, 2])

We get the error:

ValueError: cannot copy sequence with size 2 to array axis with dimension 3
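The DataFrame constructor is not designed to take a dict of DataFrames, but pd.concat does accept a mapping and uses its keys as an outer column level. A minimal sketch, reusing the merged_d layout from subattempt 3a:

```python
import pandas as pd

df_1 = pd.DataFrame(
    {"col1": ["str1", "str2", "str3"], "col2": [1, 2, 3], "col3": [1.5728, 2.4627, 3.6143]}
).set_index("col1")
df_2 = pd.DataFrame(
    {"col1": ["str1", "str2", "str3"], "col2": [4, 5, 6], "col3": [4.5345, 5.123, 6.1233]}
).set_index("col1")

merged_d = {1: df_1, 2: df_2}

# With a mapping, the dict keys (1 and 2) become the top level of a
# MultiIndex on the columns, so the duplicate names stay distinguishable.
merged_df = pd.concat(merged_d, axis=1)
print(merged_df)

# Selecting a top-level key gives back that frame's columns.
print(merged_df[2])
```

The extra column level is a trade-off: the printed header has two rows, but each original frame can still be pulled out by its key.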

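1 Answer:

Answer 0 (score: 6)

Use pd.concat with axis=1 instead of a merge or join. It aligns rows on the index and keeps duplicate column names as they are: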

ndf = pd.concat([df_1, df_2], axis=1)

      col2    col3  col2    col3
col1                            
str1     1  1.5728     4  4.5345
str2     2  2.4627     5  5.1230
str3     3  3.6143     6  6.1233
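The desired output in the question also repeats col1 as an ordinary column in front of the second frame's data. One way to get that layout, sketched here under the assumption that copying the index labels into a regular column is acceptable, is to insert the column before concatenating. The names right and side_by_side are illustrative.

```python
import pandas as pd

df_1 = pd.DataFrame(
    {"col1": ["str1", "str2", "str3"], "col2": [1, 2, 3], "col3": [1.5728, 2.4627, 3.6143]}
).set_index("col1")
df_2 = pd.DataFrame(
    {"col1": ["str1", "str2", "str3"], "col2": [4, 5, 6], "col3": [4.5345, 5.123, 6.1233]}
).set_index("col1")

# Work on a copy so df_2 itself is left unchanged.
right = df_2.copy()
# Repeat the index labels as a regular column at position 0.
right.insert(0, "col1", right.index)

# axis=1 places the frames next to each other, aligned on the shared index.
side_by_side = pd.concat([df_1, right], axis=1)
print(side_by_side)
```

Because the column names are duplicated, side_by_side["col2"] returns both col2 columns as a DataFrame; positional access with iloc is the simplest way to pick just one of them.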