Question

I am trying to merge two DataFrames into a new DataFrame in which two columns are combined into a single list. For example, here is df1:


df2:


The resulting DataFrame would be:

   tkt_ticket_opened  tkt_adjtimetorepair  result_data_cohort_id
0      2017-01-09 05             0.075883                      1
1      2017-01-09 06             0.286550                      1
2      2017-01-09 07             0.124234                      1
3      2017-01-09 08             0.144504                      1
4      2017-01-09 09             0.416698                      1
5      2017-01-09 10             0.103199                      1
6      2017-01-09 11             0.063608                      1
7      2017-01-09 12             0.378695                      1
8      2017-01-09 13             0.686515                      1
9      2017-01-09 14             0.671016                      1
10     2017-01-09 15             0.406588                      1
11     2017-01-09 16             0.957627                      1
12     2017-01-09 17             0.504509                      1
13     2017-01-09 18             0.416487                      1
14     2017-01-09 19             0.412306                      1
15     2017-01-09 20             0.929061                      1
16     2017-01-09 21             0.421006                      1
17     2017-01-09 22             0.365754                      1
18     2017-01-09 23             0.557050                      1

Any help with this would be greatly appreciated.

Answer 1

First, merge the datasets:

import pandas as pd
merged = pd.merge(df1, df2, on='tkt_ticket_opened')
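
The merge is where the _x and _y column names used in the next step come from: when both frames share a non-key column name, pandas appends suffixes to tell the two apart. A minimal sketch, using invented two-row frames rather than the asker's data:

```python
import pandas as pd

# Hypothetical stand-ins for df1 and df2 (values invented for illustration).
df1 = pd.DataFrame({
    "tkt_ticket_opened": ["2017-01-09 05", "2017-01-09 06"],
    "tkt_adjtimetorepair": [0.1, 0.2],
})
df2 = pd.DataFrame({
    "tkt_ticket_opened": ["2017-01-09 05", "2017-01-09 06"],
    "tkt_adjtimetorepair": [0.3, 0.4],
})

# Default inner join on the key; the overlapping column gets _x / _y suffixes.
merged = pd.merge(df1, df2, on="tkt_ticket_opened")
print(list(merged.columns))
# ['tkt_ticket_opened', 'tkt_adjtimetorepair_x', 'tkt_adjtimetorepair_y']
```

If different names are preferred, the suffixes parameter of pd.merge accepts a pair of strings to use in place of the defaults.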

Next, we take the array holding the values of the two tkt_adjtimetorepair columns and convert it to a list:

merged['tkt_adjtimetorepair'] = merged[['tkt_adjtimetorepair_x', 'tkt_adjtimetorepair_y']].values.tolist()

# cleanup
merged.drop(['tkt_adjtimetorepair_x', 'tkt_adjtimetorepair_y'], axis=1, inplace=True)

The list of lists returned by tolist() can be assigned directly to a column, as done above.
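
Putting the steps of this answer together, one possible end-to-end sketch is shown here. The helper name combine_as_list and the sample values are assumptions made for illustration; they are not part of the original answer.

```python
import pandas as pd

def combine_as_list(left, right, key, col):
    """Hypothetical helper: merge on `key` and pack both `col` values into one list column."""
    # Keep only the key and the column of interest so no other columns collide.
    merged = pd.merge(left[[key, col]], right[[key, col]], on=key)
    pair_cols = [f"{col}_x", f"{col}_y"]
    # .values gives a 2-D array; .tolist() turns each row into a Python list.
    merged[col] = merged[pair_cols].values.tolist()
    return merged.drop(columns=pair_cols)

# Invented sample data, not the asker's actual frames.
df1 = pd.DataFrame({"tkt_ticket_opened": ["2017-01-09 05", "2017-01-09 06"],
                    "tkt_adjtimetorepair": [0.1, 0.2]})
df2 = pd.DataFrame({"tkt_ticket_opened": ["2017-01-09 05", "2017-01-09 06"],
                    "tkt_adjtimetorepair": [0.3, 0.4]})

result = combine_as_list(df1, df2, "tkt_ticket_opened", "tkt_adjtimetorepair")
print(result)
```

Returning a new frame from drop(columns=...) instead of using inplace=True is only a stylistic choice in this sketch; both leave the same two columns.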

Answer 2

Option 1:

import pandas as pd
df_a = pd.DataFrame([[1, 3], [2, 3], [3, 3]], columns=["tkt_ticket_opened", "tkt_adjtimetorepair"])
df_b = pd.DataFrame([[1, 4], [2, 4], [3, 4]], columns=["tkt_ticket_opened", "tkt_adjtimetorepair"])

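One way to combine the data is to build the Series for the result you want. Using the simplified DataFrames above, you can zip the columns together to produce the desired result: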

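from collections import OrderedDict
# zip pairs the two columns row by row; list(...) materializes the pairs for pd.Series
df_c = pd.DataFrame(OrderedDict(tkt_ticket_opened=df_a["tkt_ticket_opened"], 
                tkt_adjtimetorepair=pd.Series(list(zip(df_a["tkt_adjtimetorepair"], 
                                                       df_b["tkt_adjtimetorepair"]))).map(list)))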

df_c.head()

   tkt_ticket_opened tkt_adjtimetorepair
0                  1              [3, 4]
1                  2              [3, 4]
2                  3              [3, 4]
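
The same zip-based idea can also be written without OrderedDict. This is a sketch of an alternative spelling, not the answer's own code, and it assumes both frames have the same number of rows in the same order:

```python
import pandas as pd

df_a = pd.DataFrame([[1, 3], [2, 3], [3, 3]], columns=["tkt_ticket_opened", "tkt_adjtimetorepair"])
df_b = pd.DataFrame([[1, 4], [2, 4], [3, 4]], columns=["tkt_ticket_opened", "tkt_adjtimetorepair"])

# Start from the key column only, then attach the paired values.
df_c = df_a[["tkt_ticket_opened"]].copy()
pairs = [list(pair) for pair in zip(df_a["tkt_adjtimetorepair"], df_b["tkt_adjtimetorepair"])]
# Reusing df_a's index keeps the new column aligned with the existing rows.
df_c["tkt_adjtimetorepair"] = pd.Series(pairs, index=df_a.index)

print(df_c)
```

Because zip pairs values purely by position, this variant never looks at tkt_ticket_opened when matching rows.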

Option 2:

The same result can be achieved by merging the DataFrames on the desired key and then converting the two columns to a list:

df_c = pd.merge(df_a, df_b, on="tkt_ticket_opened")
df_c["tkt_adjtimetorepair"] = df_c[["tkt_adjtimetorepair_x", "tkt_adjtimetorepair_y"]].values.tolist()
df_c = df_c[["tkt_ticket_opened", "tkt_adjtimetorepair"]]

df_c.head()

   tkt_ticket_opened tkt_adjtimetorepair
0                  1              [3, 4]
1                  2              [3, 4]
2                  3              [3, 4]

I prefer Option 2, because it is more efficient and is also the more idiomatic pandas solution.
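
One practical difference between the two options, beyond style, is how rows are matched. The sketch below uses invented data in which the second frame stores the same keys in a different order, to show that zipping pairs by position while merging pairs by key:

```python
import pandas as pd

df_a = pd.DataFrame({"tkt_ticket_opened": [1, 2, 3],
                     "tkt_adjtimetorepair": [10, 20, 30]})
# Same keys, but stored in a different row order (hypothetical data).
df_b = pd.DataFrame({"tkt_ticket_opened": [3, 1, 2],
                     "tkt_adjtimetorepair": [31, 11, 21]})

# Option 1 style: pairs values by row position, ignoring the key.
zipped = [list(p) for p in zip(df_a["tkt_adjtimetorepair"], df_b["tkt_adjtimetorepair"])]

# Option 2 style: pairs values that share the same key.
merged = pd.merge(df_a, df_b, on="tkt_ticket_opened")
by_key = merged[["tkt_adjtimetorepair_x", "tkt_adjtimetorepair_y"]].values.tolist()

print(zipped)  # [[10, 31], [20, 11], [30, 21]]
print(by_key)  # [[10, 11], [20, 21], [30, 31]]
```

When both frames are already sorted identically, as in the examples above, the two approaches give the same result.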
