Question

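I have a pandas DataFrame in which one column holds lists. I want to combine the elements of each list pairwise and write the pairs to another DataFrame. For example, I have a DataFrame df with columns col_a and col_b, where the values of col_b are lists. I want to loop over the values of df.col_b and output the list of pairs.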

import pandas as pd

df=pd.DataFrame({'col_a':['ast1','ast2','ast3'],'col_b':[['text1','text2','text3'],['mext1','mext2','mext3'],['cext1','cext2']]})
df

    col_a   col_b
0   ast1    [text1, text2, text3]
1   ast2    [mext1, mext2, mext3]
2   ast3    [cext1, cext2]

I want this:

    col_a   col_b_1
0   ast1    [text1, text2]
1   ast1    [text1, text3]
2   ast1    [text2, text3]
3   ast2    [mext1, mext2]
4   ast2    [mext1, mext3]
5   ast2    [mext2, mext3]
6   ast3    [cext1, cext2]

Answer 1

Assuming each row of col_a holds a unique value, you can use combinations from itertools to generate all two-element combinations of each list:

from itertools import combinations
(df.groupby('col_a')['col_b']
   .apply(lambda x: pd.Series(list(combinations(x.iloc[0], 2))))
   .reset_index(level = 0))

#  col_a            col_b
#0  ast1    (text1, text2)
#1  ast1    (text1, text3)
#2  ast1    (text2, text3)
#0  ast2    (mext1, mext2)
#1  ast2    (mext1, mext3)
#2  ast2    (mext2, mext3)
#0  ast3    (cext1, cext2)
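The question asks for lists rather than tuples, a running 0..6 index, and a column named col_b_1. As a rough sketch of one way to get that shape, assuming pandas 0.25 or later (where DataFrame.explode is available), the pairs can be built per row and then exploded. This is an alternative to the groupby approach above, not a restatement of it, and it does not require col_a to be unique:

```python
from itertools import combinations

import pandas as pd

df = pd.DataFrame({
    'col_a': ['ast1', 'ast2', 'ast3'],
    'col_b': [['text1', 'text2', 'text3'],
              ['mext1', 'mext2', 'mext3'],
              ['cext1', 'cext2']],
})

# For each row, build a list of two-element lists (one per combination).
pairs = df['col_b'].apply(lambda items: [list(p) for p in combinations(items, 2)])

# explode() gives every pair its own row and repeats col_a alongside it.
result = (df[['col_a']]
          .assign(col_b_1=pairs)
          .explode('col_b_1')
          .reset_index(drop=True))
```

Note that a row whose list has fewer than two elements produces an empty list of pairs, which explode() turns into a single NaN row; drop those with dropna() if that is not wanted.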

Answer 2

You can use itertools to flatten the lists:

import itertools
# apply() yields one column per pair; stack() flattens them into one Series
series = (df["col_b"]
    .apply(lambda x: pd.Series(list(itertools.combinations(x, 2))))
    .stack())

The series must have a name so that it can be merged with the "mother" DataFrame:

series.name = "col_b_1"

Now merge the two data objects and select the columns you need:

result = df.merge(pd.DataFrame(series).reset_index(),
    left_index=True,
    right_on="level_0")[["col_a","col_b_1"]]

The result is a column of tuples; if that is a problem, use .apply() to apply list() to it.

#   col_a         col_b_1
# 0  ast1  (text1, text2)
# 1  ast1  (text1, text3)
# 2  ast1  (text2, text3)
# 3  ast2  (mext1, mext2)
# 4  ast2  (mext1, mext3)
# 5  ast2  (mext2, mext3)
# 6  ast3  (cext1, cext2)
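The tuple-to-list conversion mentioned above might look like the following. This is a small sketch in which result is a hand-built stand-in for the merged frame (only three of its rows are reproduced here), so it can be run on its own:

```python
import pandas as pd

# Stand-in for the merged result: a column holding tuples.
result = pd.DataFrame({
    'col_a': ['ast1', 'ast1', 'ast3'],
    'col_b_1': [('text1', 'text2'), ('text1', 'text3'), ('cext1', 'cext2')],
})

# Convert every tuple in the column to a list.
result['col_b_1'] = result['col_b_1'].apply(list)
```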
