Question

I have two DataFrames, each with a column that contains lists. I need to take the difference of those columns, as shown below.

DF1:

A   B
111 [12,13,14,14,15,13]
222 [15,16,17,15,17,17,17]
333 [17,14,16,14,14,17,17,16]
444 [25,26,18,12,12,12,13,18]

DF2:

A   B
111 [12,14]
222 []
333 [17,16]
444 [25,18]

Expected output:

A   B
111 [13,15,13]
222 [15,16,17,15,17,17,17]
333 [14,14,14]
444 [26,12,12,12,13]

Answer 1

You can take advantage of pandas' merge and Python's efficient set data structure.

First, merge:

df3 = df1.merge(df2, on='A')

Then convert the df2 items into sets:

df3.B_y = df3.B_y.apply(set)

Now use a list comprehension to keep, for each row, the df1 items that are not in the set:

df3['res'] = df3.apply(lambda r: [e for e in r.B_x if e not in r.B_y], axis=1)
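
Putting the three steps together, here is a minimal end-to-end sketch. It assumes df1 and df2 look like the sample data in the question (the construction below is my own), and it relies on `B_x` and `B_y` being the default suffixes that merge gives the two clashing `B` columns.

```python
import pandas as pd

# Sample frames rebuilt from the question (assumed construction)
df1 = pd.DataFrame({
    'A': [111, 222, 333, 444],
    'B': [[12, 13, 14, 14, 15, 13],
          [15, 16, 17, 15, 17, 17, 17],
          [17, 14, 16, 14, 14, 17, 17, 16],
          [25, 26, 18, 12, 12, 12, 13, 18]],
})
df2 = pd.DataFrame({
    'A': [111, 222, 333, 444],
    'B': [[12, 14], [], [17, 16], [25, 18]],
})

# merge suffixes the clashing columns: B_x comes from df1, B_y from df2
df3 = df1.merge(df2, on='A')
df3['B_y'] = df3['B_y'].apply(set)

# Keep df1's order and duplicates, dropping anything found in df2's set
df3['res'] = df3.apply(
    lambda r: [e for e in r['B_x'] if e not in r['B_y']], axis=1
)
print(df3[['A', 'res']])
```

Bracket access (`r['B_y']`) is used instead of attribute access so that a typo in the column name fails with a clear KeyError.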

Answer 2

You can try this:

 df1["B"]=[list(i for i in df1["B"][j] if i not in df2["B"][j]) for j in range(df1.shape[0])]
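
The one-liner above looks rows up by the labels 0..n-1 and tests membership against a list, so it assumes both frames carry the default integer index and list their rows in the same order. A hedged variant under the same alignment assumption pairs the rows with zip and uses a set for the lookup (the sample frames are rebuilt from the question):

```python
import pandas as pd

# Sample frames rebuilt from the question (assumed construction)
df1 = pd.DataFrame({
    'A': [111, 222, 333, 444],
    'B': [[12, 13, 14, 14, 15, 13],
          [15, 16, 17, 15, 17, 17, 17],
          [17, 14, 16, 14, 14, 17, 17, 16],
          [25, 26, 18, 12, 12, 12, 13, 18]],
})
df2 = pd.DataFrame({
    'A': [111, 222, 333, 444],
    'B': [[12, 14], [], [17, 16], [25, 18]],
})

# Assumes row i of df1 corresponds to row i of df2 (same order, same length)
df1['B'] = [
    [x for x in keep if x not in drop]
    for keep, drop in zip(df1['B'], df2['B'].map(set))
]
print(df1)
```

If the rows are not guaranteed to line up, merging on A first is the safer route.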

Answer 3

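Keep in mind that pandas does not store lists as "actual lists" but as objects. You should always try to build columns from atomic values rather than collections, so that you can take full advantage of pandas. That said, to do the required transformation you only need to convert df2's column to a set and remove all of those items from the corresponding column of df1.

You need to make sure you convert the "object list" to an "actual list/set" before doing the operation.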
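
Why convert only df2's list to a set, and not both? A plain set difference would lose the duplicates and ordering that the expected output keeps. A small pure-Python sketch using row 111 from the question (the helper name `list_minus` is made up for illustration):

```python
def list_minus(values, to_remove):
    """Return values without any element of to_remove, keeping order and duplicates."""
    remove_set = set(to_remove)  # constant-time membership tests
    return [v for v in values if v not in remove_set]

row_df1 = [12, 13, 14, 14, 15, 13]
row_df2 = [12, 14]

print(list_minus(row_df1, row_df2))         # [13, 15, 13]
print(sorted(set(row_df1) - set(row_df2)))  # [13, 15]: the repeated 13 is gone
```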

Here is the code.

The method to apply on df1:

def fun(x):
    # Find the list corresponding to the column A of df1 in df2
    # Use indexing to make this step faster
    remove_set = set(df2[df2['A']==x['A']].iloc[0]['B'])
    actual_list = list(x['B'])
    new_list = []
    for i in actual_list:
        if i not in remove_set:
            new_list.append(i)
    return new_list

Call the method as:

df1['B'] = df1.apply(fun, axis=1)

which produces the output:

     A                             B
0  111                  [13, 15, 13]
1  222  [15, 16, 17, 15, 17, 17, 17]
2  333                  [14, 14, 14]
3  444          [26, 12, 12, 12, 13]

Note: if you can use an index on column A, the performance of this code will improve considerably.
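
One way to act on that note, offered as a sketch rather than a benchmarked recommendation: build the lookup once by indexing df2 on A, instead of filtering df2 inside every call. This assumes the values in A are unique in df2. `remove_sets` and `fun_indexed` are hypothetical names, and the sample frames are rebuilt from the question.

```python
import pandas as pd

# Sample frames rebuilt from the question (assumed construction)
df1 = pd.DataFrame({
    'A': [111, 222, 333, 444],
    'B': [[12, 13, 14, 14, 15, 13],
          [15, 16, 17, 15, 17, 17, 17],
          [17, 14, 16, 14, 14, 17, 17, 16],
          [25, 26, 18, 12, 12, 12, 13, 18]],
})
df2 = pd.DataFrame({
    'A': [111, 222, 333, 444],
    'B': [[12, 14], [], [17, 16], [25, 18]],
})

# One-off lookup table: A -> set of values to remove (assumes unique A in df2)
remove_sets = df2.set_index('A')['B'].map(set)

def fun_indexed(row):
    # A label lookup on the index replaces the boolean filter over df2;
    # rows whose A is missing from df2 are left unchanged
    remove_set = remove_sets.get(row['A'], set())
    return [i for i in row['B'] if i not in remove_set]

df1['B'] = df1.apply(fun_indexed, axis=1)
print(df1)
```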

Answer 4

Just a demonstration of using merge followed by pipe:

def f(t):
    return [i for i in t[0] if not i in t[1]]

df1.merge(df2, on='A').pipe(
    lambda d: d[['A']].assign(B=list(map(f, d.drop(columns='A').values)))
)

     A                             B
0  111                  [13, 15, 13]
1  222  [15, 16, 17, 15, 17, 17, 17]
2  333                  [14, 14, 14]
3  444          [26, 12, 12, 12, 13]

With details:

# Heavy lifting for differencing
def f(t):
    return [i for i in t[0] if not i in t[1]]

# Merge the same as AmiTavory
# But then I use pipe and assign.  Dbl Brackets to keep single column
# dataframe and assign to create a new B column
# then I use the values from the merge after dropping the A column
df1.merge(df2, on='A').pipe(
    lambda d: d[['A']].assign(B=list(map(f, d.drop(columns='A').values)))
)

Answer 5

Compared with all of the above, this took less time for me:

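df3 = df1.merge(df2, on='A')

def set_diff(movie, all_):
    # Keep the items of `movie` that do not appear in `all_`
    if movie is not None:
        return [item for item in movie if item not in all_]
    else:
        # The original fell through here and returned None;
        # returning an empty list is an assumption
        return []

movie_list = []
for item, row in df3.iterrows():
    # After the merge, df1's B is named B_x and df2's B is named B_y
    movie = row['B_x']
    all_  = row['B_y']

    movie_list.append(set_diff(movie, all_))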

How do I take the difference of columns in DataFrames that contain lists?
