Question

I have the following scenario.

import pandas as pd

d = {'col1': [1, 2, 3], 
     'col2': [['apple'], [], ['romaine', 'potatoes']], 
     'col3': [['orange', 'apple'], ['apple'], ['potatoes', 'collard']]
    }
df = pd.DataFrame(data=d)

So the DataFrame is:

   col1                 col2                 col3
0     1              [apple]      [orange, apple]
1     2                   []              [apple]
2     3  [romaine, potatoes]  [potatoes, collard]

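I want to create a new column, col4, that holds the combined list of unique values from col2 and col3 for each row. I would like to do this in a single line.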

I wrote a two-line solution:

df['col4'] = df['col2'] + df['col3']
df.col4 = df.col4.apply(lambda x: list(set(x)))

This gives the expected result:

    col1    col2                col3                col4
0   1       [apple]             [orange, apple]     [orange, apple]
1   2       []                  [apple]             [apple]
2   3       [romaine, potatoes] [potatoes, collard] [collard, romaine, potatoes]

I wonder whether there is a way to write this in one line, for example:

df['col4'] = df.col2.apply(lambda x: list(set(x + df.col3)))

The code above raises the following error:

TypeError: cannot broadcast np.ndarray with operand of type

Answer 1

Try using apply with axis=1 on the whole DataFrame (not just on a single column), so that each call receives one row:

df['col4'] = df.apply(lambda x: list(set(x['col2'] + x['col3'])), axis=1)

Output:

   col1                 col2                 col3                          col4
0     1              [apple]      [orange, apple]               [apple, orange]
1     2                   []              [apple]                       [apple]
2     3  [romaine, potatoes]  [potatoes, collard]  [potatoes, collard, romaine]
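
For context on the error: inside df.col2.apply, x is one row's list, while df.col3 is still the entire column, so x + df.col3 is not a row-by-row concatenation. As a possible alternative to apply with axis=1 (a sketch, not part of the original answer), the two columns can be paired with zip in a list comprehension. Using dict.fromkeys instead of set is my own substitution here: it also drops duplicates, but keeps first-seen order, whereas set order is arbitrary.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': [1, 2, 3],
    'col2': [['apple'], [], ['romaine', 'potatoes']],
    'col3': [['orange', 'apple'], ['apple'], ['potatoes', 'collard']],
})

# zip walks both columns in step, so a and b are the two lists of one row.
# dict.fromkeys removes duplicates while preserving first-seen order.
df['col4'] = [list(dict.fromkeys(a + b)) for a, b in zip(df['col2'], df['col3'])]

print(df)
```

If the order of the items does not matter, list(set(a + b)) works in the same comprehension.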
