Question

Suppose I have the following DataFrame:

| domain          | category | confidence |
| www.test.com.   |          |            |
| www.someurl.com |          |            |

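I want to apply my_func to the domain column. The function returns a tuple of two values, and I want to use those values to fill category and confidence for each row. Something like df['category', 'confidence'] = df['domain'].apply(my_func)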

The result I expect is:

| domain          | category      | confidence |
| www.test.com.   | test-category | 0.5        |
| www.someurl.com | some-category | 0.7        |

Answer 1

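If you are on a current version of pandas, you can do this with result_type='expand', which is a parameter of DataFrame.apply and only takes effect with axis=1. From the pandas apply documentation: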

>>> df.apply(lambda x: [1, 2], axis=1, result_type='expand')
   0  1
0  1  2
1  1  2
2  1  2

@Andrej Kesely's solution is also described there:

Returning a Series inside the function is similar to passing result_type='expand'. The resulting column names will be the Series index.

>>> df.apply(lambda x: pd.Series([1, 2], index=['foo', 'bar']), axis=1)
   foo  bar
0    1    2
1    1    2
2    1    2
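
Applied to the DataFrame from the question, this might look like the sketch below. The body of my_func is a hypothetical stand-in (the question does not show it), and the LOOKUP table is invented purely so the example runs. Because result_type belongs to DataFrame.apply, the call goes through the whole frame with axis=1 and picks the domain value out of each row.

```python
import pandas as pd

df = pd.DataFrame({"domain": ["www.test.com.", "www.someurl.com"]})

# Hypothetical data standing in for whatever my_func really computes
LOOKUP = {
    "www.test.com.": ("test-category", 0.5),
    "www.someurl.com": ("some-category", 0.7),
}


def my_func(domain):
    # Assumed behavior: return a (category, confidence) tuple
    return LOOKUP[domain]


# axis=1 passes one row at a time; result_type="expand" spreads each
# returned tuple across columns 0 and 1 of a new DataFrame
expanded = df.apply(lambda row: my_func(row["domain"]), axis=1, result_type="expand")

# The two expanded columns are assigned to the target columns by position
df[["category", "confidence"]] = expanded
print(df)
```

With this approach my_func can keep returning a plain tuple, so it does not need to be changed to build a Series.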

Answer 2

You can return a pd.Series. For example:

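import pandas as pd

# Sample frame from the question
df = pd.DataFrame({"domain": ["www.test.com.", "www.someurl.com"]})

cnt = 0


def my_func(x):
    # Dummy implementation: returns two values per domain as a Series
    global cnt
    cnt += 10
    return pd.Series(["something {}".format(x), cnt])


# When the function returns a Series, Series.apply produces a DataFrame,
# so both target columns can be assigned in one step
df[["category", "confidence"]] = df["domain"].apply(my_func)
print(df)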

Prints:

            domain                   category  confidence
0    www.test.com.    something www.test.com.          10
1  www.someurl.com  something www.someurl.com          20
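
If my_func returns a plain tuple, as in the question, another option that neither answer mentions is to unzip the results into two columns. This avoids building a Series for every row, which can be slow on large frames. The sketch below assumes a hypothetical my_func backed by an invented LOOKUP table.

```python
import pandas as pd

df = pd.DataFrame({"domain": ["www.test.com.", "www.someurl.com"]})

# Hypothetical data standing in for whatever my_func really computes
LOOKUP = {
    "www.test.com.": ("test-category", 0.5),
    "www.someurl.com": ("some-category", 0.7),
}


def my_func(domain):
    # Assumed behavior: return a (category, confidence) tuple
    return LOOKUP[domain]


# map() yields one tuple per row; zip(*...) transposes those tuples
# into one sequence of categories and one sequence of confidences
df["category"], df["confidence"] = zip(*df["domain"].map(my_func))
print(df)
```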

Pandas: apply a function to a column and fill 2 columns from the function's return value
