Question

Let's say this is my function:

def function(x):
    return x.str.lower()

And this is my DataFrame (df):

   A         B     C       D 
0  1.67430   BAR  0.34380  FOO 
1  2.16323   FOO -2.04643  BAR
2  0.19911   BAR -0.45805  FOO
3  0.91864   BAR -0.00718  BAR
4  1.33683   FOO  0.53429  FOO
5  0.97684   BAR -0.77363  BAR
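To follow along, the sample frame can be rebuilt from the table above. This is a minimal sketch that assumes pandas is installed and that B and D hold plain Python strings:

```python
import pandas as pd


def function(x):
    # Lower-case every string in a Series using the vectorized .str accessor
    return x.str.lower()


# Values copied from the table in the question
df = pd.DataFrame(
    {
        "A": [1.67430, 2.16323, 0.19911, 0.91864, 1.33683, 0.97684],
        "B": ["BAR", "FOO", "BAR", "BAR", "FOO", "BAR"],
        "C": [0.34380, -2.04643, -0.45805, -0.00718, 0.53429, -0.77363],
        "D": ["FOO", "BAR", "FOO", "BAR", "FOO", "BAR"],
    }
)

print(df.dtypes)  # A and C are float64; B and D hold strings
```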

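I want to apply the function to columns B and D. (Applying it to the full DataFrame isn't an answer, because it produces NaN values in the numeric columns.)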

My basic idea was: df.apply(function, axis=1)
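The NaN values come from the row-wise call itself. With axis=1 each row is passed as a single mixed Series (floats and strings together), and .str.lower() returns NaN for the entries that are not strings. A small sketch reproducing this on the first three rows of the sample data:

```python
import pandas as pd


def function(x):
    return x.str.lower()


df = pd.DataFrame(
    {
        "A": [1.67430, 2.16323, 0.19911],
        "B": ["BAR", "FOO", "BAR"],
        "C": [0.34380, -2.04643, -0.45805],
        "D": ["FOO", "BAR", "FOO"],
    }
)

# Each row arrives as an object-dtype Series holding both floats and strings
row_wise = df.apply(function, axis=1)
print(row_wise)  # A and C come back as NaN, B and D are lower-cased
```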

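But I can't figure out how to select the specific columns the function should be applied to. I've tried all kinds of indexing: by numeric position, by name, and so on.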

I've spent a lot of time reading up on this. It's not a direct duplicate of any of these:

How to apply a function to two columns of Pandas dataframe

Pandas: How to use apply function to multiple columns

Pandas: apply different functions to different columns

Python Pandas: Using 'apply' to apply 1 function to multiple columns

Answer 1

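Just sub-select the columns from df and drop the axis argument, so that the function runs column-wise rather than row-wise. That will be significantly faster here, since you have many more rows than columns: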

df[['B','D']].apply(function)

This runs your function once per column:

In [186]:
df[['B','D']].apply(function)

Out[186]:
     B    D
0  bar  foo
1  foo  bar
2  bar  foo
3  bar  bar
4  foo  foo
5  bar  bar
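Since the question mentions trying positional indexing: selecting the same two columns by position with iloc gives an identical result. A sketch on the first two rows of the sample data, where positions 1 and 3 correspond to B and D:

```python
import pandas as pd


def function(x):
    return x.str.lower()


df = pd.DataFrame(
    {
        "A": [1.67430, 2.16323],
        "B": ["BAR", "FOO"],
        "C": [0.34380, -2.04643],
        "D": ["FOO", "BAR"],
    }
)

by_name = df[["B", "D"]].apply(function)          # select by label
by_position = df.iloc[:, [1, 3]].apply(function)  # select by integer position
print(by_name.equals(by_position))
```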

You can also filter the df down to its string (object dtype) columns:

In [189]:
df.select_dtypes(include=['object']).apply(function)

Out[189]:
     B    D
0  bar  foo
1  foo  bar
2  bar  foo
3  bar  bar
4  foo  foo
5  bar  bar

Timings

Column-wise vs. row-wise:

In [194]:    
%timeit df.select_dtypes(include=['object']).apply(function, axis=1)
%timeit df.select_dtypes(include=['object']).apply(function)

100 loops, best of 3: 3.42 ms per loop
100 loops, best of 3: 2.37 ms per loop

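However, for DataFrames with significantly more rows, the column-wise method will scale better, because the function is called once per column rather than once per row.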
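To check the scaling claim on your own machine, one option is to tile a few sample rows into a longer frame and time both variants with the standard timeit module. This is only a sketch: the tiling factor and repeat count are arbitrary choices, and absolute timings will vary by machine and pandas version.

```python
import timeit

import pandas as pd


def function(x):
    return x.str.lower()


small = pd.DataFrame(
    {
        "A": [1.67430, 2.16323, 0.19911],
        "B": ["BAR", "FOO", "BAR"],
        "C": [0.34380, -2.04643, -0.45805],
        "D": ["FOO", "BAR", "FOO"],
    }
)
# Repeat the rows to get a longer frame (1,500 rows)
big = pd.concat([small] * 500, ignore_index=True)
subset = big[["B", "D"]]

row_wise = subset.apply(function, axis=1)  # function called once per row
col_wise = subset.apply(function)          # function called once per column

t_row = timeit.timeit(lambda: subset.apply(function, axis=1), number=3)
t_col = timeit.timeit(lambda: subset.apply(function), number=3)
print(f"row-wise: {t_row:.3f}s  column-wise: {t_col:.3f}s")
```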

Answer 2

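apply doesn't work in place; it returns a new DataFrame, so the question is whether you can return the complete DataFrame in one go.
You can, but it's ugly (though it might be slightly faster):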

df.apply(lambda x: x.str.lower() if x.name in ['B', 'D'] else x)
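A quick check of this one-liner on the first two rows of the question's data. Because apply returns a new frame, df itself stays unchanged unless you assign the result back:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": [1.67430, 2.16323],
        "B": ["BAR", "FOO"],
        "C": [0.34380, -2.04643],
        "D": ["FOO", "BAR"],
    }
)

# Lower-case B and D, pass every other column through unchanged
result = df.apply(lambda x: x.str.lower() if x.name in ["B", "D"] else x)
print(result)
```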

If you want to do this for all string columns, just check the dtype instead of the column name.
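One way to do that dtype check is pandas.api.types.is_string_dtype, which returns False for the float columns here. A sketch, assuming pandas 2.x semantics, where an object column counts as a string column only if its values are actually strings (older versions treat any object column as a string column, which gives the same result for this data):

```python
import pandas as pd
from pandas.api.types import is_string_dtype

df = pd.DataFrame(
    {
        "A": [1.67430, 2.16323],
        "B": ["BAR", "FOO"],
        "C": [0.34380, -2.04643],
        "D": ["FOO", "BAR"],
    }
)

# Lower-case every string-like column; leave the numeric columns alone
result = df.apply(lambda x: x.str.lower() if is_string_dtype(x) else x)
print(result)
```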

Answer 3

Cleaner syntax for editing the original columns in place:

df[["B", "D"]] = df[["B", "D"]].apply(lambda x: x.str.lower())
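Applied to the question's string columns B and D, this assignment overwrites those two columns and leaves the numeric columns A and C as they were. A sketch on the first two rows:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": [1.67430, 2.16323],
        "B": ["BAR", "FOO"],
        "C": [0.34380, -2.04643],
        "D": ["FOO", "BAR"],
    }
)

# Reassign the lower-cased columns back onto the same frame
df[["B", "D"]] = df[["B", "D"]].apply(lambda x: x.str.lower())
print(df)
```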

And to add the results as new columns to the original DataFrame instead:

df[["new_col1", "new_col2"]] = df[["B", "D"]].apply(lambda x: x.str.lower())
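If you would rather not mutate df at all, a similar result can be built as a new frame with join and add_suffix. The "_lower" suffix is an arbitrary, illustrative naming choice:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": [1.67430, 2.16323],
        "B": ["BAR", "FOO"],
        "C": [0.34380, -2.04643],
        "D": ["FOO", "BAR"],
    }
)

# Lower-case B and D, rename the results to B_lower / D_lower, then attach them
lowered = df[["B", "D"]].apply(lambda x: x.str.lower()).add_suffix("_lower")
out = df.join(lowered)  # new frame; df keeps its original four columns
print(out)
```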

Answer 4

Column-wise apply, editing in place:

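A minimal sketch of a column-wise, in-place edit on the question's df (not necessarily the code this answer originally showed): call the function on each selected column and write the result back under the same name.

```python
import pandas as pd


def function(x):
    return x.str.lower()


df = pd.DataFrame(
    {
        "A": [1.67430, 2.16323],
        "B": ["BAR", "FOO"],
        "C": [0.34380, -2.04643],
        "D": ["FOO", "BAR"],
    }
)

for col in ["B", "D"]:
    df[col] = function(df[col])  # one vectorized call per column
print(df)
```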

Row-wise apply, editing in place:

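The row-wise counterpart, again only a sketch on the question's df: passing axis=1 calls the function once per row of the selected columns, and the resulting frame is assigned back to those same columns.

```python
import pandas as pd


def function(x):
    return x.str.lower()


df = pd.DataFrame(
    {
        "A": [1.67430, 2.16323],
        "B": ["BAR", "FOO"],
        "C": [0.34380, -2.04643],
        "D": ["FOO", "BAR"],
    }
)

# Each row of df[["B", "D"]] contains only strings, so no NaN appears here
df[["B", "D"]] = df[["B", "D"]].apply(function, axis=1)
print(df)
```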

Other useful operations using column-wise and row-wise apply:

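Two further illustrative patterns, neither taken from the original answer: a row-wise apply that combines two columns of the same row, and a column-wise apply that reduces each numeric column to a single value (its range). A sketch on the first two rows of the question's data:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": [1.67430, 2.16323],
        "B": ["BAR", "FOO"],
        "C": [0.34380, -2.04643],
        "D": ["FOO", "BAR"],
    }
)

# Row-wise: build one value per row from columns B and D
combined = df.apply(lambda row: row["B"] + "_" + row["D"], axis=1)

# Column-wise: reduce each numeric column to max minus min
ranges = df[["A", "C"]].apply(lambda col: col.max() - col.min())

print(combined)
print(ranges)
```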

Pandas: How to apply a function to different columns
