Question

I can't figure out how to apply a simple function to every row of a column in a pandas DataFrame.

Example:


I expected the code below to return 'test' for each row. Instead, it prints the original values.

How do I apply the delLastThree function to every row in the DF?

Answer 1

When you select with single brackets, as in df['colOne'], you are creating a pd.Series.

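Use .apply(func, axis=1) on a DataFrame, that is, either select with [['colOne']] or do not select any columns at all. However, with .apply(axis=1) each row is passed to the function as a pd.Series, so the function has to use the .str methods for strings.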

Once you have selected a pd.Series with ['colOne'], you can use .apply() or .map().
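To make the difference concrete, here is a minimal sketch that records what kind of object each call hands to the function. The helper type_name is hypothetical and exists only for this illustration; the sample data mirrors the arr list used in this answer.

```python
import pandas as pd

arrDF = pd.DataFrame({'colOne': ['test123', 'test234', 'test453']})

def type_name(x):
    # Hypothetical helper: report the type of whatever apply() passes in
    return type(x).__name__

# DataFrame.apply with axis=1 calls the function once per row,
# and each row arrives as a pd.Series
row_types = arrDF[['colOne']].apply(type_name, axis=1)

# Series.apply calls the function once per element,
# and each element arrives as a plain Python str
elem_types = arrDF['colOne'].apply(type_name)

print(row_types.tolist())   # ['Series', 'Series', 'Series']
print(elem_types.tolist())  # ['str', 'str', 'str']
```

This is why a function written with x.strip() suits the Series case, while the row-wise DataFrame case needs x.str.strip().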

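import pandas as pd

# For Series.apply() / Series.map(): x is a single string
def delLastThree_series(x):
    x = x.strip()
    x = x[:-3]
    return x

# For DataFrame.apply(axis=1): x is a row (a pd.Series), so use the .str accessor
def delLastThree_df(x):
    x = x.str.strip()
    x = x.str[:-3]
    return x

arr = ['test123','test234','test453']
arrDF = pd.DataFrame(arr)

arrDF.columns = ['colOne']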

Now use

arrDF.apply(delLastThree_df, axis=1)
arrDF[['colOne']].apply(delLastThree_df, axis=1)

or

arrDF['colOne'].apply(delLastThree_series)
arrDF['colOne'].map(delLastThree_series)

to get:

  colOne
0   test
1   test
2   test

You can, of course, also do:

arrDF['colOne'].str.strip().str[:-3]
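As a quick sanity check, the sketch below runs the three approaches from this answer side by side and compares their results. The sample data is an assumption: one value has surrounding whitespace added so that strip() actually has something to do.

```python
import pandas as pd

# Assumed sample data; the second value is padded with spaces on purpose
arrDF = pd.DataFrame({'colOne': ['test123', ' test234 ', 'test453']})

def delLastThree_series(x):
    # x is a single string
    return x.strip()[:-3]

def delLastThree_df(x):
    # x is a row (a pd.Series), so go through the .str accessor
    return x.str.strip().str[:-3]

via_df_apply = arrDF.apply(delLastThree_df, axis=1)['colOne']
via_series_apply = arrDF['colOne'].apply(delLastThree_series)
via_str = arrDF['colOne'].str.strip().str[:-3]

print(via_str.tolist())  # ['test', 'test', 'test']
print(via_df_apply.tolist() == via_str.tolist())      # True
print(via_series_apply.tolist() == via_str.tolist())  # True
```

All three produce the same values; the .str version simply avoids calling a Python function once per row.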

Answer 2

Use the map() function on a Series (a single column):

In [15]: arrDF['colOne'].map(delLastThree)
Out[15]:
0    test
1    test
2    test
Name: colOne, dtype: object

Or, if you want to change the column itself:

In [16]: arrDF['colOne'] = arrDF['colOne'].map(delLastThree)

In [17]: arrDF
Out[17]:
  colOne
0   test
1   test
2   test

But as @Stefan said, this will be faster, more efficient, and more "Pandonic":

arrDF['colOne'] = arrDF['colOne'].str.strip().str[:-3]

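Or, if you want to remove all trailing whitespace and digits (pass regex=True explicitly, since newer pandas versions no longer treat the pattern as a regular expression by default):

arrDF['colOne'] = arrDF['colOne'].str.replace(r'[\s\d]+$', '', regex=True)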

Test:

In [21]: arrDF['colOne'].str.replace(r'[\s\d]+$', '', regex=True)
Out[21]:
0    test
1    test
2    test
Name: colOne, dtype: object
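The slice and the regex only agree when every value ends in exactly three characters to drop. A small sketch of the difference, using made-up sample values with varying numbers of trailing digits (an assumption, not data from the question):

```python
import pandas as pd

# Assumed sample values with one, three, and five trailing digits
s = pd.Series(['test123', 'test4 ', 'test56789'])

# Fixed slice: strip whitespace, then always drop exactly three characters
sliced = s.str.strip().str[:-3]

# Regex: drop whatever run of whitespace and digits ends the string
stripped = s.str.replace(r'[\s\d]+$', '', regex=True)

print(sliced.tolist())    # ['test', 'te', 'test56']
print(stripped.tolist())  # ['test', 'test', 'test']
```

Which one is right depends on whether the rule is "remove the last three characters" or "remove the trailing number".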

Pandas apply syntax
