Question

This may be a very basic question (if anyone objects, I can delete it).

Suppose I have a function that I reuse many times across various projects:

def sort_clean(x, sort_cols):
    # DataFrame.sort was removed in pandas 0.20; sort_values replaces it
    x.sort_values(sort_cols, inplace=True)
    x.reset_index(inplace=True, drop=True)

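I would like to make this part of the pandas module, so that whenever I import pandas and define a DataFrame myDf, myDf.sort_clean is available as a DataFrame method. Is this possible?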

Answer 1

You can subclass DataFrame:

import pandas

class NewDataFrame(pandas.DataFrame):
    def sort_clean(self, sort_cols):
        # DataFrame.sort was removed in pandas 0.20; sort_values replaces it
        self.sort_values(sort_cols, inplace=True)
        self.reset_index(inplace=True, drop=True)

For example:

In [25]: class NewDataFrame(pandas.DataFrame):
   ....:     def sort_clean(self, sort_cols):
   ....:         self.sort_values(sort_cols, inplace=True)
   ....:         self.reset_index(inplace=True, drop=True)
   ....:

In [26]: dfrm
Out[26]: 
          A         B         C
0  0.382531  0.287066  0.345749
1  0.725201  0.450656  0.336720
2  0.146883  0.266518  0.011339
3  0.111154  0.190367  0.275750
4  0.757144  0.283361  0.736129
5  0.039405  0.643290  0.383777
6  0.632230  0.434664  0.094089
7  0.658512  0.368150  0.433340
8  0.062180  0.523572  0.505400
9  0.287539  0.899436  0.194938

[10 rows x 3 columns]

In [27]: my_df = NewDataFrame(dfrm) 

In [28]: my_df.sort_clean(["B", "C"])

In [29]: my_df
Out[29]: 
          A         B         C
0  0.111154  0.190367  0.275750
1  0.146883  0.266518  0.011339
2  0.757144  0.283361  0.736129
3  0.382531  0.287066  0.345749
4  0.658512  0.368150  0.433340
5  0.632230  0.434664  0.094089
6  0.725201  0.450656  0.336720
7  0.062180  0.523572  0.505400
8  0.039405  0.643290  0.383777
9  0.287539  0.899436  0.194938

[10 rows x 3 columns]

Note, however, that any function that returns a new DataFrame object will not automatically return a NewDataFrame.
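One way to soften that limitation is pandas' subclassing hook: a `_constructor` property that tells pandas which class to use when an operation builds a new frame. The following is a minimal sketch, assuming a reasonably recent pandas release; the sample data and the names `subset` and `sorted_copy` are only illustrative and not part of the original answer.

```python
import pandas as pd


class NewDataFrame(pd.DataFrame):
    @property
    def _constructor(self):
        # pandas consults this hook when an operation creates a new frame,
        # so results keep the subclass instead of reverting to DataFrame.
        return NewDataFrame

    def sort_clean(self, sort_cols):
        self.sort_values(sort_cols, inplace=True)
        self.reset_index(inplace=True, drop=True)


my_df = NewDataFrame({"A": [3, 1, 2], "B": [0.3, 0.1, 0.2]})

# Boolean indexing returns a new object; with _constructor it stays a NewDataFrame.
subset = my_df[my_df["A"] > 1]

# copy() also goes through the hook, so the copy still has sort_clean.
sorted_copy = my_df.copy()
sorted_copy.sort_clean("A")
```

Column selection such as `my_df["A"]` still returns a plain Series here, since this sketch does not override the corresponding Series hook.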

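Ordinary monkey patching (for example, simply creating a new attribute on an existing DataFrame instance, as in df.sort_clean = sort_clean) is tricky, because a method needs the instance supplied as its implicit first argument, and that matters all the more here because you are mutating in place. A plain function stored on an instance does not receive that argument automatically, so you typically have to use functools.partial or a lambda with a default argument to get the same effect: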

df.sort_clean = lambda sort_cols, x=df: sort_clean(x, sort_cols)

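Note that with the lambda approach, the DataFrame has to be passed as an argument with a default value, which means the arguments appear in a different order than in sort_clean (in Python, arguments with default values must follow those without). Using functools.partial avoids this issue: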

import functools
df.sort_clean = functools.partial(sort_clean, df)
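Both variants above bind the function to one particular instance. To make the function available on every DataFrame, as the question asks, two other routes are worth sketching: assigning the function to the DataFrame class itself, or registering a namespaced accessor through `pandas.api.extensions.register_dataframe_accessor`. This is a hedged sketch rather than a recommendation; the accessor name `clean` and the class `CleanAccessor` are hypothetical names chosen for illustration, and the accessor API assumes pandas 0.23 or later.

```python
import pandas as pd


def sort_clean(x, sort_cols):
    x.sort_values(sort_cols, inplace=True)
    x.reset_index(inplace=True, drop=True)


# Option 1: patch the class, not an instance. A function stored on the class
# is bound like any method, so the frame is passed as the first argument.
pd.DataFrame.sort_clean = sort_clean


# Option 2: a namespaced accessor, which avoids adding names directly to
# DataFrame. "clean" is a hypothetical accessor name.
@pd.api.extensions.register_dataframe_accessor("clean")
class CleanAccessor:
    def __init__(self, pandas_obj):
        self._obj = pandas_obj

    def sort(self, sort_cols):
        sort_clean(self._obj, sort_cols)


df = pd.DataFrame({"A": [3, 1, 2], "B": [0.3, 0.1, 0.2]})
df.sort_clean("B")       # available on every DataFrame after the patch

df2 = pd.DataFrame({"A": [3, 1, 2]})
df2.clean.sort("A")      # same behaviour through the accessor
```

Patching the class affects every DataFrame in the process, including those created by third-party code, so the accessor is the less intrusive of the two.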

Attaching a function to pandas
