Question

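I'm new to designing classes in Python. Suppose we have a small pandas DataFrame df, and I want to write some methods for a class that takes this DataFrame. In most of the methods I want to work with a subset of just 2 columns and 2 rows. Assume that, given the column labels, the row labels can be determined. Every method will use this subset, and I ended up rewriting the subsetting code in each method, which I'm sure is redundant. How can I avoid this?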

import matplotlib.pyplot as plt

class Summary(object):
    def __init__(self, summary_df):
        self.summary = summary_df
        # dict mapping each column label to its row label (created here);
        # stored on self so that the other methods can see it
        self.look_up_table = {}  # placeholder: fill with {column label: row label}

    def performance(self, col1, col2):
        self.col1 = col1
        self.col2 = col2
        self.row1 = self.look_up_table[self.col1]
        self.row2 = self.look_up_table[self.col2]

        self.subset = self.summary.loc[[self.row1, self.row2], [self.col1, self.col2]]
        plt.plot(self.subset.iloc[0], self.subset.iloc[1], '--o')

        plt.xlabel(self.row1)
        plt.ylabel(self.row2)
        plt.show()

    def get_slope(self, col1, col2):
        self.col1 = col1
        self.col2 = col2
        self.row1 = self.look_up_table[self.col1]
        self.row2 = self.look_up_table[self.col2]

        self.subset = self.summary.loc[[self.row1, self.row2], [self.col1, self.col2]]
        slope = ...  # code for calculating slope
        return slope

    def other_methods(self, col1, col2, col3):
        ...

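If possible, I'd like to avoid specifying the column labels when instantiating this class. Also, other_methods may need more than 2 columns anyway, so I think restricting the data to just two columns might not work well. Any ideas or suggestions?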

Answer 1

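The common part of this code takes the column identifiers and turns them into the row labels plus the subset. Since you want to keep these computed values on the object, set them in a single helper method.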

import matplotlib.pyplot as plt

class Summary(object):

    def __init__(self, summary_df):
        self.summary = summary_df
        self.look_up_table = {}  # placeholder: fill with {column label: row label}

    def _update_for_columns(self, col1, col2):
        """Given col1 and col2, update self with new values for
        col1, col2, row1, row2 and subset"""
        self.col1 = col1
        self.col2 = col2
        self.row1 = self.look_up_table[self.col1]
        self.row2 = self.look_up_table[self.col2]
        self.subset = self.summary.loc[[self.row1, self.row2], [self.col1, self.col2]]

    def performance(self, col1, col2):
        self._update_for_columns(col1, col2)
        plt.plot(self.subset.iloc[0], self.subset.iloc[1], '--o')
        plt.xlabel(self.row1)
        plt.ylabel(self.row2)
        plt.show()

    def get_slope(self, col1, col2):
        self._update_for_columns(col1, col2)
        slope = ...  # code for calculating slope
        return slope

    def other_methods(self, col1, col2, col3):
        ...
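The helper idea also stretches to the question's concern that other_methods needs three columns. The sketch below is a variant, not the answer's code: the hypothetical helper _subset takes any number of column labels and returns the subset instead of storing it on self. It assumes look_up_table maps each column label to a row label, uses a made-up toy DataFrame, and assumes "slope" means the slope of the line that performance draws.

```python
import pandas as pd


class Summary(object):
    def __init__(self, summary_df, look_up_table):
        self.summary = summary_df
        # assumed: {column label: row label}
        self.look_up_table = look_up_table

    def _subset(self, *cols):
        """Return the rows paired with `cols` and the columns `cols`, in that order."""
        rows = [self.look_up_table[c] for c in cols]
        return self.summary.loc[rows, list(cols)]

    def get_slope(self, col1, col2):
        sub = self._subset(col1, col2)
        # performance() plots row 0 as x and row 1 as y, so the two points are
        # (sub.iloc[0, 0], sub.iloc[1, 0]) and (sub.iloc[0, 1], sub.iloc[1, 1])
        dx = sub.iloc[0, 1] - sub.iloc[0, 0]
        dy = sub.iloc[1, 1] - sub.iloc[1, 0]
        return dy / dx

    def other_methods(self, col1, col2, col3):
        sub = self._subset(col1, col2, col3)  # same helper, three columns
        return sub.shape  # placeholder body for illustration


# Hypothetical toy data: row labels 'a'-'c', column labels 'x'-'z'
df = pd.DataFrame(
    {"x": [1.0, 2.0, 4.0], "y": [3.0, 6.0, 9.0], "z": [5.0, 10.0, 20.0]},
    index=["a", "b", "c"],
)
lut = {"x": "a", "y": "b", "z": "c"}

s = Summary(df, lut)
print(s.get_slope("x", "y"))            # 2.0
print(s.other_methods("x", "y", "z"))   # (3, 3)
```

Returning the subset rather than storing it means no method can be surprised by columns that another call left behind on the object.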

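Better still, since several methods want to use the same computed values, they shouldn't each be setting those values either; doing so causes needless recomputation. Have the caller update them once before making the calls.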

import matplotlib.pyplot as plt

class Summary(object):

    def __init__(self, summary_df):
        self.summary = summary_df
        self.look_up_table = {}  # placeholder: fill with {column label: row label}

    def update_for_columns(self, col1, col2):
        """Update for new columns before calling performance, et al."""
        self.col1 = col1
        self.col2 = col2
        self.row1 = self.look_up_table[self.col1]
        self.row2 = self.look_up_table[self.col2]
        self.subset = self.summary.loc[[self.row1, self.row2], [self.col1, self.col2]]


    def performance(self):
        plt.plot(self.subset.iloc[0], self.subset.iloc[1], '--o')
        plt.xlabel(self.row1)
        plt.ylabel(self.row2)
        plt.show()

    def get_slope(self):
        slope = ...  # code for calculating slope
        return slope

    def other_methods(self, col1, col2, col3):
        ...
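To make the call order concrete, here is a runnable sketch of the update-then-call pattern with made-up data. Passing the lookup table to __init__ is an assumption (the answer leaves it as a placeholder), performance is omitted to avoid plotting, and get_slope is filled in on the assumption that it means the slope of the line performance draws. One update_for_columns call serves every method call that follows it.

```python
import pandas as pd


class Summary(object):
    def __init__(self, summary_df, look_up_table):
        self.summary = summary_df
        self.look_up_table = look_up_table  # assumed: {column label: row label}

    def update_for_columns(self, col1, col2):
        """Update for new columns before calling get_slope, et al."""
        self.col1, self.col2 = col1, col2
        self.row1 = self.look_up_table[col1]
        self.row2 = self.look_up_table[col2]
        self.subset = self.summary.loc[[self.row1, self.row2], [col1, col2]]

    def get_slope(self):
        # reads the stored subset; nothing is recomputed here
        dx = self.subset.iloc[0, 1] - self.subset.iloc[0, 0]
        dy = self.subset.iloc[1, 1] - self.subset.iloc[1, 0]
        return dy / dx


# Hypothetical toy data
df = pd.DataFrame(
    {"x": [1.0, 2.0, 4.0], "y": [3.0, 6.0, 9.0], "z": [5.0, 10.0, 20.0]},
    index=["a", "b", "c"],
)
s = Summary(df, {"x": "a", "y": "b", "z": "c"})

s.update_for_columns("x", "y")   # compute the subset once...
first = s.get_slope()            # ...then reuse it as often as needed
s.update_for_columns("y", "z")   # switching columns is an explicit step
second = s.get_slope()
print(first, second)             # 2.0 2.75
```

The trade-off is that the object now carries state: calling get_slope before update_for_columns raises AttributeError, because self.subset does not exist yet.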

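Even better, put the methods that work on a column selection in their own class, so you don't risk two parts of your code each believing they are working on the same columns after one of them has switched the shared object to different ones.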

import matplotlib.pyplot as plt

class Summary(object):

    def __init__(self, summary_df):
        self.summary = summary_df
        self.look_up_table = {}  # placeholder: fill with {column label: row label}

    def get_summary_cols(self, col1, col2):
        return SummaryCols(self, col1, col2)

class SummaryCols(object):

    def __init__(self, summary, col1, col2):
        self.summary = summary  # the parent Summary, assuming you need stuff from it...
        self.col1 = col1
        self.col2 = col2
        self.row1 = summary.look_up_table[self.col1]
        self.row2 = summary.look_up_table[self.col2]
        # the DataFrame itself is the parent's .summary attribute
        self.subset = summary.summary.loc[[self.row1, self.row2], [self.col1, self.col2]]

    def performance(self):
        plt.plot(self.subset.iloc[0], self.subset.iloc[1], '--o')
        plt.xlabel(self.row1)
        plt.ylabel(self.row2)
        plt.show()

    def get_slope(self):
        slope = ...  # code for calculating slope
        return slope

    def other_methods(self, col1, col2, col3):
        ...
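A runnable sketch of the separate-class version, with made-up data, shows why it avoids the shared-state risk: each SummaryCols object keeps its own columns, so creating a second one cannot change the first. As in the other sketches, passing look_up_table to Summary and the slope formula (the slope of the line performance would draw) are assumptions, and plotting is left out.

```python
import pandas as pd


class Summary(object):
    def __init__(self, summary_df, look_up_table):
        self.summary = summary_df
        self.look_up_table = look_up_table  # assumed: {column label: row label}

    def get_summary_cols(self, col1, col2):
        return SummaryCols(self, col1, col2)


class SummaryCols(object):
    def __init__(self, summary, col1, col2):
        self.summary = summary  # the parent Summary object
        self.col1, self.col2 = col1, col2
        self.row1 = summary.look_up_table[col1]
        self.row2 = summary.look_up_table[col2]
        self.subset = summary.summary.loc[[self.row1, self.row2], [col1, col2]]

    def get_slope(self):
        dx = self.subset.iloc[0, 1] - self.subset.iloc[0, 0]
        dy = self.subset.iloc[1, 1] - self.subset.iloc[1, 0]
        return dy / dx


# Hypothetical toy data
df = pd.DataFrame(
    {"x": [1.0, 2.0, 4.0], "y": [3.0, 6.0, 9.0], "z": [5.0, 10.0, 20.0]},
    index=["a", "b", "c"],
)
summary = Summary(df, {"x": "a", "y": "b", "z": "c"})

xy = summary.get_summary_cols("x", "y")
yz = summary.get_summary_cols("y", "z")   # does not touch xy
print(xy.get_slope(), yz.get_slope())     # 2.0 2.75
```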

Answer 2

I'm not familiar with the underlying data, so I'm not 100% sure what you're trying to accomplish; feel free to dismiss my answer if it doesn't help.

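You could put the "subsetting" part of your code in __init__, so that the data transformation shared by all the methods happens once, up front, in a single place when you instantiate the class.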

For example:

import matplotlib.pyplot as plt

class Summary(object):
    def __init__(self, summary_df, col1, col2):
        self.summary = summary_df
        self.look_up_table = {}  # placeholder: fill with {column label: row label}

        self.col1 = col1
        self.col2 = col2
        self.row1 = self.look_up_table[self.col1]
        self.row2 = self.look_up_table[self.col2]

        self.subset = self.summary.loc[[self.row1, self.row2], [self.col1, self.col2]]

    def performance(self):
        plt.plot(self.subset.iloc[0], self.subset.iloc[1], '--o')

        plt.xlabel(self.row1)
        plt.ylabel(self.row2)
        plt.show()

    def get_slope(self):
        slope = ...  # code for calculating slope
        return slope

    def other_methods(self, col1, col2, col3):
        # code for other stuff
        ...

Use self to hold references to the objects you want to use throughout the class.
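This matters for the look_up_table in the question: a name assigned as a plain local variable inside __init__ is gone once __init__ returns, so the other methods cannot see it, whereas an attribute on self lives as long as the object. A minimal sketch with a hypothetical one-entry table (the class names are made up for illustration):

```python
class WithLocal(object):
    def __init__(self):
        look_up_table = {"x": "a"}  # local: discarded when __init__ returns

    def row_for(self, col):
        # raises NameError: look_up_table is not defined in this scope
        return look_up_table[col]


class WithSelf(object):
    def __init__(self):
        self.look_up_table = {"x": "a"}  # attribute: stays with the object

    def row_for(self, col):
        return self.look_up_table[col]


print(WithSelf().row_for("x"))  # a
```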

Edit: more examples

Assuming you're using pandas, you can pass the entire DataFrame (or the relevant subset of it) to the class:

class Summary(object):
    def __init__(self, summary_df, data_df):
        self.summary = summary_df
        self.look_up_table = {}  # placeholder: fill with {column label: row label}

        self.data_df = data_df

    # If it'll always be two columns
    def subset_df(self, some_col, another_col):
        # keep the column labels (not the Series) so they can be used as
        # dict keys and .loc labels below
        self.col1 = some_col
        self.col2 = another_col
        # takes a vertical slice of the original df (data_slice is a new name)
        self.data_slice = self.data_df[[some_col, another_col]]

        self.row1 = self.look_up_table[self.col1]
        self.row2 = self.look_up_table[self.col2]

        self.subset = self.summary.loc[[self.row1, self.row2], [self.col1, self.col2]]

Now you can call it with:

do_summary = Summary(my_summary_df, my_data_df)
do_summary.subset_df('column_name1', 'column_name2')
print(do_summary.subset)
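To run that call end to end, the class needs a real lookup table and data. Below is a sketch with made-up DataFrames; passing the table to __init__ and the attribute name data_slice are assumptions, while the two-column subset is taken from summary_df as in the answer.

```python
import pandas as pd


class Summary(object):
    def __init__(self, summary_df, data_df, look_up_table):
        self.summary = summary_df
        self.data_df = data_df
        self.look_up_table = look_up_table  # assumed: {column label: row label}

    def subset_df(self, some_col, another_col):
        self.col1, self.col2 = some_col, another_col             # labels, not Series
        self.data_slice = self.data_df[[some_col, another_col]]  # vertical slice
        self.row1 = self.look_up_table[some_col]
        self.row2 = self.look_up_table[another_col]
        self.subset = self.summary.loc[[self.row1, self.row2], [some_col, another_col]]


# Hypothetical example data
my_summary_df = pd.DataFrame(
    {"column_name1": [1.0, 2.0], "column_name2": [3.0, 6.0]}, index=["r1", "r2"]
)
my_data_df = pd.DataFrame({"column_name1": [10, 20, 30], "column_name2": [40, 50, 60]})

do_summary = Summary(my_summary_df, my_data_df,
                     {"column_name1": "r1", "column_name2": "r2"})
do_summary.subset_df("column_name1", "column_name2")
print(do_summary.subset)
```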

How to avoid duplicate code when creating a class in Python
