Question

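I have finally created a class to analyze data in a more streamlined way. It takes a CSV file and outputs some information about the table and its columns.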

class Analyses:
    def Types_des_colonnes(self, df):
        tcol = df.columns.to_series().groupby(df.dtypes).groups
        tycol = {k.name: v for k, v in tcol.items()}
        return(self.tycol)

    def Analyse_table(self, table):
        # Returns a dict 'tycol' with the types as keys and the column names as values:
        Types_des_colonnes(table)
        nbr_types_colonnes_diff=len(tycol.keys())


        type_table = table.dtypes
        liste_columns = table.columns
        clef_types= tycol.keys()
        long_table = len(table)
        nbr_cols = len(liste_columns)

        print(table.describe())

        print('Nombre de colonnes: '+ str(nbr_cols))
        print('Nombre de types de colonnes différentes: '+str(nbr_types_colonnes_diff))
        for kk in range(0,nbr_types_colonnes_diff):
            print('Type: ' + tycol.keys()[kk])
            print(tycol.values())
        return(liste_columns)

    def Analyse_colonne(self, col):
        from numpy import where, nan
        from pandas import isnull,core,DataFrame
        # If col is a DataFrame:
        if type(col) == core.frame.DataFrame:
            dict_col = {}
            for co in col.columns:
                dict_col_Loc = Analyse_colonne(col[co]);
                dict_col[co] = dict_col_Loc.values()
            return(dict_col)
        elif type(col) == core.series.Series:    
            type_col = type(col)
            arr_null = where(isnull(col))[0]
            type_data = col.dtype
            col_uniq = col.unique()

            nbr_unique= len(col_uniq)
            taille_col= len(col)
            nbr_ligne_vide= len(arr_null)

            top_entree= col.head()
            bottom_entree= col.tail()
            pct_uniq= (float(nbr_unique)/float(taille_col))*100.0
            pct_ligne_vide= (float(nbr_ligne_vide)/float(taille_col))*100.0
            print('\n')
            print('       #################      '+col.name+'      #################')
            print('Type des données: ' + str(type_data))
            print('Taille de la colonne: ' + str(taille_col))
            if nbr_unique == 1:
                print('Aucune entrée unique')
            else:
                print('Nombre d\'uniques: '+ str(nbr_unique))
                print('Pourcentage d\'uniques: '+str(pct_uniq)+' %')
            if nbr_ligne_vide == 0:
                print('Aucune ligne vide')
            else:
                print('Nombre de lignes vides: '+ str(nbr_ligne_vide))
                print('Pourcentage de lignes vides: '+str(pct_ligne_vide)+' %')

            dict_col = {}
            dict_col[col.name] = arr_null
            return(dict_col)
        else:
            print('Problem')

def main():
    anly = Analyses()
    anly.Analyse_table(df_AIS)

if __name__ == '__main__':
    main()

When I run this script, I get:

NameError: name 'tycol' is not defined

which refers to the second line of:

def Analyse_table():
        # Returns a dict 'tycol' with the types as keys and the column names as values:
        Types_des_colonnes(table)
        nbr_types_colonnes_diff=len(tycol.keys())

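I know this has to do with using 'self' correctly, but I really don't understand how to do that properly. Can anyone tell me how to fix this very simple problem?

(Every 'self' in this script was added by me, just to try to get it to run.)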

Answer 1

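Members of a Python object are distinguished from other variables by appearing on the right-hand side of a . (as in obj.member).

The first parameter of a method is bound to the object the method is called on. By convention this parameter is named self, but that is not a technical requirement.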
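A minimal sketch of that binding, using a hypothetical Greeter class that is not part of the question: calling g.greet() is equivalent to calling Greeter.greet(g), and the first parameter works under any name, although self is what readers expect.

```python
class Greeter:
    def __init__(self, name):
        self.name = name

    def greet(self):
        # self is the object the method was called on
        return "Hello, " + self.name

    def shout(this):
        # Legal but unconventional: the first parameter need not be named self
        return this.name.upper() + "!"


g = Greeter("Ada")
bound = g.greet()            # Python passes g as the first argument
explicit = Greeter.greet(g)  # the same call with the instance passed by hand
shouted = g.shout()
```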

tycol is an ordinary variable, entirely unrelated to the Analyses object. self.tycol is a different name.
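The difference can be seen in a small sketch with a hypothetical Box class (the names are for illustration only): a plain name assigned inside a method is local and disappears when the method returns, while a name assigned through self is stored on the instance.

```python
class Box:
    def fill_local(self):
        content = "local"  # ordinary local variable, gone when the method returns

    def fill_attribute(self):
        self.content = "attribute"  # stored on the instance, survives the call


box = Box()
box.fill_local()
has_after_local = hasattr(box, "content")      # nothing was stored on box

box.fill_attribute()
has_after_attribute = hasattr(box, "content")  # now box.content exists
```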

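Notice how you return self.tycol from Types_des_colonnes without ever giving it a value (this should raise AttributeError. Did you try running the code as posted in the question body?). You then discard this value at the call site.

You should either assign the result of Types_des_colonnes to a name in Analyse_table, or exclusively use the name self.tycol.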

def Types_des_colonnes(self, df):
    tcol = df.columns.to_series().groupby(df.dtypes).groups
    # we don't care about tcol after this, it ceases to exist when the method ends
    self.tycol = {k.name: v for k, v in tcol.items()}
    # but we do care about self.tycol

def Analyse_table(self, table):
    # Stores a dict in self.tycol with the types as keys and the column names as values:
    self.Types_des_colonnes(table)
    nbr_types_colonnes_diff = len(self.tycol.keys())
    # ...
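The other option mentioned above, returning the result and assigning it to a name at the call site, could look like the sketch below. To keep it runnable without pandas, a plain dict mapping column names to dtype names stands in for the DataFrame. That stand-in, the sample column names, and the English method names are assumptions for illustration, not the asker's code.

```python
from collections import defaultdict


class Analyses:
    def column_types(self, dtypes_by_column):
        # Group column names by dtype name and hand the dict back to the caller
        grouped = defaultdict(list)
        for column, dtype_name in dtypes_by_column.items():
            grouped[dtype_name].append(column)
        return dict(grouped)

    def analyse_table(self, dtypes_by_column):
        # Call the sibling method through self and keep its return value
        tycol = self.column_types(dtypes_by_column)
        for dtype_name, columns in tycol.items():
            print("Type: " + dtype_name, columns)
        return len(tycol)


# Hypothetical stand-in for df.dtypes: column name -> dtype name
dtypes = {"mmsi": "int64", "speed": "float64", "course": "float64"}
n_types = Analyses().analyse_table(dtypes)
```

With this approach nothing is stored on the instance, so tycol only lives for the duration of analyse_table.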

Answer 2

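In the method Types_des_colonnes, you need to do: self.tycol=tycol. Also, you need to call the method as a "method", that is, through self. Take a week to read a book about Python and learn some of the basics. Programming is easy, but not that easy :)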
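A rough sketch of what calling a method "as a method" means, using a hypothetical Counter class: inside a class, a bare name such as step() is looked up as a module-level function, so sibling methods have to be reached through self.

```python
class Counter:
    def step(self):
        return 1

    def broken_total(self):
        # Bare name: Python looks for a module-level function called step
        return step() + step()

    def total(self):
        # Looked up on the instance, so the method is found and self is passed
        return self.step() + self.step()


c = Counter()
try:
    c.broken_total()
    error_name = None
except NameError as exc:
    error_name = type(exc).__name__

good_total = c.total()
```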

Answer 3

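A class is a data structure that contains "data and the methods that operate on that data". Note that I did not say "functions": because a method can always access the data contained in the class, methods are not "functions" in the mathematical sense. But that is a topic for another day.

So, when do you use self? self represents the actual instance of the class that the method is being called on. So if you have a class called Shape and two instances of Shape, a and b, then when you call a.area() the self object inside the area method refers to the instance of Shape named a, and when you call b.area() the self object refers to the b instance of Shape.

This way you can write a method that works for any instance of Shape. To make this more concrete, here is an example Shape class: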

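class Shape():
    def __init__(self, length_in, height_in):
        self.length = length_in
        self.height = height_in

    def area(self):
        return self.length * self.height

Here you can see that the data contained in the Shape class is the length and the height. These values are assigned in __init__ (the constructor, which runs for a call such as a = Shape(3.0, 4.0)) and are stored as members of self. The area method can then access them through the self object to do its calculation. These members can also be reassigned, and new members can be created (although members are usually created only in the constructor).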
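As a short usage sketch (the second instance and its dimensions are made up for illustration), two instances share the same area method, and self refers to whichever instance the call was made on:

```python
class Shape:
    def __init__(self, length_in, height_in):
        self.length = length_in
        self.height = height_in

    def area(self):
        return self.length * self.height


a = Shape(3.0, 4.0)
b = Shape(2.0, 5.0)

area_a = a.area()  # inside area, self is a
area_b = b.area()  # inside area, self is b

a.length = 6.0     # members can be reassigned after construction
area_a_resized = a.area()
```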

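Compared with the other, simpler aspects of Python's design, this all looks rather odd. It is not unique to Python, however. C++ has a this pointer that serves the same purpose, and in JavaScript the way closures are used to create objects often relies on a this variable to do the same job as Python's self.

I hope this helps. I can expand on any other questions you have.

Also, it is generally a good idea to put your import statements at the top of the file. There are reasons not to, but none of them are common enough for the average programmer to need.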
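A small sketch of the top-of-file layout, using the standard-library math module and a hypothetical Circle class rather than the asker's numpy and pandas imports: the import runs once and every method in the module can use it.

```python
import math  # imported once, visible to every function and method in the module


class Circle:
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        # No import needed here; math comes from module level
        return math.pi * self.radius ** 2

    def circumference(self):
        return 2 * math.pi * self.radius


unit = Circle(1.0)
```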

Correct use of 'self' in a Python script
