Modifying a pandas DataFrame from a function

Time: 2019-04-09 15:29:27

Tags: python pandas function dataframe

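I find myself modifying several DataFrames in the same way over and over again. I want to put all the modifications in one function, then just call that function with the DataFrame's name and have all the transformations applied.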

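Here is the code I am currently trying, with all the transformations. When I run it, nothing happens and the DataFrame stays in its original state.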

import numpy as np
import pandas as pd

#create a preprocessing function so the process can be applied to any dataset (training, validation and competition)
def preprocessing(df):
    #inspect dataframe
    df.head()

    #check data types in dataframe
    np.unique(df.dtypes).tolist()

    #inspect shape before removing duplicates
    df.shape

    #drop duplicates
    df = df.drop_duplicates()

    #inspect shape again to see change
    df.shape

    #calculate rows that have a mean of 100 to remove them later
    mean100_rows = [i for i in range(len(df)) if df.iloc[i,0:520].values.mean() == 100 ]

    #calculate columns that have a mean of 100 to remove them later
    mean100_cols = [i for i in np.arange(0,520,1) if df.iloc[:,i].values.mean() == 100 ]

    #calculate column labels that have a mean of 100 to remove them later
    col_labels = [df.columns[i] for i in mean100_cols]

    #delete rows with mean 100
    df.drop(index = mean100_rows, axis=0, inplace=True)

    #delete columns with mean 100
    df.drop(columns=col_labels, axis=1, inplace=True)

    #export columns that have been removed
    pd.Series(col_labels).to_csv('remove_cols.csv')

    #head
    df.head()

    #check size again
    df.shape

1 answer:

Answer 0 (score: 1)

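In Python, objects are passed to functions by reference: the parameter is a new name bound to the same object the caller passed in.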

When the following line is executed

df = df.drop_duplicates()

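you are essentially binding the function parameter to a new object (the DataFrame returned by drop_duplicates), while the object outside the function remains unchanged. Every operation after that line, including the later drop(..., inplace=True) calls, modifies only this new local DataFrame, which is discarded when the function ends.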
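A minimal sketch of the difference, using a toy DataFrame invented for illustration (the helper names rebind and mutate are hypothetical): rebinding the parameter leaves the caller's object alone, while an in-place method changes the object the caller passed in.

```python
import pandas as pd


def rebind(df):
    # Rebinds the local name only: the caller's DataFrame is not touched
    df = df.drop_duplicates()
    return len(df)


def mutate(df):
    # In-place operation: modifies the very object the caller passed in
    df.drop_duplicates(inplace=True)
    return len(df)


original = pd.DataFrame({"a": [1, 1, 2]})

rebind(original)
print(len(original))  # 3: the duplicate row is still there

mutate(original)
print(len(original))  # 2: the caller's DataFrame was changed
```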

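I suggest changing the function so that it returns the df object, and then assigning its return value to the df variable outside the function.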
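Following that suggestion, here is one possible sketch of the function rewritten to return the cleaned DataFrame. It assumes, as in the question, that the first 520 columns are numeric features and that a mean of 100 marks rows and columns to remove; the parameters n_features and removed_cols_path are hypothetical additions for illustration. It also swaps the positional list comprehensions for boolean masks: after drop_duplicates the index can have gaps, so positions taken from range(len(df)) and passed to drop(index=...) may not match the row labels. Note that pandas' mean skips NaN by default, unlike the ndarray.mean used in the question.

```python
import pandas as pd


def preprocessing(df, n_features=520, removed_cols_path="remove_cols.csv"):
    """Return a cleaned copy of df; the caller's DataFrame is left untouched.

    n_features and removed_cols_path are illustrative parameters: the
    question hard-codes 520 feature columns and 'remove_cols.csv'.
    """
    # drop_duplicates returns a new DataFrame, so the caller's df is unchanged
    df = df.drop_duplicates()
    print("shape after dropping duplicates:", df.shape)

    features = df.iloc[:, :n_features]

    # Boolean masks are aligned on df's labels, so gaps in the index left by
    # drop_duplicates cannot cause the wrong rows to be dropped
    mean100_rows = features.mean(axis=1) == 100
    mean100_cols = features.columns[features.mean(axis=0) == 100]

    df = df.loc[~mean100_rows].drop(columns=mean100_cols)

    if removed_cols_path is not None:
        # Record which columns were removed, as the original function did
        pd.Series(mean100_cols).to_csv(removed_cols_path)

    print("shape after removing mean-100 rows/columns:", df.shape)
    return df


# Small made-up demo: 3 feature columns instead of 520
demo = pd.DataFrame({
    "f1": [5, 5, 100],
    "f2": [6, 6, 100],
    "f3": [100, 100, 100],
})

train = preprocessing(demo, n_features=3, removed_cols_path=None)
print(train)
print(demo.shape)  # (3, 3): the original DataFrame is unchanged
```

Each dataset is then processed with an explicit reassignment, for example validation = preprocessing(validation). With the default removed_cols_path, every call overwrites remove_cols.csv, just as the original function did.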