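I find myself applying the same modifications to several dataframes over and over. I would like to put all of the modifications into one function, then call that function with just the dataframe name and have every transformation applied.
Here is the code, with all the transformations I am trying to apply. When I run it, nothing happens and the dataframe stays in its original state.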
# create a preprocessing function so the process can be applied to any dataset (training, validation and competition)
import numpy as np
import pandas as pd

def preprocessing(df):
    # inspect dataframe
    df.head()
    # check data types in dataframe
    np.unique(df.dtypes).tolist()
    # inspect shape before removing duplicates
    df.shape
    # drop duplicates
    df = df.drop_duplicates()
    # inspect shape again to see change
    df.shape
    # calculate rows that have a mean of 100 to remove them later
    mean100_rows = [i for i in range(len(df)) if df.iloc[i, 0:520].values.mean() == 100]
    # calculate columns that have a mean of 100 to remove them later
    mean100_cols = [i for i in np.arange(0, 520, 1) if df.iloc[:, i].values.mean() == 100]
    # calculate column labels that have a mean of 100 to remove them later
    col_labels = [df.columns[i] for i in mean100_cols]
    # delete rows with mean 100
    df.drop(index=mean100_rows, axis=0, inplace=True)
    # delete columns with mean 100
    df.drop(columns=col_labels, axis=1, inplace=True)
    # export columns that have been removed
    pd.Series(col_labels).to_csv('remove_cols.csv')
    # head
    df.head()
    # check size again
    df.shape
Answer 0 (score: 1)
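In Python, an argument is passed to a function as a reference to the object.
When the following line executes
df = df.drop_duplicates()
you are binding the function's parameter name to a new object, while the object outside the function stays unchanged. Every later step, including the drop(..., inplace=True) calls, then works on that new local object only.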
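The difference between rebinding a parameter and mutating the object it refers to can be seen in a small sketch. The function names `rebind` and `mutate` and the toy dataframe are made up for illustration and do not come from the question.

```python
import pandas as pd

def rebind(df):
    # Assignment rebinds the local name df to a new DataFrame;
    # the caller's object is left untouched.
    df = df.drop_duplicates()

def mutate(df):
    # inplace=True changes the very object the caller passed in.
    df.drop_duplicates(inplace=True)

original = pd.DataFrame({"a": [1, 1, 2]})

rebind(original)
len_after_rebind = len(original)   # the duplicate row is still there

mutate(original)
len_after_mutate = len(original)   # the duplicate row is gone
```

Relying on in-place mutation for every step is fragile, because a single plain assignment inside the function silently breaks the link to the caller's dataframe.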
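I suggest changing the function so that it returns the df object, and then assigning its return value to the df variable outside the function, for example df = preprocessing(df).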
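A minimal sketch of what such a returning version might look like is given below. The parameters `n_features` and `removed_cols_path` are hypothetical additions (the question hard-codes 520 and 'remove_cols.csv'), and the sketch assumes the first `n_features` columns are numeric with no missing values. It also swaps the positional row list for boolean masks, since after `drop_duplicates()` the index labels no longer match row positions.

```python
import pandas as pd

def preprocessing(df, n_features=520, removed_cols_path=None):
    """Return a cleaned copy of df; the caller's DataFrame is not modified."""
    # n_features and removed_cols_path are illustrative parameters,
    # not part of the original function.
    df = df.drop_duplicates()
    features = df.iloc[:, :n_features]

    # Boolean masks avoid mixing row positions with index labels.
    row_mask = features.mean(axis=1) == 100
    col_mask = features.mean(axis=0) == 100
    col_labels = features.columns[col_mask.to_numpy()].tolist()

    df = df.loc[~row_mask].drop(columns=col_labels)

    # Optionally export the labels of the removed columns.
    if removed_cols_path is not None:
        pd.Series(col_labels).to_csv(removed_cols_path)
    return df

# Usage on a toy frame: the result is assigned back by the caller.
train = pd.DataFrame({
    "w1": [100, 100, 50, 50],
    "w2": [100, 100, 100, 100],
    "label": [0, 0, 1, 1],
})
clean = preprocessing(train, n_features=2)
```

Because the function returns a new object, the same call pattern works for each dataset, e.g. `train = preprocessing(train)` followed by the validation and competition frames.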