Question

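I am applying a min-max scaler to a data frame that contains numeric columns, but if any cell in those columns holds a string or a null value, I get an exception. To avoid this, I would like to convert string or empty cells to 0. How can I do that? My function: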

import pandas as pd
from sklearn import preprocessing

def min_max_scaler(df_sub,col_names):
    """
    df_sub    : Expecting a subset of a data frame in which every column should be a numeric field
                (it contains all the columns on which you want to perform the operation)
    example   : df_subset = df.filter(['latitude','longitude','order.id'], axis=1)
    col_names : All column names of the subset
    """
    scaler = preprocessing.MinMaxScaler()
    scaled_df = scaler.fit_transform(df_sub)
    scaled_df = pd.DataFrame(scaled_df, columns=col_names)

    return scaled_df

Dataset:

day  phone_calls  received
7    180          NaN
8    8            NaN
9    -240         qbb

How can I validate the data before running this function? Please help.

Answer 1

I would do it this way:

Find the columns with object dtype:

obj_cols = df[col_names].columns[df[col_names].dtypes.eq('object')]

Convert them to numeric dtypes, replacing NaN with 0 (zero):

df[obj_cols] = df[obj_cols].apply(pd.to_numeric, errors='coerce').fillna(0)
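To see what the coercion does, here is a minimal sketch on a frame rebuilt from the question's sample data. The explicit `dtype=object` on `received` is an assumption made so the column is reported as object dtype regardless of pandas version; `cleaned` is just an illustrative name.

```python
import numpy as np
import pandas as pd

# Sample frame mirroring the question's data; 'received' mixes NaN and a string.
df = pd.DataFrame({
    "day": [7, 8, 9],
    "phone_calls": [180, 8, -240],
    "received": pd.Series([np.nan, np.nan, "qbb"], dtype=object),
})

# Columns that pandas stores as generic Python objects.
obj_cols = df.columns[df.dtypes.eq("object")]

# Strings that cannot be parsed become NaN, then every NaN becomes 0.
cleaned = df[obj_cols].apply(pd.to_numeric, errors="coerce").fillna(0)
print(cleaned)
```

Only `received` is picked up as an object column here, and after the coercion it holds plain floats, so the scaler no longer sees the string `qbb`.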

Scale:

df[obj_cols] = scaler.fit_transform(df[obj_cols])

As a function:

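def min_max_scaler(df_sub,col_names):
    scaler = preprocessing.MinMaxScaler()
    # columns stored as object dtype (strings or mixed values)
    obj_cols = df_sub[col_names].columns[df_sub[col_names].dtypes.eq('object')]
    # non-numeric strings become NaN, then NaN becomes 0
    df_sub[obj_cols] = df_sub[obj_cols].apply(pd.to_numeric, errors='coerce').fillna(0)
    # scale the requested columns to the [0, 1] range
    df_sub[col_names] = scaler.fit_transform(df_sub[col_names])

    return df_sub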
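For an end-to-end check, the following is a rough sketch of a variant that leaves the caller's frame unmodified and coerces every requested column, not only the object ones. The name `safe_min_max_scaler` is hypothetical, and filling with 0 before scaling is the question's own choice rather than a general recommendation.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler


def safe_min_max_scaler(df_sub, col_names):
    """Hypothetical helper: coerce to numbers, fill gaps with 0, then min-max scale."""
    # apply() returns a new frame, so the caller's data is not modified.
    numeric = df_sub[col_names].apply(pd.to_numeric, errors="coerce").fillna(0)
    scaled = MinMaxScaler().fit_transform(numeric)
    # Rebuild a DataFrame that keeps the original column names and index.
    return pd.DataFrame(scaled, columns=col_names, index=df_sub.index)


# Sample frame mirroring the question's data.
df = pd.DataFrame({
    "day": [7, 8, 9],
    "phone_calls": [180, 8, -240],
    "received": pd.Series([np.nan, np.nan, "qbb"], dtype=object),
})

result = safe_min_max_scaler(df, ["day", "phone_calls", "received"])
print(result)
```

With this sample, `received` ends up as a constant column of zeros after cleaning, and `MinMaxScaler` maps a constant column to 0 instead of dividing by a zero range.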
