Find and replace in pandas?

Date: 2018-03-29 10:21:54

Tags: python pandas scikit-learn

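I am running a min-max scaler on a data frame that contains numeric columns, but if any cell in those columns holds a string or a null value, I get an exception. To avoid this, I am thinking of converting string or empty cells to 0. How can I do that? My function: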

import pandas as pd
from sklearn import preprocessing

def min_max_scaler(df_sub, col_names):
    """
    Scale every column of df_sub with MinMaxScaler.

    df_sub    : a subset of the data frame in which every column is numeric
                (it contains all the columns on which you want to perform the operation)
    example   : df_subset = df.filter(['latitude','longitude','order.id'], axis=1)
    col_names : all column names of the subset
    """
    scaler = preprocessing.MinMaxScaler()
    scaled_df = scaler.fit_transform(df_sub)
    scaled_df = pd.DataFrame(scaled_df, columns=col_names)

    return scaled_df

Dataset:

day phone_calls received
7       180      NaN
8       8        NaN
9     -240       qbb

How can I validate the data before running this function? Please help.

1 answer:

Answer 0 (score: 3)

I would do it this way:

Find the columns with object dtype:

obj_cols = df[col_names].columns[df[col_names].dtypes.eq('object')]
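
To see what this selection picks up, here is a small sketch that rebuilds the sample data from the question. The frame `df` and the list `col_names` are reconstructed for illustration, and the `received` column is given object dtype explicitly so the sketch does not depend on how a particular pandas version infers mixed columns.

```python
import numpy as np
import pandas as pd

# Sample data from the question; 'received' mixes NaN with a string.
df = pd.DataFrame({
    "day": [7, 8, 9],
    "phone_calls": [180, 8, -240],
    "received": pd.Series([np.nan, np.nan, "qbb"], dtype=object),
})
col_names = ["day", "phone_calls", "received"]

# Keep only the column labels whose dtype is object.
obj_cols = df[col_names].columns[df[col_names].dtypes.eq("object")]
print(list(obj_cols))  # ['received']
```

Only `received` is selected; the two integer columns are already numeric and need no conversion.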

Convert them to numeric dtypes, replacing NaN with 0 (zero):

df[obj_cols] = df[obj_cols].apply(pd.to_numeric, errors='coerce').fillna(0)
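
As a minimal illustration of what `errors='coerce'` followed by `fillna(0)` does, the sketch below applies it to a single made-up Series (the values are an assumption, chosen to show a parseable string next to an unparseable one).

```python
import numpy as np
import pandas as pd

# Hypothetical column: two missing values, one non-numeric string, one numeric string.
received = pd.Series([np.nan, np.nan, "qbb", "12"], dtype=object)

# Unparseable values become NaN, then every NaN is replaced with 0.
cleaned = pd.to_numeric(received, errors="coerce").fillna(0)
print(cleaned.tolist())  # [0.0, 0.0, 0.0, 12.0]
```

Note that the inserted zeros take part in the scaling afterwards, so they can become the column minimum or maximum.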

Scale:

df[obj_cols] = scaler.fit_transform(df[obj_cols])

As a function:

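def min_max_scaler(df_sub,col_names):
    scaler = preprocessing.MinMaxScaler()
    # columns stored with object dtype (strings mixed with numbers or NaN)
    obj_cols = df_sub[col_names].columns[df_sub[col_names].dtypes.eq('object')]
    # turn non-numeric values into NaN, then replace NaN with 0
    df_sub[obj_cols] = df_sub[obj_cols].apply(pd.to_numeric, errors='coerce').fillna(0)
    # scale the cleaned columns, as in the step above
    df_sub[obj_cols] = scaler.fit_transform(df_sub[obj_cols])

    return df_sub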
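
For an end-to-end check on the question's sample data, here is a rough sketch that cleans and then scales every requested column. The helper name `clean_and_scale` is hypothetical, and scaling all of `col_names` (rather than only the object columns) is an assumption about what the asker wants.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

def clean_and_scale(df_sub, col_names):
    """Hypothetical helper: coerce to numeric, fill gaps with 0, min-max scale."""
    # Strings and other unparseable values become NaN, then 0.
    numeric = df_sub[col_names].apply(pd.to_numeric, errors="coerce").fillna(0)
    # fit_transform returns a NumPy array, so rebuild the frame with labels.
    scaled = MinMaxScaler().fit_transform(numeric)
    return pd.DataFrame(scaled, columns=col_names, index=df_sub.index)

# Sample data from the question; 'received' is forced to object dtype.
df = pd.DataFrame({
    "day": [7, 8, 9],
    "phone_calls": [180, 8, -240],
    "received": pd.Series([np.nan, np.nan, "qbb"], dtype=object),
})
scaled = clean_and_scale(df, ["day", "phone_calls", "received"])
print(scaled)
```

This version returns a new frame and leaves the input untouched. The `received` column ends up as all zeros, because every value in it was either missing or non-numeric.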