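My task is to first compute a distance matrix from the data and then use that distance matrix as the input to a clustering algorithm. Before using it, I need to normalize the distance matrix to the range [0, 1], but I'm having trouble choosing a suitable method. As far as I know, Z-score and Min-Max are two popular normalization methods. Which one would you recommend for a clustering task?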
Answer 0 (score: 0)
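You can certainly apply some kind of feature scaling to your data, for example min-max normalization or Z-score standardization.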
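For reference, the two methods named in the question transform a value x of one feature (or one entry of the distance matrix) as follows. Here min, max, μ and σ are the minimum, maximum, mean and standard deviation taken over that feature (or over the whole matrix):

```latex
x'_{\text{min-max}} = \frac{x - \min(x)}{\max(x) - \min(x)} \in [0, 1],
\qquad
z = \frac{x - \mu}{\sigma}
```

Only min-max guarantees the [0, 1] range. Z-scores have mean 0 and standard deviation 1, but individual values can be negative or greater than 1. For a distance matrix, whose diagonal is 0, a single global min-max reduces to dividing every entry by the largest distance.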
# Normalization
from sklearn.model_selection import train_test_split
# df: your feature DataFrame, target: your labels (both assumed to be defined already)
X = df
y = target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=27)
# data normalization with sklearn
from sklearn.preprocessing import MinMaxScaler
# fit scaler on training data
norm = MinMaxScaler().fit(X_train)
# transform training data
X_train_norm = norm.transform(X_train)
# transform testing data (using the min/max learned from the training data)
X_test_norm = norm.transform(X_test)
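Note that `MinMaxScaler` rescales each column independently. If you apply it to the distance matrix itself, as the question intends, every column gets its own min and max, and the result is generally no longer symmetric. Below is a minimal sketch of a single global min-max instead. It assumes Euclidean distances and uses made-up toy points; `minmax_global` is a hypothetical helper, not a scikit-learn function.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up toy points in 2-D, for illustration only
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0], [6.0, 5.0]])

# Pairwise Euclidean distance matrix via broadcasting (symmetric, zero diagonal)
D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))


def minmax_global(D):
    """Hypothetical helper: map every entry of D into [0, 1] with one global min and max."""
    d_min, d_max = D.min(), D.max()
    if d_max == d_min:  # all distances identical: avoid dividing by zero
        return np.zeros_like(D)
    return (D - d_min) / (d_max - d_min)


D_norm = minmax_global(D)  # still symmetric, diagonal stays 0

# For comparison: MinMaxScaler rescales each column with its own min/max,
# so the scaled "distance matrix" is generally no longer symmetric
D_cols = MinMaxScaler().fit_transform(D)

print(np.allclose(D_norm, D_norm.T))  # True
print(np.allclose(D_cols, D_cols.T))  # False
```

Because the diagonal is 0, the global minimum is 0 and this is simply `D / D.max()`; the explicit min is kept so the helper also works on matrices without a zero entry.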
Alternatively, standardize the features (Z-score):
# data standardization with sklearn
from sklearn.preprocessing import StandardScaler
# copy of datasets
X_train_stand = X_train.copy()
X_test_stand = X_test.copy()
# numerical features (example column names; replace with your own numeric columns)
num_cols = ['Item_Weight', 'Item_Visibility', 'Item_MRP', 'Outlet_Establishment_Year']
# apply standardization on numerical features, one column at a time
for i in num_cols:
    # fit on training data column (learns that column's mean and std)
    scale = StandardScaler().fit(X_train_stand[[i]])
    # transform the training data column; ravel() flattens the (n, 1) result to 1-D
    X_train_stand[i] = scale.transform(X_train_stand[[i]]).ravel()
    # transform the testing data column with the training mean/std
    X_test_stand[i] = scale.transform(X_test_stand[[i]]).ravel()
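To tie the scaling back to the pipeline in the question (scale the features, build a distance matrix, normalize it, then cluster), here is a minimal sketch. It assumes a small made-up dataset, Euclidean distance, and DBSCAN as the clustering algorithm (the question does not name one); `eps=0.2` is hand-picked for this toy data. It also shows that `StandardScaler` output is not confined to [0, 1], so Z-score alone does not meet the question's 0-to-1 requirement; the final min-max step on the distances does.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import DBSCAN

# Made-up toy data: two obvious groups, features on very different scales
X = np.array([[1.0, 100.0], [1.2, 110.0], [0.9, 95.0],
              [8.0, 900.0], [8.3, 920.0], [7.9, 880.0]])

# Z-score each feature: mean 0, std 1 per column, values can be negative or > 1
X_std = StandardScaler().fit_transform(X)

# Pairwise Euclidean distances on the standardized features (zero diagonal)
D = np.sqrt(((X_std[:, None, :] - X_std[None, :, :]) ** 2).sum(axis=-1))

# Global min-max scaling of the whole distance matrix into [0, 1]
D_norm = (D - D.min()) / (D.max() - D.min())

# Cluster directly on the precomputed, normalized distances
# (eps=0.2 is an assumption tuned by hand for this toy data)
labels = DBSCAN(eps=0.2, min_samples=2, metric="precomputed").fit_predict(D_norm)

print(labels)  # one cluster label per point; -1 would mean noise
```

Because the minimum distance is 0 (the diagonal), `eps` can be read as a fraction of the largest pairwise distance, which is one practical benefit of putting the distances in [0, 1] first.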
See the link below for more details.