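When using PyOD's KNN algorithm, I am trying to use the Mahalanobis distance metric and supply the covariance matrix through the metric_params parameter, as described in the documentation. I am not sure what is going wrong, but the fit method does not pick up the covariance matrix. Any help would be appreciated.
I have already looked at How to use mahalanobis distance in sklearn DistanceMetrics?
Some details follow.
I generated some simulated data with PyOD's utility function and then converted it to a pandas DataFrame: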
import pandas as pd
from pyod.utils.data import generate_data
outlier_fraction = 0.1
# generate random data with two features with 10% outliers
X_train, y_train = generate_data(n_train=200,train_only=True, n_features=2, contamination=outlier_fraction)
X_train_df = pd.DataFrame(X_train, columns = ['F1', 'F2'])
X_train_df.head()
The covariance matrix is then easy to obtain:
X_train_cov_df = X_train_df.cov()
X_train_cov_df
The corresponding NumPy array is X_train_df.cov().values.
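For reference, here is a minimal sketch of how the covariance matrix (V) and its inverse (VI) enter the Mahalanobis distance, using NumPy only. The data is a hypothetical stand-in for X_train, and the helper name mahalanobis is made up for illustration; it is not part of PyOD or scikit-learn.

```python
import numpy as np

# Hypothetical stand-in for X_train: 200 correlated 2-D points.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) @ np.array([[2.0, 0.0], [0.6, 0.5]])

# Sample covariance with columns as features (ddof=1, the same default
# that pandas DataFrame.cov() uses).
V = np.cov(X, rowvar=False)

# The Mahalanobis distance is defined with the inverse covariance matrix.
VI = np.linalg.inv(V)

def mahalanobis(u, v, VI):
    """Illustrative helper: sqrt((u - v)^T VI (u - v))."""
    d = u - v
    return float(np.sqrt(d @ VI @ d))

d01 = mahalanobis(X[0], X[1], VI)
print(V)
print(d01)
```

Whether V or VI is passed, the metric ultimately needs the inverse, so both keys describe the same information.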
I then instantiate KNN and call the fit method as follows:
from pyod.models.knn import KNN
outlier_fraction = 0.1
clf = KNN(contamination = outlier_fraction, algorithm='auto', metric='mahalanobis', metric_params = {'V' : X_train_df.cov().values})
# fit the dataset to the model
clf.fit(X_train_df)
I expected the snippet above to produce a fitted model. Instead, I get this error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-116-e4a09fa074e5> in <module>
5
6 # fit the dataset to the model
----> 7 clf.fit(X_train_df)
~/.virtualenvs/work/lib/python3.6/site-packages/pyod/models/knn.py in fit(self, X, y)
171 self._set_n_classes(y)
172
--> 173 self.tree_ = KDTree(X, leaf_size=self.leaf_size, metric=self.metric)
174 self.neigh_.fit(X)
175
sklearn/neighbors/binary_tree.pxi in sklearn.neighbors.kd_tree.BinaryTree.__init__()
sklearn/neighbors/dist_metrics.pyx in sklearn.neighbors.dist_metrics.DistanceMetric.get_metric()
sklearn/neighbors/dist_metrics.pyx in sklearn.neighbors.dist_metrics.MahalanobisDistance.__init__()
ValueError: Must provide either V or VI for Mahalanobis distance
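Judging from the traceback, line 173 of knn.py builds the KDTree with only leaf_size and metric, so metric_params does not appear to reach it. One possible way around this, sketched below under that assumption, is to whiten the data first so that plain Euclidean distance on the transformed points equals the Mahalanobis distance on the original ones. The data here is a hypothetical stand-in for X_train, and I have not verified the approach against PyOD itself.

```python
import numpy as np

# Hypothetical stand-in for X_train: 200 correlated 2-D points.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2)) @ np.array([[1.5, 0.0], [0.8, 0.4]])

V = np.cov(X, rowvar=False)   # sample covariance, columns are features
VI = np.linalg.inv(V)         # inverse covariance

# Cholesky factor L with VI = L @ L.T. Mapping each row x to x @ L gives
# ||(u - v) @ L||^2 = (u - v) @ VI @ (u - v), the squared Mahalanobis distance.
L = np.linalg.cholesky(VI)
X_white = X @ L

# Compare the two distances for one pair of points.
diff = X[0] - X[1]
d_mahalanobis = float(np.sqrt(diff @ VI @ diff))
d_euclidean = float(np.linalg.norm(X_white[0] - X_white[1]))
print(d_mahalanobis, d_euclidean)
```

If the two printed values agree, X_white could presumably be passed to KNN with its default Euclidean-type metric instead of metric='mahalanobis', which avoids metric_params altogether.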