Problem using the Mahalanobis distance from sklearn DistanceMetrics in the PyOD library

Asked: 2019-04-17 13:27:25

Tags: python arrays scikit-learn knn mahalanobis

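When using PyOD's KNN algorithm, I tried to use the Mahalanobis distance metric, supplying the covariance matrix through the metric_params argument as described in the documentation. I'm not sure where the problem lies, but the fit method does not pick up the covariance matrix. Any help would be appreciated.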

I have already looked at "How to use mahalanobis distance in sklearn DistanceMetrics?"

Here are some details.

I generated some simulated data with PyOD's utility function and then converted it to a pandas DataFrame:

import pandas as pd
from pyod.utils.data import generate_data

outlier_fraction = 0.1

# generate random data with two features with 10% outliers
X_train, y_train = generate_data(n_train=200,train_only=True, n_features=2, contamination=outlier_fraction)

X_train_df = pd.DataFrame(X_train, columns = ['F1', 'F2'])
X_train_df.head()

I can then easily obtain the covariance matrix as follows:

X_train_cov_df = X_train_df.cov()
X_train_cov_df

The corresponding NumPy array is: X_train_df.cov().values
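For reference, the covariance matrix V enters the Mahalanobis distance as d(x, y) = sqrt((x - y)^T V^{-1} (x - y)). The following is a minimal NumPy sketch of that formula. It assumes synthetic Gaussian data in place of the PyOD-generated X_train, and the helper name mahalanobis is illustrative only, not part of PyOD or scikit-learn.

```python
import numpy as np

# Synthetic stand-in for X_train (assumption: not the PyOD-generated data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))

# Sample covariance; rowvar=False treats columns as features,
# which matches the default behaviour of DataFrame.cov().
V = np.cov(X, rowvar=False)
VI = np.linalg.inv(V)  # inverse covariance matrix


def mahalanobis(x, y, VI):
    """Mahalanobis distance between 1-D vectors x and y, given the inverse covariance VI."""
    diff = x - y
    return float(np.sqrt(diff @ VI @ diff))


# Distance between the first two samples under the estimated covariance.
d = mahalanobis(X[0], X[1], VI)
```

With the identity matrix as VI, the same function reduces to the ordinary Euclidean distance, which is a quick sanity check on the covariance being passed in.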

I then instantiate KNN and call the fit method as follows:

from pyod.models.knn import KNN

outlier_fraction = 0.1
clf = KNN(contamination = outlier_fraction, algorithm='auto', metric='mahalanobis', metric_params = {'V' : X_train_df.cov().values})

# fit the dataset to the model
clf.fit(X_train_df)

I expected the snippet above to produce a fitted model. However, the error I get is:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-116-e4a09fa074e5> in <module>
      5 
      6 # fit the dataset to the model
----> 7 clf.fit(X_train_df)

~/.virtualenvs/work/lib/python3.6/site-packages/pyod/models/knn.py in fit(self, X, y)
    171         self._set_n_classes(y)
    172 
--> 173         self.tree_ = KDTree(X, leaf_size=self.leaf_size, metric=self.metric)
    174         self.neigh_.fit(X)
    175 

sklearn/neighbors/binary_tree.pxi in sklearn.neighbors.kd_tree.BinaryTree.__init__()

sklearn/neighbors/dist_metrics.pyx in sklearn.neighbors.dist_metrics.DistanceMetric.get_metric()

sklearn/neighbors/dist_metrics.pyx in sklearn.neighbors.dist_metrics.MahalanobisDistance.__init__()

ValueError: Must provide either V or VI for Mahalanobis distance
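
Frame 173 in the traceback shows KDTree being built with only metric=self.metric, so the metric_params dictionary does not appear to reach the tree in this PyOD version. As a point of comparison, here is a minimal sketch that passes the same kind of parameter directly to scikit-learn's NearestNeighbors. It assumes synthetic data rather than the PyOD-generated set, uses the brute-force algorithm, and supplies the inverse covariance under the VI key. It is not a fix for PyOD's KNN itself.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Synthetic stand-in for X_train (assumption: not the PyOD-generated data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))

# Inverse of the sample covariance matrix (columns are features).
VI = np.linalg.inv(np.cov(X, rowvar=False))

# Brute-force neighbour search with the Mahalanobis metric;
# the inverse covariance is handed over through metric_params.
nn = NearestNeighbors(
    n_neighbors=5,
    algorithm="brute",
    metric="mahalanobis",
    metric_params={"VI": VI},
)
nn.fit(X)

# Querying with the training data itself returns each point as its own
# nearest neighbour (distance 0) in the first column.
distances, indices = nn.kneighbors(X)

# Distance to the 5th neighbour, counting the point itself.
kth_distance = distances[:, -1]
```

If this runs cleanly on the real data, that would suggest the covariance matrix is valid and the issue lies in how the parameters are forwarded.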

0 answers:

No answers yet