Question

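I am working on a machine learning research project that uses active learning. I am trying to use alp, which provides implementations of mainstream active learning techniques.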

However, I am a little confused by the example provided. The first example is:

from active_learning.active_learning import ActiveLearner
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

X, X_unlabeled, y, y_oracle = train_test_split(*make_classification())
clf = LogisticRegression().fit(X, y)

AL = ActiveLearner(strategy='entropy')
AL.rank(clf, X_unlabeled, num_queries=5)

I am not sure what the data in X, X_unlabeled, y and y_oracle should be. Should:

X_unlabeled contain all the unlabeled data, or both labeled and unlabeled data?
y be an empty list, or contain the labels for the training data?
X contain only the labels of the training data?

Answer 1

>>> help(make_classification)

make_classification(n_samples=100, n_features=20, n_informative=2, n_redundant=2,
                    n_repeated=0, n_classes=2, n_clusters_per_class=2, weights=None, 
                    flip_y=0.01, class_sep=1.0, hypercube=True, shift=0.0, scale=1.0, 
                    shuffle=True, random_state=None)

    Generate a random n-class classification problem.

    This initially creates clusters of points normally distributed (std=1)
    about vertices of a `2 * class_sep`-sided hypercube, and assigns an equal
    number of clusters to each class. It introduces interdependence between
    these features and adds various types of further noise to the data.

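The function basically generates dummy data for you to use, as the summary line "Generate a random n-class classification problem" says. Furthermore, according to help, the return values are: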

Returns
-------
X : array of shape [n_samples, n_features]
    The generated samples.

y : array of shape [n_samples]
    The integer labels for class membership of each sample.

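That is, the samples and their labels. These are passed to train_test_split, which shuffles them and returns the train and test splits in the order train samples, test samples, train labels, test labels. In the example, X and y are therefore the labeled training data, X_unlabeled is the held-out pool that the learner treats as unlabeled, and y_oracle holds that pool's true labels, standing in for the oracle you would query.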
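
To make the roles of the four arrays concrete, here is a minimal sketch that uses only scikit-learn and NumPy. It does not call alp. The entropy scoring below is a hand-written illustration of what an entropy-based ranking might look like, not alp's actual implementation. The names features, labels, query_idx and new_labels, the random_state values and max_iter are assumptions added for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Dummy data: 100 samples with 20 features each (the defaults).
features, labels = make_classification(random_state=0)

# train_test_split returns, in order: train samples, test samples,
# train labels, test labels. The default test share is 25%.
X, X_unlabeled, y, y_oracle = train_test_split(
    features, labels, random_state=0
)

print(X.shape, y.shape)                    # (75, 20) (75,)  labeled set
print(X_unlabeled.shape, y_oracle.shape)   # (25, 20) (25,)  pool + hidden labels

# The classifier only ever sees the labeled part.
clf = LogisticRegression(max_iter=1000).fit(X, y)

# Illustrative entropy scoring (assumption: NOT alp's own code).
# Higher entropy means the model is less certain about that sample.
proba = clf.predict_proba(X_unlabeled)
entropy = -np.sum(proba * np.log(proba + 1e-12), axis=1)

# Indices of the 5 most uncertain pool samples.
query_idx = np.argsort(entropy)[::-1][:5]

# "Asking the oracle" here just means looking up the held-back labels.
new_labels = y_oracle[query_idx]
print(query_idx, new_labels)
```

In a real project, X_unlabeled would hold only the samples you have no labels for, and y_oracle would not exist as an array: a human annotator would supply the labels for the queried rows.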
