Using alp (an active learning Python framework)

Posted: 2017-10-22 15:28:00

Tags: python machine-learning

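I am working on a machine learning research project that uses active learning. I am trying to use alp, which provides implementations of mainstream active learning techniques.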

However, I'm a little confused by the examples provided. The first example is:

from active_learning.active_learning import ActiveLearner
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification
X, X_unlabeled, y, y_oracle = train_test_split(*make_classification())
clf = LogisticRegression().fit(X, y)
AL = ActiveLearner(strategy='entropy')
AL.rank(clf, X_unlabeled, num_queries=5)

I'm not sure what data X, X_unlabeled, y and y_oracle should contain. Should:

  • X_unlabeled contain all of the unlabeled data, or both the labeled and unlabeled data?
  • y be an empty list, or contain the labels of the training data?
  • X contain only the labeled training data?
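The strategy='entropy' option refers to entropy-based uncertainty sampling: the model's predicted class probabilities on each unlabeled sample are turned into an entropy score, and the most uncertain samples are proposed for labeling. A minimal sketch of that computation with plain scikit-learn and NumPy is below. It is an assumption about what such a strategy computes, not alp's actual implementation, and entropy_rank is a hypothetical helper name. Note that the score needs only clf and X_unlabeled, never y_oracle, which matches the arguments passed in the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def entropy_rank(clf, X_pool, num_queries=5):
    """Hypothetical helper (not part of alp): return the indices of the
    num_queries pool samples with the highest predictive entropy,
    most uncertain first."""
    proba = clf.predict_proba(X_pool)
    # Clip so log() never sees an exact 0 probability.
    proba = np.clip(proba, 1e-12, 1.0)
    entropy = -np.sum(proba * np.log(proba), axis=1)
    # Sort descending by entropy and keep the top num_queries rows.
    return np.argsort(entropy)[::-1][:num_queries]


X, X_unlabeled, y, y_oracle = train_test_split(
    *make_classification(random_state=0), random_state=0
)
clf = LogisticRegression().fit(X, y)
query_idx = entropy_rank(clf, X_unlabeled, num_queries=5)
print(query_idx)  # row indices into X_unlabeled to send to the oracle
```

The random_state=0 arguments are added only to make the run reproducible.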

1 Answer:

Answer 0 (score: 1)

>>> help(make_classification)

make_classification(n_samples=100, n_features=20, n_informative=2, n_redundant=2,
                    n_repeated=0, n_classes=2, n_clusters_per_class=2, weights=None, 
                    flip_y=0.01, class_sep=1.0, hypercube=True, shift=0.0, scale=1.0, 
                    shuffle=True, random_state=None)

    Generate a random n-class classification problem.

    This initially creates clusters of points normally distributed (std=1)
    about vertices of a `2 * class_sep`-sided hypercube, and assigns an equal
    number of clusters to each class. It introduces interdependence between
    these features and adds various types of further noise to the data.

In other words, the function just generates dummy data for you to experiment with. According to help, it returns:

Returns
-------
X : array of shape [n_samples, n_features]
    The generated samples.

y : array of shape [n_samples]
    The integer labels for class membership of each sample.
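A quick check of those return values with the default arguments shown in the signature (100 samples, 20 features, 2 classes). The random_state=0 argument is added only for reproducibility.

```python
from sklearn.datasets import make_classification

# Defaults: n_samples=100, n_features=20, n_classes=2.
X_all, y_all = make_classification(random_state=0)
print(X_all.shape)                  # (100, 20)
print(y_all.shape)                  # (100,)
print(sorted(set(y_all.tolist())))  # [0, 1]
```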

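The samples and labels are then passed to train_test_split, which shuffles them and splits them into a training set and a test set. train_test_split(X, y) returns X_train, X_test, y_train, y_test in that order, so in the example X and y are the labeled training data, X_unlabeled is the held-out feature matrix used as the pool of "unlabeled" samples, and y_oracle holds the true labels of that pool, i.e. the answers an oracle (such as a human annotator) would give when asked to label a queried sample.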
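To see the role of each of the four variables, here is a small sketch that prints their shapes under the default split (test_size=0.25, so 25 of the 100 samples land in the second half) and then simulates one query round by reading labels from y_oracle. The random_state=0 arguments are added for reproducibility, and picking the query rows at random is only a placeholder for a real query strategy; it is not how alp selects samples.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# make_classification() returns (X, y); the * unpacks them into
# train_test_split(X, y), which returns X_train, X_test, y_train, y_test.
X, X_unlabeled, y, y_oracle = train_test_split(
    *make_classification(random_state=0), random_state=0
)
print(X.shape, y.shape)                   # (75, 20) (75,)
print(X_unlabeled.shape, y_oracle.shape)  # (25, 20) (25,)

# Placeholder query step: pick 5 pool rows at random. A real strategy
# (e.g. entropy) would choose these rows instead.
rng = np.random.default_rng(0)
query_idx = rng.choice(len(X_unlabeled), size=5, replace=False)

# "Ask the oracle": look up the true labels and move the rows
# from the unlabeled pool into the labeled training set.
X = np.vstack([X, X_unlabeled[query_idx]])
y = np.concatenate([y, y_oracle[query_idx]])
X_unlabeled = np.delete(X_unlabeled, query_idx, axis=0)
y_oracle = np.delete(y_oracle, query_idx)

clf = LogisticRegression().fit(X, y)      # retrain on the enlarged labeled set
print(X.shape, X_unlabeled.shape)         # (80, 20) (20, 20)
```

In a real project you would not have y_oracle up front; it exists in the example only so the oracle can be simulated on synthetic data.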