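I am working on a machine learning research project that uses active learning. I am trying to use alp, which provides implementations of mainstream active learning techniques.
However, I am a bit confused by the examples provided, starting with the first one.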
I am not sure what the data in X, X_unlabeled, y and y_oracle should be in this example:
from active_learning.active_learning import ActiveLearner
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification
X, X_unlabeled, y, y_oracle = train_test_split(*make_classification())
clf = LogisticRegression().fit(X, y)
AL = ActiveLearner(strategy='entropy')
AL.rank(clf, X_unlabeled, num_queries=5)
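Should X_unlabeled contain all of the unlabeled data, or both labeled and unlabeled data?
Should X contain only the labels for the training data?
Answer 0 (score: 1)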
>>> help(make_classification)
make_classification(n_samples=100, n_features=20, n_informative=2, n_redundant=2,
                    n_repeated=0, n_classes=2, n_clusters_per_class=2, weights=None,
                    flip_y=0.01, class_sep=1.0, hypercube=True, shift=0.0, scale=1.0,
                    shuffle=True, random_state=None)
    Generate a random n-class classification problem.

    This initially creates clusters of points normally distributed (std=1)
    about vertices of a `2 * class_sep`-sided hypercube, and assigns an equal
    number of clusters to each class. It introduces interdependence between
    these features and adds various types of further noise to the data.
Emphasis mine. The function basically generates dummy data for you to work with. Further, according to help, the return values are:
    Returns
    -------
    X : array of shape [n_samples, n_features]
        The generated samples.
    y : array of shape [n_samples]
        The integer labels for class membership of each sample.
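These are the samples and their labels, which are then passed on to train_test_split, which in turn shuffles them and returns the train and test data in the order (train samples, test samples, train labels, test labels). In the example, then, X and y are the labeled training split, X_unlabeled is the held-out split standing in for the unlabeled pool, and y_oracle holds that pool's true labels, which play the role of the oracle's answers.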
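To make the roles of the four arrays concrete, here is a minimal sketch of one pool-based query round using only scikit-learn and NumPy. The entropy ranking is written out by hand as an illustration of what an entropy strategy might do; it is an assumption on my part and not alp's actual implementation. The names features, labels, proba, entropy and query_idx are hypothetical, as are the fixed random_state values.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Dummy data: 100 samples with 20 features each, plus integer class labels.
features, labels = make_classification(random_state=0)

# train_test_split returns (train X, test X, train y, test y), in that order.
# The "test" part is treated as the unlabeled pool; y_oracle is kept aside.
X, X_unlabeled, y, y_oracle = train_test_split(features, labels, random_state=0)
print(X.shape, X_unlabeled.shape, y.shape, y_oracle.shape)
# (75, 20) (25, 20) (75,) (25,)

# Fit the initial model on the labeled part only.
clf = LogisticRegression().fit(X, y)

# Hand-rolled entropy ranking (illustrative assumption, not alp's code):
# the higher the predictive entropy, the less certain the model is.
proba = clf.predict_proba(X_unlabeled)
entropy = -np.sum(proba * np.log(proba + 1e-12), axis=1)
query_idx = np.argsort(entropy)[::-1][:5]  # 5 most uncertain pool points

# The oracle reveals the true labels for the queried points only;
# those points move from the pool into the labeled set.
X = np.vstack([X, X_unlabeled[query_idx]])
y = np.concatenate([y, y_oracle[query_idx]])
X_unlabeled = np.delete(X_unlabeled, query_idx, axis=0)
y_oracle = np.delete(y_oracle, query_idx)
```

With real data you would not have y_oracle up front: X and y would be whatever you have already labeled, X_unlabeled would hold only the unlabeled samples, and the labels for the queried indices would come from a human annotator.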