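I am using semi-supervised learning to label the unlabelled images in my dataset. The unlabelled images are fed to the CNN model, which produces a probability for each class after the softmax. If one of those probabilities exceeds a threshold (for example 0.65), I label the image with that class and add it to the training set. The code that builds the pseudo-labelled dataset: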
import torch
import torch.nn as nn
from torch.utils.data import ConcatDataset, DataLoader
from tqdm import tqdm

# Note: batch_size is a global defined elsewhere in the script.

def get_pseudo_labels(trainset, dataset, model, threshold=0.65):
    # This function generates pseudo-labels for a dataset using the given model.
    # It returns a dataset containing the images whose prediction confidence exceeds the given threshold.
    device = "cuda" if torch.cuda.is_available() else "cpu"

    # Construct a data loader over the unlabelled images.
    data_loader = DataLoader(dataset, batch_size=batch_size, shuffle=False)

    # Make sure the model is in eval mode.
    model.eval()
    # Define the softmax function.
    softmax = nn.Softmax(dim=-1)

    # Iterate over the dataset by batches.
    for batch in tqdm(data_loader):
        img, labels = batch

        # Forward the data.
        # Using torch.no_grad() accelerates the forward process.
        with torch.no_grad():
            logits = model(img.to(device))

        # Obtain the probability distributions by applying softmax to the logits.
        probs = softmax(logits)

        # Check every class probability of every image in the batch.
        for j in range(0, batch_size):
            for i in range(0, 11):
                if probs[j][i].item() > threshold:
                    batch[1][j] = torch.Tensor([i])  # label the image
                    temp = batch[0][j] + batch[1][j]  # combine the two tensors
                    trainset = ConcatDataset([trainset, temp])  # add this labelled image to trainset

    model.train()
    return trainset
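The thresholding step described above can also be written without the two nested index loops. The following is only a minimal sketch under stated assumptions: `select_confident` is a hypothetical helper name, not part of the original code, and it assumes `logits` has shape `(number of images in the batch, number of classes)`. It takes the top softmax probability of each image and keeps the rows above the threshold, so it never has to assume how many images a batch contains.

```python
import torch


def select_confident(logits: torch.Tensor, threshold: float = 0.65):
    """Hypothetical helper: return (row indices, predicted labels) for the
    images whose highest softmax probability exceeds `threshold`."""
    probs = torch.softmax(logits, dim=-1)
    # Top probability and its class index for every image in the batch.
    confidence, predicted = probs.max(dim=-1)
    keep = confidence > threshold
    # Row indices of the images that pass, whatever the batch size is.
    rows = keep.nonzero(as_tuple=True)[0]
    return rows, predicted[keep]


# Stand-in for a final batch that holds only 2 images and 11 classes.
logits = torch.zeros(2, 11)
logits[0, 3] = 10.0  # image 0 is confidently class 3; image 1 is uniform
rows, labels = select_confident(logits)
print(rows.tolist(), labels.tolist())
```

The returned row indices could then be used to pick the matching images out of `img`; how those image and label pairs are wrapped into a dataset before being concatenated with `trainset` is left open here.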
Running it raises this error:
    if probs[j][i].item() > threshold:
IndexError: index 2 is out of bounds for dimension 0 with size 2
However, I can print probs inside the same loops without any problem:
for j in range(0, batch_size):
    for i in range(0, 11):
        print('batch:', j)
        print('The value of label', i)
        print(probs[j][i])
        if probs[j][i].item() > threshold:
            batch[1][j] = torch.Tensor([i])
            temp = batch[0][j] + batch[1][j]
            trainset = ConcatDataset([trainset, temp])
Output:
...
batch: 63
The value of label 9
tensor(0.0859, device='cuda:0')
batch: 63
The value of label 10
tensor(0.0977, device='cuda:0')
I don't understand what this IndexError means.
The format of img is:
tensor([...(img)],[...(label)])
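For reference, the message says that index 2 was requested along dimension 0 of a tensor that has only 2 entries there, so the valid indices are 0 and 1. One plausible way to get there, assuming the dataset size is not a multiple of `batch_size`, is that the last batch from the `DataLoader` holds fewer images than `batch_size` while the loop still counts up to `batch_size`. The sketch below reproduces that situation in isolation; the value 64 and the random `probs` tensor are stand-ins, not values taken from the original script.

```python
import torch

batch_size = 64            # nominal batch size (assumed value)
probs = torch.rand(2, 11)  # stand-in for a final batch with only 2 images

try:
    for j in range(batch_size):
        probs[j][0].item()  # fails once j reaches the number of rows
except IndexError as err:
    failed_at = j
    message = str(err)
    print("failed at j =", failed_at)
    print(message)

# Looping over the rows the tensor actually has avoids the problem.
visited = [j for j in range(probs.size(0))]
print(visited)
```

This would also be consistent with the printed output above, where rows up to `batch: 63` of a full batch are printed without error.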