How do I load the MNIST digit and label data in MATLAB?

Asked: 2016-09-19 19:45:24

Tags: image matlab image-processing mnist

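I am trying to run the code given at

https://github.com/bd622/DiscretHashing

Discrete Hashing is a dimensionality-reduction method used for approximate nearest neighbor search. I want to load the MNIST database available at http://yann.lecun.com/exdb/mnist/ for use with this implementation. I have extracted the files from the compressed gz format.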

Problem 1:

Using the solution provided in "Reading MNIST Image Database binary file in MATLAB" to read the MNIST database, I get the following error:

Error using fread
Invalid file identifier.  Use fopen to generate a valid file identifier.

Error in Reading (line 7)
A = fread(fid, 1, 'uint32');

Here is the code:

clear all;
close all;

%//Open file
fid = fopen('t10k-images-idx3-ubyte', 'r');

A = fread(fid, 1, 'uint32');
magicNumber = swapbytes(uint32(A));

%//For each image, store into an individual cell
imageCellArray = cell(1, totalImages);
for k = 1 : totalImages
    %//Read in numRows*numCols pixels at a time
    A = fread(fid, numRows*numCols, 'uint8');
    %//Reshape so that it becomes a matrix
    %//We are actually reading this in column major format
    %//so we need to transpose this at the end
    imageCellArray{k} = reshape(uint8(A), numCols, numRows)';
end

%//Close the file
fclose(fid);

Update: Problem 1 is solved, and the revised code is

clear all;
close all;

%//Open file
fid = fopen('t10k-images.idx3-ubyte', 'r');

A = fread(fid, 1, 'uint32');
magicNumber = swapbytes(uint32(A));

%//Read in total number of images
%//A = fread(fid, 4, 'uint8');
%//totalImages = sum(bitshift(A', [24 16 8 0]));

%//OR
A = fread(fid, 1, 'uint32');
totalImages = swapbytes(uint32(A));

%//Read in number of rows
%//A = fread(fid, 4, 'uint8');
%//numRows = sum(bitshift(A', [24 16 8 0]));

%//OR
A = fread(fid, 1, 'uint32');
numRows = swapbytes(uint32(A));

%//Read in number of columns
%//A = fread(fid, 4, 'uint8');
%//numCols = sum(bitshift(A', [24 16 8 0]));

%// OR
A = fread(fid, 1, 'uint32');
numCols = swapbytes(uint32(A));

for k = 1 : totalImages
    %//Read in numRows*numCols pixels at a time
    A = fread(fid, numRows*numCols, 'uint8');
    %//Reshape so that it becomes a matrix
    %//We are actually reading this in column major format
    %//so we need to transpose this at the end
    imageCellArray{k} = reshape(uint8(A), numCols, numRows)';
end

%//Close the file
fclose(fid);

Problem 2:

I do not understand how to use the four MNIST files in the code. The code contains the variables

traindata = double(traindata);
testdata = double(testdata);

How do I prepare the MNIST database so that I can use it with the implementation?

Update: I implemented the solution, but I keep getting this error

Error using fread
Invalid file identifier.  Use fopen to generate a valid file identifier.

Error in mnist_parse (line 11)
A = fread(fid1, 1, 'uint32');

These are the files

demo.m % This is the main file that calls the function to read the MNIST data

clear all
clc
[Trainimages, Trainlabels] = mnist_parse('C:\Users\Desktop\MNIST\train-images-idx3-ubyte', 'C:\Users\Desktop\MNIST\train-labels-idx1-ubyte');

[Testimages, Testlabels] = mnist_parse('t10k-images-idx3-ubyte', 't10k-labels-idx1-ubyte');

k=5;
digit = images(:,:,k);
lbl = label(k);

function [images, labels] = mnist_parse(path_to_digits, path_to_labels)

% Open files
fid1 = fopen(path_to_digits, 'r');

% The labels file
fid2 = fopen(path_to_labels, 'r');

% Read in magic numbers for both files
A = fread(fid1, 1, 'uint32');
magicNumber1 = swapbytes(uint32(A)); % Should be 2051
fprintf('Magic Number - Images: %d\n', magicNumber1);

A = fread(fid2, 1, 'uint32');
magicNumber2 = swapbytes(uint32(A)); % Should be 2049
fprintf('Magic Number - Labels: %d\n', magicNumber2);

% Read in total number of images
% Ensure that this number matches with the labels file
A = fread(fid1, 1, 'uint32');
totalImages = swapbytes(uint32(A));
A = fread(fid2, 1, 'uint32');
if totalImages ~= swapbytes(uint32(A))
    error('Total number of images read from images and labels files are not the same');
end
fprintf('Total number of images: %d\n', totalImages);

% Read in number of rows
A = fread(fid1, 1, 'uint32');
numRows = swapbytes(uint32(A));

% Read in number of columns
A = fread(fid1, 1, 'uint32');
numCols = swapbytes(uint32(A));

fprintf('Dimensions of each digit: %d x %d\n', numRows, numCols);

% For each image, store into an individual slice
images = zeros(numRows, numCols, totalImages, 'uint8');
for k = 1 : totalImages
    % Read in numRows*numCols pixels at a time
    A = fread(fid1, numRows*numCols, 'uint8');

    % Reshape so that it becomes a matrix
    % We are actually reading this in column major format
    % so we need to transpose this at the end
    images(:,:,k) = reshape(uint8(A), numCols, numRows).';
end

% Read in the labels
labels = fread(fid2, totalImages, 'uint8');

% Close the files
fclose(fid1);
fclose(fid2);

end

1 Answer:

Answer 0: (score: 3)

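I am the original author of approach #1 that you mentioned. Reading in the training data and the test labels is quite simple. As for reading the images, the code you showed above reads the pixel data correctly and stores it in a cell array. However, you are missing the reads for the number of images, rows and columns stored in the file. Note that the MNIST format for this file is laid out as follows. The left column is the byte offset relative to the beginning of the file: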

[offset] [type]          [value]          [description]
0000     32 bit integer  0x00000803(2051) magic number
0004     32 bit integer  60000            number of images
0008     32 bit integer  28               number of rows
0012     32 bit integer  28               number of columns
0016     unsigned byte   ??               pixel
0017     unsigned byte   ??               pixel
........
xxxx     unsigned byte   ??               pixel

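The first four bytes are a magic number, 2051, which lets you verify that you are reading the file correctly. The next four bytes give the total number of images, the next four bytes the number of rows, and the final four bytes the number of columns. There should be 60000 images of size 28 rows by 28 columns. After this header, the pixels are laid out in row-major order, so you have to loop over successive blocks of 28 x 28 pixels and store them. In this case, I stored them in a cell array, where each element of the cell array is a single digit. The same format applies to the test data, but there are 10000 images instead.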
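To make the layout concrete, here is a minimal sketch that writes a tiny synthetic file in the same format (2 images of 2 x 3 pixels) and reads it back. The scratch file name and the variable names are hypothetical, and the sketch assumes the file is opened with the 'ieee-be' machine format so that fread interprets the header as big-endian directly, which is an alternative to the swapbytes calls used elsewhere on this page:

```matlab
% Build a tiny synthetic IDX image file: 2 images, 2 rows, 3 columns.
fname = [tempname '.idx'];                 % hypothetical scratch file
fid = fopen(fname, 'w', 'ieee-be');        % header integers are stored MSB first
fwrite(fid, [2051 2 2 3], 'uint32');       % magic number, count, rows, columns
fwrite(fid, 1:12, 'uint8');                % pixels, row-major, image after image
fclose(fid);

% Read it back following the offsets in the table above.
fid = fopen(fname, 'r', 'ieee-be');
header = fread(fid, 4, 'uint32');          % bytes 0 to 15
n  = header(2);
nr = header(3);
nc = header(4);
imgs = zeros(nr, nc, n, 'uint8');
for k = 1 : n
    px = fread(fid, nr*nc, 'uint8');       % one image worth of pixels
    % reshape fills column by column, so fill an nc x nr matrix and transpose
    imgs(:,:,k) = reshape(uint8(px), nc, nr).';
end
fclose(fid);
delete(fname);

disp(imgs(:,:,1));                         % first image: rows [1 2 3] and [4 5 6]
```

The transpose is what turns the row-major bytes on disk into the expected image orientation in MATLAB, which stores matrices column by column.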

As for the actual labels, the format is roughly the same, with some slight differences:

[offset] [type]          [value]          [description]
0000     32 bit integer  0x00000801(2049) magic number (MSB first)
0004     32 bit integer  60000            number of items
0008     unsigned byte   ??               label
0009     unsigned byte   ??               label
........
xxxx     unsigned byte   ??               label

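The first four bytes are a magic number, 2049; the second set of four bytes tells you how many labels there are; and finally there is exactly 1 byte for each corresponding digit in the dataset. The test data has the same format, but with 10000 labels. Therefore, once you have read the header of the label file, you only need a single fread call, with the data type set to unsigned 8-bit integer, to read in the rest of the labels.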

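The reason you have to use swapbytes is that the 32-bit integers in these files are stored most significant byte first, whereas MATLAB by default reads data in the machine's native byte order, which is little-endian on typical hardware: the first byte read is treated as the least significant. Once a value has been read, swapbytes reverses its byte order so that it comes out right.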
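The effect is easy to see on a single 4-byte value. The sketch below writes the bytes of the image magic number to a scratch file (a hypothetical name) and reads them back in two ways: as little-endian followed by swapbytes, and with the file opened as big-endian. The second option is an assumption about how you might prefer to open the files and is not what the function on this page does:

```matlab
% Write the four bytes 00 00 08 03 (2051 stored MSB first) to a scratch file.
fname = [tempname '.bin'];                 % hypothetical scratch file
fid = fopen(fname, 'w');
fwrite(fid, [0 0 8 3], 'uint8');
fclose(fid);

% Option 1: read as little-endian, then swap the bytes.
fid = fopen(fname, 'r', 'ieee-le');
raw = fread(fid, 1, 'uint32');             % bytes interpreted in the wrong order
fclose(fid);
viaSwap = swapbytes(uint32(raw));

% Option 2: open the file as big-endian so that no swap is needed.
fid = fopen(fname, 'r', 'ieee-be');
direct = fread(fid, 1, 'uint32');
fclose(fid);

delete(fname);
fprintf('raw = %d, after swapbytes = %d, big-endian read = %d\n', raw, viaSwap, direct);
```

Both routes give 2051. The little-endian read is forced here with 'ieee-le' only so that the sketch behaves the same on every machine.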

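Therefore, I have modified this code for you into an actual function that takes two strings: the full path to the digit image file and the full path to the labels file. I have also changed the code so that the images are stored in a 3D numeric matrix instead of a cell array, for faster processing. Note that once you start reading the actual image data, each pixel is an unsigned 8-bit integer, so no byte swapping is needed. Swapping is only required when a single value spans more than one byte, as the 32-bit header fields do: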

function [images, labels] = mnist_parse(path_to_digits, path_to_labels)

% Open files
fid1 = fopen(path_to_digits, 'r');

% The labels file
fid2 = fopen(path_to_labels, 'r');

% Read in magic numbers for both files
A = fread(fid1, 1, 'uint32');
magicNumber1 = swapbytes(uint32(A)); % Should be 2051
fprintf('Magic Number - Images: %d\n', magicNumber1);

A = fread(fid2, 1, 'uint32');
magicNumber2 = swapbytes(uint32(A)); % Should be 2049
fprintf('Magic Number - Labels: %d\n', magicNumber2);

% Read in total number of images
% Ensure that this number matches with the labels file
A = fread(fid1, 1, 'uint32');
totalImages = swapbytes(uint32(A));
A = fread(fid2, 1, 'uint32');
if totalImages ~= swapbytes(uint32(A))
    error('Total number of images read from images and labels files are not the same');
end
fprintf('Total number of images: %d\n', totalImages);

% Read in number of rows
A = fread(fid1, 1, 'uint32');
numRows = swapbytes(uint32(A));

% Read in number of columns
A = fread(fid1, 1, 'uint32');
numCols = swapbytes(uint32(A));

fprintf('Dimensions of each digit: %d x %d\n', numRows, numCols);

% For each image, store into an individual slice
images = zeros(numRows, numCols, totalImages, 'uint8');
for k = 1 : totalImages
    % Read in numRows*numCols pixels at a time
    A = fread(fid1, numRows*numCols, 'uint8');

    % Reshape so that it becomes a matrix
    % We are actually reading this in column major format
    % so we need to transpose this at the end
    images(:,:,k) = reshape(uint8(A), numCols, numRows).';
end

% Read in the labels
labels = fread(fid2, totalImages, 'uint8');

% Close the files
fclose(fid1);
fclose(fid2);

end

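To call this function, simply specify the path to the image file and the path to the labels file. Assuming you run this from the same directory as the files, you would do the following for the training images: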

[images, labels] = mnist_parse('train-images-idx3-ubyte', 'train-labels-idx1-ubyte');

Similarly, for the test images you would do:

[images, labels] = mnist_parse('t10k-images-idx3-ubyte', 't10k-labels-idx1-ubyte');
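If either path is wrong, fopen returns -1 and the first fread fails with the "Invalid file identifier" error quoted in the question. Note that the question's working call used 't10k-images.idx3-ubyte' (with a dot), so the extracted file names may differ from the ones written here. A minimal sketch of a check you could run before calling the function, using a deliberately missing, hypothetical file name:

```matlab
% Hypothetical file name that is assumed not to exist in the current folder.
path_to_digits = 'no-such-mnist-file-idx3-ubyte';

% The second output of fopen carries the system error message on failure.
[fid, msg] = fopen(path_to_digits, 'r');
if fid == -1
    opened = false;
    fprintf('Could not open "%s": %s\n', path_to_digits, msg);
    fprintf('Current folder is: %s\n', pwd);
else
    opened = true;
    fclose(fid);
end
```

Checking the returned identifier this way points at the file name or the current folder as the cause, rather than at the parsing code.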

To access the k-th digit, you simply do:

digit = images(:,:,k);

The corresponding label for the k-th digit is:

lbl = labels(k);

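Finally, to get this data into a format accepted by the code I saw on Github: that code assumes that rows correspond to training examples and columns correspond to features. To use this format, simply reshape the data so that the pixels of each image are spread across the columns.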

Therefore, simply do this:

[traindata, traingnd] = mnist_parse('train-images-idx3-ubyte', 'train-labels-idx1-ubyte');
traindata = double(reshape(traindata, size(traindata,1)*size(traindata,2), []).');
traingnd = double(traingnd);

[testdata, testgnd] = mnist_parse('t10k-images-idx3-ubyte', 't10k-labels-idx1-ubyte');
testdata = double(reshape(testdata, size(testdata,1)*size(testdata,2), []).');
testgnd = double(testgnd);

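The above uses the same variable names as the script, so you should be able to plug it in and have it work. The second line reshapes the matrix so that each digit is in a column, and the transpose then puts each digit in a row. We also need to cast to double because that is what the Github code does. The same logic applies to the test data. Also note that I have explicitly cast the training and test labels to double, to ensure maximum compatibility with whatever algorithm you decide to use on this data.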
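The reshape and transpose are easier to follow on a small array. The sketch below uses a made-up stack of two 2 x 3 "images" in place of the MNIST data; the variable names are hypothetical:

```matlab
% Two made-up 2 x 3 images stacked along the third dimension.
images = cat(3, uint8([1 2 3; 4 5 6]), uint8([7 8 9; 10 11 12]));

% Unroll each image into a column, then transpose so each image is a row.
X = double(reshape(images, size(images,1)*size(images,2), []).');

disp(size(X));     % 2 examples by 6 features
disp(X(1,:));      % first image, unrolled column by column: 1 4 2 5 3 6

% A row can be turned back into its image with another reshape.
second = reshape(X(2,:), size(images,1), size(images,2));
```

Each row holds the pixels of one image in column-major order, so reshaping a row with the original height and width recovers the image without a further transpose.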

Happy digit hacking!