Importing data from a file delimited by a varying number of spaces

Date: 2014-10-09 04:46:10

Tags: matlab file parsing import

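I am trying to read this dataset into a cell array, but I have two problems:

1) The delimiter is a different number of spaces for each column

2) Six entries in column 4 contain a question mark instead of a number

What is a good way to read this data from the file into a cell array?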

2 answers:

Answer 0: (score: 1)

Try the following:

x = importdata('auto-mpg.data'); %// read the file as a cell array of lines
y = cell(numel(x),9); %// preallocate with 9 columns (according to your file)
for n = 1:numel(x)
    %// split each line into columns, using as separator either a run of
    %// two or more whitespace characters or a single tab (according to your file)
    y(n,:) = regexp(x{n}, '(\s\s+)|\t', 'split');
end

The result is y, a 398x9 cell array of strings.

Answer 1: (score: 0)

The following code is based on the MATLAB Import Tool:
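Since every cell of y holds text, a further step is needed to get numbers. A minimal sketch of that step, assuming the first eight columns are numeric and the ninth is the quoted car name; the two sample rows are typed in by hand to mimic the file layout, and the variable names nums and names are illustrative only:

```matlab
% Two hand-written sample rows in the same layout as auto-mpg.data
% (assumption: multiple spaces between numeric columns, a tab before the name).
x = {sprintf('18.0   8   307.0      130.0      3504.      12.0   70  1\t"chevrolet chevelle malibu"'); ...
     sprintf('25.0   4   98.00      ?          2046.      19.0   71  1\t"ford pinto"')};

% Split each line into 9 columns, as in the answer above.
y = cell(numel(x), 9);
for n = 1:numel(x)
    y(n,:) = regexp(x{n}, '(\s\s+)|\t', 'split');
end

% str2double returns NaN for text it cannot parse, so '?' becomes NaN.
nums = str2double(y(:,1:8));

% Strip the surrounding double quotes from the car names.
names = strrep(y(:,9), '"', '');
```

Here nums is a numeric matrix in which the question-mark entries of column 4 show up as NaN, and names is a cell array of strings without the quotes.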

% Initialize variables.
filename = '/home/gknor/Pulpit/auto-mpg.data';
delimiter = {'\t',' '};

% Read columns of data as strings:
formatSpec = '%s%s%s%s%s%s%s%s%[^\n\r]';

% Open the text file.
fileID = fopen(filename,'r');

% Read columns of data according to format string.
dataArray = textscan(fileID, formatSpec, 'Delimiter', delimiter, 'MultipleDelimsAsOne', true,  'ReturnOnError', false);

% Close the text file.
fclose(fileID);

% Convert the contents of columns containing numeric strings to numbers.
% Replace non-numeric strings with NaN.
raw = repmat({''},length(dataArray{1}),length(dataArray)-1);
for col=1:length(dataArray)-1
    raw(1:length(dataArray{col}),col) = dataArray{col};
end
numericData = NaN(size(dataArray{1},1),size(dataArray,2));

for col=[1,2,3,4,5,6,7,8]
    % Converts strings in the input cell array to numbers. Replaced non-numeric
    % strings with NaN.
    rawData = dataArray{col};
    for row=1:size(rawData, 1);
        % Create a regular expression to detect and remove non-numeric prefixes and
        % suffixes.
        regexstr = '(?<prefix>.*?)(?<numbers>([-]*(\d+[\,]*)+[\.]{0,1}\d*[eEdD]{0,1}[-+]*\d*[i]{0,1})|([-]*(\d+[\,]*)*[\.]{1,1}\d+[eEdD]{0,1}[-+]*\d*[i]{0,1}))(?<suffix>.*)';
        try
            result = regexp(rawData{row}, regexstr, 'names');
            numbers = result.numbers;

            % Detected commas in non-thousand locations.
            invalidThousandsSeparator = false;
            if any(numbers==',')
                thousandsRegExp = '^\d+?(\,\d{3})*\.{0,1}\d*$';
                if isempty(regexp(numbers, thousandsRegExp, 'once'))
                    numbers = NaN;
                    invalidThousandsSeparator = true;
                end
            end
            % Convert numeric strings to numbers.
            if ~invalidThousandsSeparator;
                numbers = textscan(strrep(numbers, ',', ''), '%f');
                numericData(row, col) = numbers{1};
                raw{row, col} = numbers{1};
            end
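        catch
            % No number found (e.g. a '?' entry): the cell stays non-numeric
            % and is replaced with NaN after the loop.
        end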
    end
end

% Replace non-numeric cells with NaN
R = cellfun(@(x) ~isnumeric(x) && ~islogical(x),raw); % Find non-numeric cells
raw(R) = {NaN}; % Replace non-numeric cells
data = cat(2,raw,dataArray{9});

% Clear temporary variables
clearvars -except data

You can find more information about the Import Tool here.
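For this particular file, a shorter variant may be enough. The sketch below is not the Import Tool's output; it assumes textscan's default whitespace handling copes with the varying number of spaces, and uses the 'TreatAsEmpty' option so that '?' in a numeric field is read as NaN. The sample text is typed in by hand to mimic the file layout; with the real file, a file identifier from fopen would be passed instead of the string.

```matlab
% Two hand-written sample rows in the same layout as auto-mpg.data.
sample = sprintf(['18.0   8   307.0      130.0      3504.      12.0   70  1\t"chevrolet chevelle malibu"\n' ...
                  '25.0   4   98.00      ?          2046.      19.0   71  1\t"ford pinto"']);

% Eight numeric columns, then the rest of the line as text.
% 'TreatAsEmpty' makes '?' an empty numeric field, which textscan fills with NaN.
C = textscan(sample, '%f%f%f%f%f%f%f%f%[^\n\r]', 'TreatAsEmpty', '?');

numericData = [C{1:8}];                      % N-by-8 numeric matrix
names = strtrim(strrep(C{9}, '"', ''));      % car names without quotes
```

The trade-off is that this version relies on the file having exactly eight numeric columns followed by a name, whereas the generated code above reads everything as text first and converts cell by cell.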