Question

I am trying to read a text file that contains both numbers and strings using Octave. The file format looks like this:

A B C
a 10 100
b 20 200
c 30 300
d 40 400
e 50 500

However, the delimiter can be a space, tab, comma, or semicolon. When the delimiter is a space or tab, the textread function works fine:

[A,B,C] = textread ('test.dat','%s %d %d','headerlines',1)

It does not work, however, when the delimiter is a comma or semicolon. I tried using dlmread:

dlmread ('test.dat',';',1,0)

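But that does not work either, because the first column is a string. Basically, with textread I can't specify the delimiter, and with dlmread I can't specify the format of the first column. At least not with the versions of these functions in Octave. Has anyone run into this problem before?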

Answer 1

textread does let you specify the delimiter: it honors the property arguments of strread. The following code worked for me:

[A,B,C] = textread('test.dat', '%s %d %d', 'delimiter', ',', 'headerlines', 1)
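If textread is not available in your installation, a similar read can be sketched with textscan. This is only an illustration, not part of the answer above: it assumes that textscan accepts a cell array of delimiters together with 'HeaderLines', and it writes its own temporary sample file so that it can run on its own.

```octave
% Write a small semicolon-delimited sample file (temporary, hypothetical data).
fname = tempname();
fid = fopen(fname, 'w');
fprintf(fid, 'A;B;C\na;10;100\nb;20;200\nc;30;300\n');
fclose(fid);

% Read it back: one string column and two integer columns.
% Either ',' or ';' is accepted as the delimiter; the header line is skipped.
fid = fopen(fname, 'r');
cols = textscan(fid, '%s %d %d', 'Delimiter', {',', ';'}, 'HeaderLines', 1);
fclose(fid);
delete(fname);

A = cols{1};   % cell array of strings
B = cols{2};   % numeric column
C = cols{3};   % numeric column
```

Unlike textread, textscan returns a single cell array with one entry per column, so the columns are unpacked by hand afterwards.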

Answer 2

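I could not find an easy way to do this in Octave at the moment. You can use fopen() to loop through the file and extract the data manually. I wrote a function that does this for arbitrary data: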

function varargout = coltextread(fname, delim)

    % Initialize the variable output argument
    varargout = cell(nargout, 1);

    % Initialize elements of the cell array to nested cell arrays
    % This syntax is due to {:} producing a comma-separated list
    [varargout{:}] = deal(cell());

    fid = fopen(fname, 'r');

    while true
        % Get the current line
        ln = fgetl(fid);

        % Stop if EOF
        if ~ischar(ln)
            break;
        endif

        % Split the line string into components and parse numbers
        elems = strsplit(ln, delim);
        nums = str2double(elems);

        nans = isnan(nums);

        % Special case of all strings (header line)
        if all(nans)
            continue;
        endif

        % Find the indices of the NaNs 
        % (i.e. the indices of the strings in the original data)
        idxnans = find(nans);

        % Assign each corresponding element in the current line
        % into the corresponding cell array of varargout
        for i = 1:nargout
            % Detect if the current index is a string or a num
            if any(ismember(idxnans, i))
                varargout{i}{end+1} = elems{i};
            else
                varargout{i}{end+1} = nums(i);
            endif
        endfor
    endwhile

    % Release the file handle once all lines have been read
    fclose(fid);

endfunction

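It accepts two arguments: the file name and the delimiter. The function is governed by the number of return variables requested, so, for example, [A B C] = coltextread('data.txt', ';'); will try to parse three separate data elements from each line of the file, while A = coltextread('data.txt', ';'); will parse only the first element. If no return variable is given, the function returns nothing.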

The function ignores lines that consist entirely of strings (e.g. the 'A B C' header). If you want everything, remove the if all(nans)... section.
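The string/number detection that the function relies on can be tried in isolation. The snippet below is a minimal sketch with made-up sample lines: str2double returns NaN for anything it cannot parse, so the NaN positions mark the string fields, and a line made up only of NaNs is treated as a header.

```octave
delim = ';';
data_line   = 'a;10;100';   % sample data line (hypothetical)
header_line = 'A;B;C';      % sample header line (hypothetical)

% Split the data line and try to convert every field to a number.
elems   = strsplit(data_line, delim);
nums    = str2double(elems);
is_text = isnan(nums);      % true where the field is not numeric

% A line whose fields are all non-numeric is treated as a header.
is_header = all(isnan(str2double(strsplit(header_line, delim))));
```

One consequence of this approach is that a field containing the literal text NaN is indistinguishable from a string and would be stored as text.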

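By default the 'columns' are returned as cell arrays, although the numbers in them are actual converted numbers, not strings. If you know that a cell array contains only numbers, you can easily convert it to a column vector with cell2mat(A)'.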
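As a small illustration of that conversion, assume B holds a numeric column in the shape the function builds it, a 1-by-N cell row of scalars (the values here are made up):

```octave
% A numeric "column" as a 1x3 cell row of scalars (hypothetical values).
B = {10, 20, 30};

% cell2mat yields a 1x3 numeric row; the transpose turns it into a column.
b = cell2mat(B)';
```

This only applies to all-numeric columns; a column such as A, which holds strings, is better left as a cell array.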

How to read a delimited file with strings/numbers using Octave?
