Question

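I am trying to read a plain-text, tab-delimited file. I need to read both strings and numeric values. The problem is that the table does not start until the third line; the first two lines contain version information and information about the size of the data. When I try the usual methods such as load and importdata, I end up with an error message saying that the number of columns in line 2 does not match the number of columns in line 1.

I have written some code to read the file line by line, which I will attach. I need to figure out how to build a struct with 4 fields.

Here is my code: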

sed -i '/(hello*)/ s/$/ 0/' hello.txt
perl -ipe 's/$/ 0/ if /hello/' hello.txt
sed -i '/^hello*/ s/$/ 0/' hello.txt

Any help is greatly appreciated. Thanks in advance!

Answer 1

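You can use textscan to parse this file format. Using the file's own header, we can read the expected number of rows and columns. We can then read the column headers and put them into a cell array. Next we build a custom format specifier for the remaining rows and read in the rest of the file. Once that is done, we combine the headers with the data to build a struct whose fields match the column headers.

This solution is flexible because it parses the file header itself to determine the number of columns rather than hard-coding a specific value.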

fid = fopen('filename.txt', 'r');

% Skip the first line and determine the number of rows and number of samples
dims = textscan(fid, '%d', 2, 'HeaderLines', 1);
ncols = dims{1}(2);

% Now read the variable names
varnames = textscan(fid, '%s', 2 + ncols);
varnames = varnames{1};

% Now create the format spec for your data (2 strings and the rest floats)
spec = ['%s%s', repmat('%f', [1 ncols])];

% Read in all of the data using this custom format specifier. The delimiter will be a tab
data = textscan(fid, spec, 'Delimiter', '\t');

% Close the file once everything has been read
fclose(fid);

% Place the data into a struct where the variable names are the fieldnames
inputs = cat(1, varnames(:)', data);
S = struct(inputs{:});

%   7x1 struct array with fields:
%
%   Name
%   Desc
%   A2
%   B2
%   C2
%   D2
%   E2
%   F2
%   G2
%   H2
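
As an alternative layout, the same parsing steps can produce a single (scalar) struct in which each field holds one whole column. The following is a minimal sketch under stated assumptions: the sample file contents, the temporary file name, and the use of cell2struct are illustrative and not part of the original answer, and the column headers are assumed to be valid MATLAB field names without spaces.

```matlab
% Write a small sample file in the assumed layout (hypothetical data):
% line 1 = version, line 2 = rows and samples, line 3 = headers, then data.
fname = [tempname, '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '#1.2\n');
fprintf(fid, '3\t2\n');
fprintf(fid, 'Name\tDesc\tA2\tB2\n');
fprintf(fid, 'g1\td1\t1.5\t2.5\n');
fprintf(fid, 'g2\td2\t3.5\t4.5\n');
fprintf(fid, 'g3\td3\t5.5\t6.5\n');
fclose(fid);

% Parse it with the same steps as above
fid = fopen(fname, 'r');
dims = textscan(fid, '%f', 2, 'HeaderLines', 1);   % skip the version line
ncols = dims{1}(2);                                % number of numeric columns
varnames = textscan(fid, '%s', 2 + ncols);         % header names
varnames = varnames{1};
spec = ['%s%s', repmat('%f', 1, ncols)];           % 2 strings, then floats
data = textscan(fid, spec, 'Delimiter', '\t');
fclose(fid);
delete(fname);

% Build a scalar struct: one field per column, each holding the whole column
S = cell2struct(data(:), varnames(:), 1);
```

With this layout, S.Name and S.Desc are cell arrays of strings and each numeric field is a column vector, which can be more convenient than a struct array when you want to operate on entire columns.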

Answer 2

Use the following approach:

fileID = fopen('gene_expr_500x204.gct','r+');
C = textscan(fileID,'%s%s%s%s') %assuming you have 4 columns

You can separate the data using:

numericaldata = str2double(C{1}(3:end))
stringdata = C{1}(1:2)

If the number of columns is unknown, use:

delimiter = sprintf('\t');  % an actual tab character; '\t' alone is the two characters \ and t
fid = fopen('testtext3.txt','rt');
tLines = fgets(fid);
numCols = numel(strfind(tLines,delimiter)) + 1;
formatSpec = repmat('%s',1,numCols)

Or, if you already know the number of columns:

KnownColumns = 206;
formatSpec = repmat(['%s'],1,KnownColumns)
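
To show how such an all-strings format specifier might be used end to end, here is a minimal sketch. It assumes a small hypothetical sample file with two header lines before the table, skips them with the 'HeaderLines' option of textscan, and converts the numeric columns with str2double; the file contents and variable names are illustrative only.

```matlab
% Write a small sample file (hypothetical data): two info lines, then the table
fname = [tempname, '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '#1.2\n2\t2\n');
fprintf(fid, 'Name\tDesc\tA2\tB2\n');
fprintf(fid, 'g1\td1\t1.5\t2.5\n');
fprintf(fid, 'g2\td2\t3.5\t4.5\n');
fclose(fid);

% Read every column as strings, skipping the two info lines
numCols = 4;                                  % assumed known here
formatSpec = repmat('%s', 1, numCols);
fid = fopen(fname, 'rt');
C = textscan(fid, formatSpec, 'Delimiter', '\t', 'HeaderLines', 2);
fclose(fid);
delete(fname);

% The first entry of each column is its header; the rest are data
headers = cellfun(@(col) col{1}, C, 'UniformOutput', false);
names   = C{1}(2:end);                        % string column
raw     = [C{3:end}];                         % cell matrix of the numeric columns
numericdata = str2double(raw(2:end, :));      % convert strings to doubles
```

Reading everything as strings keeps the format specifier simple, at the cost of an explicit str2double conversion for the numeric columns afterwards.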

Update: regarding your second question, you can in fact store any data type in a struct field. I have shown how below.

a = {[1 2 3], 'CO'}

a = 

    [1x3 double]    'CO'

b = table([1 2 3].','VariableNames',{'Heading'})

b = 

    Heading
    _______

    1      
    2      
    3      

 c = [1 2 3;4 5 6]

c =

     1     2     3
     4     5     6

Struc(1).DataTypes = a

Struc(2).DataTypes = b

Struc(3).DataTypes = c

struct2table(Struc)

ans = 

     DataTypes  
    ____________

    {1x2 cell  }
    [3x1 table ]
    [2x3 double]

I am trying to read a tab-delimited text file and store some parts of the data in different fields of a MATLAB struct