Question

I am working on a task in which I have to read a tab-delimited text file, and my output must be a MATLAB struct.

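The contents of the file look like this (a bit messy, but you get the picture). The actual file contains 500 genes (the rows starting at Analyte 1) and 204 samples (the columns starting at A2).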

#1.2                                    
500 204                             
Name        Desc        A2  B2  C2  D2  E2  F2  G2  H2
Analyte 1   Analyte 1   978 903 1060    786 736 649 657 733.5
Analyte 2   Analyte 2   995 921 995.5   840 864.5   757 739 852
Analyte 3   Analyte 3   1445.5  1556.5  1579    1147.5  1249    1069.5  1048    1235
Analyte 4   Analyte 4   1550    1371    1449    1127    1196    1337    1167    1359
Analyte 5   Analyte 5   2074    1776    1960    1653    1544    1464    1338    1706
Analyte 6   Analyte 6   2667    2416.5  2601    2257    2258    2144    2173.5  2348
Analyte 7   Analyte 7   3381.5  3013.5  3353    3099.5  2763    2692    2774    2995

My code is as follows:

fid = fopen('gene_expr_500x204.gct', 'r');%Open the given file

% Skip the first line and determine the number of rows and number of samples
dims = textscan(fid, '%d', 2, 'HeaderLines', 1);
ncols = dims{1}(2);

% Now read the variable names
varnames = textscan(fid, '%s', 2 + ncols);
varnames = varnames{1};

% Now create the format spec for your data (2 strings and the rest floats)
spec = ['%s%s', repmat('%f', [1 ncols])];

% Read in all of the data using this custom format specifier. The delimiter will be a tab
data = textscan(fid, spec, 'Delimiter', '\t');

% Place the data into a struct where the variable names are the fieldnames
ge = data{3:ncols+2}
S = struct('gn', data{1}, 'gd', data{2}, 'sid', {varnames});

The part about ge is my current attempt, but it doesn't really work. Any help would be greatly appreciated. Thanks in advance!

Answer 1

A struct field can contain any data type, including a matrix or a multidimensional array.
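As a minimal sketch of that point, the snippet below stores a numeric matrix and a cell array of names in two fields of one scalar struct. The values are a small hypothetical subset (3 genes by 2 samples) chosen for illustration, not the real 500-by-204 file.

```matlab
% Hypothetical toy data: 3 genes (rows) by 2 samples (columns)
S = struct();
S.ge = [978 903; 995 921; 1445.5 1556.5];        % numeric matrix in one field
S.gn = {'Analyte 1'; 'Analyte 2'; 'Analyte 3'};  % cell array of names in another

% The matrix is indexed through the field like any other matrix
nGenes = size(S.ge, 1);   % number of rows
value  = S.ge(2, 1);      % row 2, column 1
```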

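Your problem is that data{3:ncols+2} creates a comma-separated list. Because the left-hand side of the assignment has only one output, ge receives just one element of that list (the first column) instead of all of them. You need to use cat to concatenate all of the columns into one large matrix.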

ge = cat(2, data{3:end});

% Or you can do this implicitly with []
% ge = [data{3:end}];
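To see the concatenation on something small, here is a sketch that uses a hypothetical three-element cell array, c, as a stand-in for data(3:end). Each cell holds one column vector, just as textscan returns one cell per %f column.

```matlab
% Hypothetical stand-in for data(3:end): one column vector per cell
c = {[1; 2], [3; 4], [5; 6]};

% c{:} expands to the list c{1}, c{2}, c{3}, so cat receives three inputs
m1 = cat(2, c{:});   % explicit horizontal concatenation
m2 = [c{:}];         % the same thing with square brackets

same = isequal(m1, m2);
```

Both forms give a 2-by-3 matrix whose columns are the original cells in order.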

You can then pass this value to the struct constructor:

S = struct('gn', data(1), 'gd', data(2), 'sid', {varnames}, 'ge', ge);
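Note the parentheses in data(1) and data(2), and the braces around varnames. When struct is given a cell array as a field value, it builds a struct array with one element per cell. Wrapping the cell array in another cell keeps it together as a single field value. A small sketch with a hypothetical names variable:

```matlab
% Hypothetical gene names, shaped like data{1} (N-by-1 cell array)
names = {'Analyte 1'; 'Analyte 2'; 'Analyte 3'};

A = struct('gn', names);     % cell array argument: 3-by-1 struct array
B = struct('gn', {names});   % wrapped in a cell: 1-by-1 struct

sizeA = size(A);   % one struct per name, each A(k).gn is a char vector
sizeB = size(B);   % a single struct whose gn field holds the whole cell array
```

Indexing with parentheses, as in data(1), returns a 1-by-1 cell that contains data{1}, so it has the same effect as the extra braces here.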

Is it possible for a struct field to contain a matrix?
