Question

I have an oddly formatted CSV file of scientific data that "stacks" values instead of appending a column. For example:

Parameter : Collision Energy (CE),
Mass =  174,
Max XY = 43,9242

Raw Data : 

0,260
1,268
2,291
3,327
4,366
5,405
Mass =  195,
Max XY = 38,11302

Raw Data : 

0,478
1,498
2,560
3,620
4,707
5,777
Mass =  236,
Max XY = 32,1447

Raw Data : 

0,96
1,100
2,108
3,115
4,122
5,129

Instead, I would like to write a function that reorganizes the data like this:

    Mass =  174      Mass =  195       Mass =  236
    Max XY = 43,9242 Max XY = 38,11302 Max XY = 32,1447
0   260              478               96
1   268              498               100
2   291              560               108
3   327              620               115
4   366              707               122
5   405              777               129

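I haven't gotten very far, but so far I have read the file in as a table and converted the table to a cell array. I wanted to use logical indexing on the cell array to search for the string 'Mass' and store the data between two indices in a new column, but it isn't working.

My other idea was to read the file line by line and look for the string 'Mass'. If found -> store the corresponding column 2. Repeat until the end of the file.

How can I do this elegantly, in a way that is easy for me to read (just looking for ideas, not expecting complete code)?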

Answer 1

You can use regular expressions to read the various data types. First use the fileread function to read the CSV file into a string:

filetext = fileread('data.csv');

Then we can look for the individual fields. For example, the numbers for Mass can be extracted as:

% read the matching tokens into a cell array
c_mass = regexp(filetext, 'Mass =[\ ]*([\d]+)', 'tokens'); 

% convert the cell array to characters and then interpret as numbers
v_mass = str2num(char([c_mass{:}]));
% [174; 195; 236]  (column vector, one row per match)
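
To make the shape of the 'tokens' output concrete, here is a small sketch on an inline string rather than the file. The names txt, tok, flat and vals are illustrative only, and it uses str2double in place of str2num, which is a swap on my part and not what the answer above does:

```matlab
% Two header lines standing in for the contents of data.csv
txt = sprintf('Mass =  174,\nMass =  195,\n');

% 'tokens' returns one cell per match; each of those holds a cell
% with the captured groups of that match
tok = regexp(txt, 'Mass =\s*(\d+)', 'tokens');

% Concatenating the inner cells gives a flat 1-by-2 cell of char vectors
flat = [tok{:}];

% str2double converts a cell array of char vectors element by element
vals = str2double(flat);
```

Because str2double keeps the shape of the cell array, vals comes out as a row vector here, whereas the char/str2num route produces a column.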

Similarly, for Max XY:

c_max = regexp(filetext, 'Max XY =[\ ]*([\d]+,[\d]+)', 'tokens'); 
v_max = str2num(char([c_max{:}]));
% [43        9242;          38       11302;          32        1447]

And finally the raw data:

c_raw = regexp(filetext, '\n([\d]+,[\d]+)', 'tokens'); 
v_raw = str2num(char([c_raw{:}]));

Note that this reads all of the raw data into two columns. But you can easily use reshape to separate the individual blocks.
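
A minimal sketch of that reshape step, assuming every block contains the same number of raw-data rows and that the first column (0, 1, 2, ...) is identical across blocks. It uses a shortened inline sample instead of data.csv, and the names sample, n_blocks, n_rows, x and y_cols are made up for illustration. It also captures the two numbers as separate groups and converts with str2double, which differs slightly from the code above:

```matlab
% Inline stand-in for fileread('data.csv'): two blocks of three rows each
sample = sprintf(['Mass =  174,\nMax XY = 43,9242\n\nRaw Data : \n\n' ...
                  '0,260\n1,268\n2,291\n' ...
                  'Mass =  195,\nMax XY = 38,11302\n\nRaw Data : \n\n' ...
                  '0,478\n1,498\n2,560\n']);

% One Mass value per block
c_mass = regexp(sample, 'Mass =\s*(\d+)', 'tokens');
v_mass = str2double([c_mass{:}]);

% Raw data lines start right after a newline; capture x and y separately
c_raw = regexp(sample, '\n(\d+),(\d+)', 'tokens');
v_raw = str2double(vertcat(c_raw{:}));      % N-by-2 matrix: [x, y]

n_blocks = numel(v_mass);
n_rows   = size(v_raw, 1) / n_blocks;       % rows per block (assumed equal)

x      = v_raw(1:n_rows, 1);                % shared first column
% reshape fills column by column, so each block becomes one column
y_cols = reshape(v_raw(:, 2), n_rows, n_blocks);
```

If the blocks can differ in length, this reshape will fail or misalign the columns, and splitting the text at each 'Mass' line first would be the safer route.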
