Question

I need to read text that mixes numeric values and characters. Here is an example:

x = fgetl(fid);

out = sscanf(x,'%% Loc  : LAT  = %f LON = %f DEP = %f\n');

I only need to read the numeric fields.

Usually I have used this:

out = sscanf(x,'%[a-zA-Z.=\t\b]s %f %[a-zA-Z.=\t\b]s %f %[a-zA-Z.=\t\b]s %f\n');

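It works, but the problem is that not all files have a fixed format, and the letters are sometimes written in upper case and sometimes in lower case. In those cases my approach does not work.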

I tried to skip all the characters using:

Product in Inches \(L x W x H\):\s*(\d+.?\d*) x (\d+.?\d*) x (\d+.?\d*)

But it does not work!

Note that the lines of the file are not all alike, and the number of numeric fields differs from line to line.

Answer 1

I found a possible solution; it is certainly not "elegant", but it seems to work.

It is based on the following procedure:

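read the file line by line using fgets
split each line into tokens using strtok
try to convert each token to a number using str2num
if the token actually is a "number" (i.e. if str2num does not return an empty array), insert it into the output matrix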

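The output matrix is initialized (to NaN) at the beginning of the script, and must be large enough to have:

a number of rows greater than or equal to the number of lines in the input file (if it is not known in advance, a "reasonable" value should be chosen)
a number of columns greater than or equal to the maximum number of numeric values that can appear in a single line of the input file (if it is not known in advance, a "reasonable" value should be chosen).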

Once you have read the whole input file, you can "clean" the output matrix by removing the surplus rows and columns that contain only NaN.
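The cleaning step can also be done without tracking row and column counters. The following is only a sketch on a hypothetical, partly filled matrix (the values are made up for illustration); it removes every row and column that contains nothing but NaN.

```matlab
% Hypothetical pre-allocated matrix: 4 rows x 5 columns, partly filled
m = nan(4, 5);
m(1, 1:3) = [-19.6423 -70.817 21.5451];
m(2, 1:4) = [-1.234 -70 333.369 456.5451];
m(3, 1)   = 1.23;
% Remove the rows that contain only NaN
m(all(isnan(m), 2), :) = [];
% Remove the columns that contain only NaN
m(:, all(isnan(m), 1)) = [];
```

Unlike a counter-based cleanup, this also drops the row of any input line that held no numbers at all, so the rows of m may no longer line up with the lines of the file.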

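Below you can find the script, the input file I used and the output matrix (looking at it should make it clearer why the matrix is initialized to NaN, I hope).

Note that the identification of the numbers and their extraction (using strtok) depend on the format of the example lines: in particular, they rely on the fact that the tokens are separated by whitespace.

This means that the code cannot identify =123.456 as a number.

If your input file contains tokens such as =123.456, the code has to be modified.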

% Initialize rows counter
r_cnt=0;
% Initialize column counter
c_cnt=0;
% Define the number of rows of the input file (if it is not known in
% advance, put a "reasonable" value) - Used to initialize the output matrix
file_rows=5;
% Define the number of numeric values to be extracted from the input file
% (if it is not known in advance, put a "reasonable" value) - Used to
% initialize the output matrix
max_col=5;
% Initialize the variable holding the maximum number of column. Used to
% "clean" the output matrix
max_n_col=-1;
% Initialize the output matrix
m=nan(file_rows,max_col);
% Open the input file
fp=fopen('char_and_num.txt','rt');
% Get the first row
tline = fgets(fp);
% Loop to read line by line the input file
while ischar(tline)
   % Increment the row counter
   r_cnt=r_cnt+1;
   % Parse the line looking for numeric values
   while(true)
      [str, tline] = strtok(tline);
      if(isempty(str))
         break
      end
      % Try to convert the string into a number
      tmp_val=str2num(str);
      if(~isempty(tmp_val))
         % If the token is a number, increment the column counter and
         % insert the number in the output matrix
         c_cnt=c_cnt+1;
         m(r_cnt,c_cnt)=tmp_val;
      end
   end
   % Track the largest number of non-NaN columns found in any row of the
   % output matrix so far
   max_n_col=max(max_n_col,c_cnt);
   % Reset the column counter before the next iteration
   c_cnt=0;
   % Read next line of the input file
   tline = fgets(fp);
end
% After having read the whole input file, close it
fclose(fp);
% Clean the output matrix removing the exceeding full NaN rows and columns
m(r_cnt+1:end,:)=[];
m(:,max_n_col+1:end)=[];
m

Input file

% Loc  : LAT  = -19.6423        LON = -70.817                      DEP = 21.5451196625
% Loc  : xxx  = -1.234          yyy = -70.000   WIDTH = 333.369    DEP = 456.5451196625
% Loc  : zzz  = 1.23

Output

m =

  -19.6423  -70.8170   21.5451       NaN
   -1.2340  -70.0000  333.3690  456.5451
    1.2300       NaN       NaN       NaN
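As noted above, a token such as =123.456 is not recognized by the script. One possible modification, shown here only as a sketch, is to pass strtok an explicit delimiter set that includes '='. The example line and the delimiter set are assumptions about the file format, and str2double is swapped in for str2num because it returns NaN for non-numeric text instead of evaluating it.

```matlab
% Hypothetical example line in which '=' is glued to the numbers
tline = '% Loc : LAT =-19.6423 LON =-70.817 DEP =21.5';
% Treat space, tab and '=' as delimiters (an assumption about the format)
delims = [' =' char(9)];
% Numeric values found in this line
vals = [];
while true
    [str, tline] = strtok(tline, delims);
    if isempty(str)
        break
    end
    % str2double returns NaN when the token is not a number
    tmp_val = str2double(str);
    if ~isnan(tmp_val)
        vals(end+1) = tmp_val;
    end
end
```

Be aware that str2double also accepts text such as Inf or i, so a label consisting of exactly that text would be stored as a value.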

Hope this helps.

Answer 2

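I am still a bit unclear about your file format, but it seems you could do this more easily with textscan than with the lower-level functions.

Something like this should work: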

 while (~feof(fid))
      textscan(fid, '%s :'); % Read the part of the line through the colon
      data = textscan(fid, '%s = %f');
      % Do something with the data here
 end

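The variable fid is the file identifier you get from calling fopen, and you need to call fclose when you are done.

I don't think this will completely solve your problem, but hopefully it will put you on a shorter, cleaner track. For example, you will have to experiment with it to make sure that you actually reach the end of the file, and that no corner cases trip up the pattern matching.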
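In the same spirit of avoiding the lower-level scanning functions, another option is to pull the numbers out of each line with regexp. This is not the textscan approach above but a swapped-in alternative, given as a sketch only; the pattern num_pat is an assumption about what counts as a number in the file.

```matlab
% Example line taken from the question
x = '% Loc  : LAT  = -19.6423        LON = -70.817                      DEP = 21.5451196625';
% Optional sign, digits, optional fractional part, optional exponent
num_pat = '[-+]?\d+\.?\d*(?:[eE][-+]?\d+)?';
% Cell array holding every substring that matches the pattern
tokens = regexp(x, num_pat, 'match');
% Convert the matched substrings to a numeric row vector
out = str2double(tokens);
```

Because the pattern ignores everything that is not part of a number, it does not depend on the labels or on their case, and it works for lines with different numbers of fields. It would, however, also pick up digits that are part of a label, such as the 2 in a label like LAT2.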

Answer 3

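*scanf() takes a format string such as "%d", not a multi-character constant such as '%d'. Details: " vs '.

"%[]" does not take a trailing 's', as the OP used in '%[a-zA-Z.=\t\b]s'.

"%n" records, as an int, the number of characters scanned so far.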

Skip all types of characters in MATLAB when using fscanf or sscanf
