Question

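I am currently writing MATLAB code to plot measurement data. Unfortunately, the serial communication has a hardware problem, and sometimes I receive nothing but garbage. My code only works on well-defined data, so the garbage has to be removed. I would like something like this pseudocode: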

for eachLine
   if currentLineContainsNonASCII
      delete completeLine
   end if
end for

The data is read as follows:

rawdataInputFilename = 'measurementData.txt';  
fileID = fopen(rawdataInputFilename);

% load data as string 
DataCell = textscan(fileID,'%s %s %s %s %s %s %s %s %s %s %s %s %s %s %s','HeaderLines', 1);

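I was thinking of first creating a new 'clean' file that contains only ASCII characters, and then reading that file with my actual plotting code.

My problem is how to identify non-ASCII characters and then delete the whole line, rather than just overwriting the individual characters.

Some example data: lines 1 and 3 are 'clean' and can be processed with the current code. Line 2 contains non-ASCII characters and therefore makes my code terminate. The whitespace characters are Windows line breaks, tabs and spaces.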

61 380 Module03 Slot02 27.01.2015 13:47:13  450 3587 1175 84    101.83 22.30 5.20 1  1
62 386 Module03 Slot03 27.01.2015 13:47:18  450ÆăǳШШ    106.83 22.30 25.20 1    1 
63 391 Module03 Slot04 27.01.2015 13:47:24  ERROR dgsf 5643332  103.26 22.40 25.20 1 1

Answer 1

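You can check whether each received character is in the printable ASCII range [32, 126], and skip it otherwise.

The following function tells you whether a given string contains any non-printable characters: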

function R = has_non_printable_characters(str)
    % Keep only the printable ASCII characters (codes 32..126)
    str2 = str(31 < str & str < 127);
    % If anything was dropped, the input contained non-printable characters
    R = (length(str) > length(str2));
end

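If, instead of discarding the whole string, you only want to remove the non-printable characters and keep the printable ones, modify the function to return str2 (and rename the function to match its new behavior).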
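To connect this check to the question (dropping whole lines), the same range test can be applied to every line of a cell array. The following is only a minimal sketch: the sample lines and the names rawLines, isCleanLine, keep and cleanLines are made up for illustration, and allowing the tab character (code 9) is an assumption based on the question's remark that the data contains tabs.

```matlab
% Hypothetical sample lines; the second one contains garbage characters.
rawLines = {'61 380 Module03 Slot02', ...
            ['62 386 Module03 450' char([198 259 1064])], ...
            sprintf('63 391\tModule03 Slot04')};

% A line is accepted if every character is printable ASCII (32..126)
% or a tab (9). Allowing the tab is an assumption, not part of the answer.
isCleanLine = @(str) all((str >= 32 & str <= 126) | str == 9);

keep = cellfun(isCleanLine, rawLines);   % logical mask, one entry per line
cleanLines = rawLines(keep);             % garbled lines are dropped as a whole
```

Without the `| str == 9` part, the third sample line would be rejected as well, because a tab is outside the printable range.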

Answer 2

There are several ways to do this.

Save this to a text file named data.txt:

bla Header bla
61 380 Module03 Slot02 27.01.2015 13:47:13  450 3587 1175 84    101.83 22.30 5.20 1  1
62 386 Module03 Slot03 27.01.2015 13:47:18  450ÆăǳШШ    106.83 22.30 25.20 1    1 
63 391 Module03 Slot04 27.01.2015 13:47:24  ERROR dgsf 5643332  103.26 22.40 25.20 1 1

Method 1 (using textscan and cellfun):

Remove lines that contain non-ASCII characters completely:

fileID = fopen('data.txt');    % open file
DataCell = textscan(fileID,'%s','delimiter','','HeaderLines', 1);    % read a complete line of text, ignore the first line
fclose(fileID);    % close file
DataCell = DataCell{1};    % there is only one string per line
DataCell(cellfun(@(x) any(x>127),DataCell)) = [];    % remove line if there is any non-ASCII in it, adjust the test to your liking, e.g. (x>126 | x<32)

celldisp(DataCell)

DataCell{1} =

61 380 Module03 Slot02 27.01.2015 13:47:13 450 3587 1175 84 101.83 22.30 5.20 1 1

DataCell{2} =

63 391 Module03 Slot04 27.01.2015 13:47:24 ERROR dgsf 5643332 103.26 22.40 25.20 1 1

You can now loop over the cell array or, if you prefer, start over with the cleaned text (e.g. as input to textscan). To do that, join the cells into one block of text:

strjoin(DataCell','\n')

ans =

61 380 Module03 Slot02 27.01.2015 13:47:13 450 3587 1175 84 101.83 22.30 5.20 1 1
63 391 Module03 Slot04 27.01.2015 13:47:24 ERROR dgsf 5643332 103.26 22.40 25.20 1 1
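As a rough illustration of feeding the joined text back into textscan, here is a small sketch. It uses shortened, made-up lines with three fields each and a hypothetical format '%f %f %s'; the real data has more columns, so the format string would have to be adapted. The names cleanText and Columns are also just illustrative.

```matlab
% Shortened, hypothetical stand-in for the cleaned cell array from above
DataCell = {'61 380 Module03'; '63 391 Module04'};

% Join the lines with newlines into a single character vector
cleanText = strjoin(DataCell', '\n');

% textscan also accepts text instead of a file identifier
Columns = textscan(cleanText, '%f %f %s');

firstColumn = Columns{1};   % numeric column vector
thirdColumn = Columns{3};   % cell array of strings
```

This avoids writing an intermediate 'clean' file when the data is small enough to be held in memory.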

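Method 2 (using regexprep):

I load the whole text file and replace every line that contains a character outside a given character set with the empty string ''.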

s = fileread('data.txt');
snew = regexprep(s, '.*[^\w\s.:].*\n', '', 'dotexceptnewline')

snew =

61 380 Module03 Slot02 27.01.2015 13:47:13 450 3587 1175 84 101.83 22.30 5.20 1 1

63 391 Module03 Slot04 27.01.2015 13:47:24 ERROR dgsf 5643332 103.26 22.40 25.20 1 1

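The [^\w\s.:] part basically translates to:
match any character that is not (^ means not):

a letter, digit or underscore (\w)
whitespace (\s)
a dot . or
a colon :

If you want to allow any other ASCII characters, so that they do not cause a line to be removed, just add them inside the brackets.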
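Instead of listing the allowed characters one by one, the set can also be written as an ASCII range. The sketch below is a variant of Method 2, not the code from the answer: it treats every printable ASCII character (space to tilde) plus whitespace as allowed. The input string is built in memory from made-up lines so the snippet runs without data.txt.

```matlab
% Three hypothetical lines, each terminated by a newline;
% the middle one contains garbage characters.
s = sprintf('%s\n', '61 380 Module03 13:47:13 101.83', ...
                    ['62 386 450' char([198 259 1064]) ' 106.83'], ...
                    '63 391 ERROR dgsf 103.26');

% [^ -~\s] matches any character that is neither in the range
% space..tilde (codes 32..126) nor whitespace. A line containing
% such a character is removed together with its newline.
snew = regexprep(s, '.*[^ -~\s].*\n', '', 'dotexceptnewline');

remainingLines = numel(regexp(snew, '\n'));   % number of lines left
```

As with the original pattern, a garbled last line is only removed if it ends with a newline.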

Answer 3

Here is the code that creates a new txt file without the lines that contain non-ASCII characters:

%% read in via GUI
[inputFilename, inputPathname] = uigetfile('*.txt', ...
    'Pick a .txt file from which you want to remove lines with non ASCII characters.');
if isequal(inputFilename, 0)
    disp('User selected ''Cancel''')
    return % nothing to process
end
disp(['User selected ', fullfile(inputPathname, inputFilename)])
inputFileID = fopen(fullfile(inputPathname, inputFilename)); % open input file

% split the file name into base name and extension (the extension includes the dot)
[~, inputFilenameWOextension, fileExtension] = fileparts(inputFilename);

outputFileID = fopen([inputFilenameWOextension, '_ASCIIonly', fileExtension], 'w'); % overwrite existing file

% read line by line; fgetl returns -1 (not a char array) at the end of the file
tline = fgetl(inputFileID);
while ischar(tline)
    % keep only characters below 127 (not strictly printable ASCII: tabs are kept too)
    tempStr = tline(tline < 127);
    %tempStr = tline(31<tline & tline<127); % printable ASCII only
    if length(tempStr) == length(tline)
        % nothing was removed, so the line is clean: copy it to the output
        fprintf(outputFileID, '%s\r\n', tempStr);
    end
    tline = fgetl(inputFileID); % read the next line
end
fclose(inputFileID);
fclose(outputFileID);

Matlab: delete the complete line in a txt file if it contains non-ASCII characters
