Search a file for a string and put the matches into an array

Date: 2013-07-09 21:45:19

Tags: matlab

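The goal of this program is to look through a file for a string and output every instance of that string, together with the line it appears on.

I have it searching the file and finding the matches; I just can't get them into an array or anything else that lets me store all of them. Right now it gives me only the last instance, and I could easily put a break between lines 8 and 9 to find the first instance.

If anyone knows how to store every line that contains the string in question, it would be a great help.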

fid = fopen('....... file directory....');
prompt = 'What string are you searching for?  ';
str = input(prompt,'s');

i = 0; counter = 0;                 %counter must be initialized before the loop
for j = 1:10000
    tline = fgetl(fid);             %Returns next line of specified file
    counter = counter + 1;          %Counts the next line
    if ischar(tline)                %Checks that a line was read (fgetl returns -1 at end of file)
        U = strfind(tline,str);     %U holds the start indices of str in tline, or [] if it is not found
        if ~isempty(U)
            what = tline;           %This is where I want to set the array positions equal to whatever tline is at that time, then move onto the next i and search for the next tline.
            i = i+1;
        end
    end
end

2 Answers:

Answer 0: (score: 1)

I suggest the following:

haystack = 'test.txt';

prompt = 'What string are you searching for?  ';
needle = input(prompt, 's');

% MS Windows
if ispc

    output = regexp(evalc(['!find /N "' needle '" ' haystack]), char(10), 'split');

    output = regexp(output, '^\[(\d+)\]', 'tokens');   % digits only, so a "]" later in the line cannot be captured
    output = cellfun(@(x)str2double(x{:}), output(~cellfun('isempty', output)))';

% OSX/Linux   
elseif isunix

    output = regexp(evalc(['!grep -n "' needle '" ' haystack]), char(10), 'split');

    output = regexp(output, '^(\d+):', 'tokens');      % digits only, so a ":" later in the line cannot be captured
    output = cellfun(@(x)str2double(x{:}), output(~cellfun('isempty', output)))';

% Anything else: stay in MATLAB
else

    fid = fopen(haystack);
    output = textscan(fid, '%s', 'delimiter', '\n');
    fclose(fid);

    output = find(~cellfun('isempty', regexp(output{1}, needle)));

end

Contents of test.txt:

garbage garbage garbage 
garbage garbage garbage 
garbage garbage garbage 
garbage garbage garbage 
garbage garbage garbage 
garbage valuable garbage 
garbage garbage garbage 
garbage garbage valuable 
garbage garbage garbage 
garbage garbage garbage 
garbage garbage garbage 
garbage garbage garbage 
garbage valuable garbage 
garbage garbage garbage 
garbage garbage garbage 
garbage garbage garbage 

When I run the code on Windows or Linux, or force the pure MATLAB branch, with needle = 'valuable', I get the correct line numbers:

output = 
    6   
    8  
   13
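
One difference between the branches is worth keeping in mind: the pure MATLAB branch passes needle to regexp, so characters such as "." or "*" are interpreted as pattern syntax rather than literally. The sketch below illustrates this and shows two ways to force a literal match. The sample lines and the variable names textLines, asPattern, asLiteral and viaStrfind are made up for illustration and are not part of the answer.

```matlab
% Hypothetical sample data standing in for the lines read from a file
textLines = {'garbage a.b garbage'; 'garbage axb garbage'; 'garbage garbage'};
needle    = 'a.b';

% regexp treats '.' as "any character", so lines 1 and 2 both match
asPattern = find(~cellfun('isempty', regexp(textLines, needle)));

% Escaping the needle first makes regexp match it literally (line 1 only)
escaped   = regexptranslate('escape', needle);
asLiteral = find(~cellfun('isempty', regexp(textLines, escaped)));

% strfind on a cell array of strings is a literal search by construction
viaStrfind = find(~cellfun('isempty', strfind(textLines, needle)));
```

If the search string comes from user input and is meant as plain text, one of the literal variants is probably the safer choice.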

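The advantage of using OS-specific tools is that their memory footprint is far smaller than that of the pure MATLAB version (they do not load the whole file into memory). Even if you extend the MATLAB code to avoid this (for example with a loop around fgetl), the OS-specific tools will still be faster (and more memory-friendly); that is why I keep the MATLAB version as a last resort :)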
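
For reference, a loop around fgetl of the kind mentioned above could look roughly like the following. This is only a sketch, not the answerer's code: it writes a small throwaway sample file so that it runs on its own, and the names matchNo, matchText and lineNo are assumptions introduced here. It keeps one line in memory at a time and collects both the line numbers and the text of the matching lines.

```matlab
% Write a small hypothetical sample file so the sketch is self-contained
haystack = [tempname '.txt'];
fid = fopen(haystack, 'w');
fprintf(fid, '%s\n', 'garbage garbage', 'garbage valuable', 'garbage', 'valuable garbage');
fclose(fid);

needle = 'valuable';      % search string, hard-coded here for illustration

fid = fopen(haystack, 'r');
if fid == -1
    error('Could not open %s', haystack);
end

matchNo   = [];           % line numbers of the matching lines
matchText = {};           % text of the matching lines
lineNo    = 0;

tline = fgetl(fid);
while ischar(tline)                      % fgetl returns -1 at end of file
    lineNo = lineNo + 1;
    if ~isempty(strfind(tline, needle))  % literal substring search
        matchNo(end+1, 1)   = lineNo;    %#ok<AGROW>
        matchText{end+1, 1} = tline;     %#ok<AGROW>
    end
    tline = fgetl(fid);
end
fclose(fid);
delete(haystack);         % remove the throwaway sample file
```

Growing the arrays inside the loop is acceptable when matches are rare; for files with very many matches, preallocating or collecting in chunks would likely be faster.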

Answer 1: (score: 0)

You can store them in a struct array, for example:

lines = struct([]); % to store lines and line numbers    

idx = 1;

fid = fopen('somefile.txt');

tline = fgets(fid);

while ischar(tline)

    U = strfind(tline, str);   % str is the search string from the question

    if numel(U) > 0                            

        lines(end + 1).line = tline; % save line
        lines(end).lineNo = idx;     % save its number 
        lines(end).U = U;            % where is str in the just saved line            

    end

    tline = fgets(fid);

    idx = idx + 1;

end

fclose(fid);

lineTxts    = {lines(:).line};   % get lines in a cell
lineNumbers = [lines(:).lineNo]; % get line numbers as matrix
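
As a rough illustration of how the collected results might be used afterwards, the sketch below prints one summary row per match. The struct array is built by hand with made-up values that mimic the fields used above, so that the snippet runs without a file; the variable report is an assumption introduced here. Because fgets keeps the trailing newline in each stored line, deblank is applied before printing.

```matlab
% Hypothetical stand-in for the struct array built by the loop above
lines = struct('line',   {sprintf('garbage valuable garbage\n'), ...
                          sprintf('garbage garbage valuable\n')}, ...
               'lineNo', {6, 8}, ...
               'U',      {9, 17});

lineTxts    = {lines.line};     % matching lines as a cell array
lineNumbers = [lines.lineNo];   % matching line numbers as a row vector

% Build and print one report row per match
report = cell(numel(lines), 1);
for k = 1:numel(lines)
    report{k} = sprintf('%d: %s (first match at column %d)', ...
                        lines(k).lineNo, deblank(lines(k).line), lines(k).U(1));
    disp(report{k});
end
```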