MATLAB - Replace strings in a text file with indices and delete lines that have no index

Time: 2017-06-07 03:49:10

Tags: matlab

I have a text file whose data is structured like this:

30,311.263671875,158.188034058,20.6887207031,17.4877929688,0.000297248129755,aeroplane
30,350.668334961,177.547393799,19.1939697266,18.3677368164,0.00026999923648,aeroplane
30,367.98135376,192.697219849,16.7747192383,23.0987548828,0.000186387239864,aeroplane
30,173.569274902,151.629364014,38.0069885254,37.5704650879,0.000172595537151,aeroplane
30,553.904602051,309.903320312,660.893981934,393.194030762,5.19620243722e-05,aeroplane
30,294.739196777,156.249740601,16.3522338867,19.8487548828,1.7795707663e-05,aeroplane
30,34.1946258545,63.4127349854,475.104492188,318.754821777,6.71026540999e-06,aeroplane
30,748.506652832,0.350944519043,59.9415283203,28.3256549835,3.52978979379e-06,aeroplane
30,498.747009277,14.3766479492,717.006652832,324.668731689,1.61551643174e-06,aeroplane
30,81.6389465332,498.784301758,430.23046875,210.294677734,4.16855394647e-07,aeroplane
30,251.932098389,216.641052246,19.8385009766,20.7131652832,3.52147743106,bicycle
30,237.536972046,226.656692505,24.0902862549,15.7586669922,1.8601918593,bicycle
30,529.673400879,322.511322021,25.1921386719,21.6920166016,0.751171214506,bicycle
30,255.900146484,196.583847046,17.1589355469,27.4430847168,0.268321367912,bicycle
30,177.663650513,114.458488464,18.7516174316,16.6759414673,0.233057001606,bicycle
30,436.679382324,273.383331299,17.4342041016,19.6081542969,0.128449092153,bicycle

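I want to index these lines with a label file, i.e. replace the class name in the last column with its index in the label list. The result would look like this: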

60,509.277435303,284.482452393,26.1684875488,31.7470092773,0.00807665128377,15
60,187.909835815,170.448471069,40.0388793945,58.8763122559,0.00763951029512,15
60,254.447280884,175.946624756,18.7212677002,21.9440612793,0.00442053096776,15

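However, some classes may not be in the label list, and I need to filter out those lines so that I can read the file with load(). (You can't have text in the file and still use load().)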
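Both steps, mapping each class name to its label index and dropping rows whose class is not a label, can be done on the whole class column at once with `ismember`. A minimal sketch, assuming the labels are a cell array of class names; the label list and the `train` row below are hypothetical, since `loadlbl` is not shown in the question:

```matlab
% Numeric columns 1-6 of three rows. Rows 1 and 3 come from the question;
% row 2 (class 'train') is a hypothetical row whose class is not a label.
num = [30 311.263671875 158.188034058 20.6887207031 17.4877929688 0.000297248129755;
       30 100 100 10 10 0.5;
       30 251.932098389 216.641052246 19.8385009766 20.7131652832 3.52147743106];
cls  = {'aeroplane'; 'train'; 'bicycle'};
lbls = {'aeroplane', 'bicycle'};   % hypothetical label list (loadlbl() in practice)

% tf(i) is true if cls{i} is a label; loc(i) is its index in lbls, or 0.
[tf, loc] = ismember(cls, lbls);

% Keep only rows with a known class and append the index as column 7.
out = [num(tf, :), loc(tf)];

% The result is purely numeric, so it can be written and read back with load().
fname = [tempname '.txt'];
dlmwrite(fname, out, 'precision', '%.12g');
back = load(fname);
delete(fname);
```

Here `tf` is `[true; false; true]`, `loc` is `[1; 0; 2]`, and `back` has two rows.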

Here is my implementation:

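function test(vName,meta)
% Replace the class name (7th comma-separated field) with its index in
% the label list; lines whose class is not a label are skipped.
f_dt = fopen([vName '.txt'],'r');
if f_dt == -1
  error('Could not open %s.txt', vName);
end
f_indexed = fopen([vName '_indexed.txt'], 'w');
lbls = loadlbl();                % cell array of class names

while true
  dt = fgets(f_dt);              % one line, including the newline
  if ~ischar(dt)                 % fgets returns -1 at end of file
    break
  end
  fields = strsplit(dt, ',');
  dt_cls = regexprep(fields{7}, '\s+', '');   % strip the trailing newline

  cls_idx = find(strcmp(lbls, dt_cls), 1);
  if ~isempty(cls_idx)
    dt = strrep(dt, dt_cls, int2str(cls_idx));
    fprintf(f_indexed, '%s', dt);  % print dt literally, not as a format
  end
end

fclose(f_indexed);
fclose(f_dt);
end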

However, it runs very slowly because the text file contains 100,000 lines. Is there a smarter, faster way to do this?
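One way to avoid the per-line `strsplit`/`strcmp`/`fprintf` calls is to read the whole file once with `textscan`, do the class lookup with a single `ismember` call, and write the kept rows with a single `fprintf`. This is a sketch under stated assumptions: every line has six numeric fields followed by the class name, and `lbls = {'bird', 'bicycle'}` is a hypothetical stand-in for `loadlbl()`. The first lines only create a small temporary sample file so the block runs on its own; it has not been benchmarked here on a 100,000-line file.

```matlab
% Sample input built from rows of the question, so the sketch is
% self-contained; with real data, vName is the existing file's base name.
vName = tempname;
fid = fopen([vName '.txt'], 'w');
fprintf(fid, '%s\n', ...
  '30,311.263671875,158.188034058,20.6887207031,17.4877929688,0.000297248129755,aeroplane', ...
  '30,251.932098389,216.641052246,19.8385009766,20.7131652832,3.52147743106,bicycle', ...
  '30,237.536972046,226.656692505,24.0902862549,15.7586669922,1.8601918593,bicycle');
fclose(fid);
lbls = {'bird', 'bicycle'};   % hypothetical label list; use loadlbl() in practice

% Read the whole file with one textscan call: six numeric columns, one text column.
fid = fopen([vName '.txt'], 'r');
C = textscan(fid, '%f %f %f %f %f %f %s', 'Delimiter', ',');
fclose(fid);

num = [C{1:6}];                              % N-by-6 numeric matrix
[tf, loc] = ismember(strtrim(C{7}), lbls);   % loc is 0 for classes not in lbls
out = [num(tf, :), loc(tf)];                 % drop unknown classes, append index

% Write all kept rows with one fprintf call; fprintf consumes out.'
% column by column, i.e. one output line per row of out.
fid = fopen([vName '_indexed.txt'], 'w');
fprintf(fid, [repmat('%.12g,', 1, 6) '%d\n'], out.');
fclose(fid);

M = load([vName '_indexed.txt']);            % purely numeric, so load() works
delete([vName '.txt']);
delete([vName '_indexed.txt']);
```

With this label list the `aeroplane` row is dropped and both `bicycle` rows get index 2.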

1 answer:

Answer 0 (score: 1)

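You can use textscan to read all the lines and then get the indices (line numbers) of the labels you want. Once you know the line numbers, you can extract what you need.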

fid = fopen('data.txt') ;
S = textscan(fid,'%s','delimiter','\n') ;
S = S{1} ;
fclose(fid) ;
%% get bicycle lines 
idx = strfind(S, 'bicycle');
idx = find(not(cellfun('isempty', idx)));
S_bicycle = S(idx)
%% write this to text file
fid2 = fopen('text.txt','wt') ;
fprintf(fid2,'%s\n',S_bicycle{:});
fclose(fid2) ;

From S_bicycle you can then extract your numbers.
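For example, the numbers can be pulled out of `S_bicycle` with one more `textscan` call on the joined lines. A sketch, assuming the lines have the same seven-field layout as in the question; `bicycle_idx = 2` is a hypothetical label index, not taken from the asker's label file:

```matlab
% Two lines as S_bicycle would hold them (copied from the question).
S_bicycle = {'30,251.932098389,216.641052246,19.8385009766,20.7131652832,3.52147743106,bicycle';
             '30,237.536972046,226.656692505,24.0902862549,15.7586669922,1.8601918593,bicycle'};

% Join the lines with newlines and parse the six numeric fields in one
% call; %*s reads the trailing class name and discards it.
C = textscan(sprintf('%s\n', S_bicycle{:}), '%f %f %f %f %f %f %*s', 'Delimiter', ',');
num = [C{:}];                 % N-by-6 numeric matrix

bicycle_idx = 2;              % hypothetical position of 'bicycle' in the label list
out = [num, repmat(bicycle_idx, size(num, 1), 1)];
```

Repeating this per label works, but for many labels the single `ismember` lookup over the whole class column scales better than one `strfind` pass per label.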