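fopen with the correct file format and path

Date: 2014-04-15 11:53:30

Tags: matlab matlab-figure matlab-deployment

How do I use fopen with the correct file format and path to read my incrementally numbered .htm files?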

path: DATA\WEBPAGE_SOURCE\train75_phish_data\1.htm
files: 1.htm, 2.htm, 3.htm, ... etc.

Each of 1.htm, 2.htm, 3.htm, ... contains the source code of a web page.

I tried the following example, but I get an error at i = 21.

data2=fopen(strcat('DATA\WEBPAGE_SOURCE\train75_phish_data\',int2str(i),'.htm'),'r')
I have already consulted this page, but it still does not work. Any ideas? http://www.mathworks.com/help/matlab/ref/fopen.html

Here is my code:

data = importdata('DATA/URL/trainURL')
domain_URL = regexp(data,'\w*://[^/]*','match','once')

[sizeData b] = size(domain_URL);

for i = 1:150
A7_data = domain_URL{i};

data2=fopen(strcat('DATA\WEBPAGE_SOURCE\train75_phish_data\',int2str(i),'.htm'),'r')

CharData = fread(data2, '*char')';  %read text file and store data in CharData
img_only = regexp(CharData, '<img.*?>', 'match');

feature7_data=(cellfun(@(n) isempty(n), strfind(img_only, A7_data))) 
B7(i)=sum(feature7_data)


end

feature7(B7>=10)=1;
feature7(B7<10&B7>5)=0;
feature7(B7<=5)=-1;

feature7'

Here is my output:

data = importdata('DATA/URL/trainURL') loads the list of URLs saved in that file.

The loop works up to i = 20, but it fails at iteration 21. I want to loop up to 150, but it cannot read 'data2' for i = 21.
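
One way to narrow this down, assuming the failure at i = 21 comes from a file that is missing or named differently (the post does not confirm this), is to list which of the expected files cannot be found before reading any of them. The sketch below reuses the folder and the loop bound of 150 from the question; the variable names `folder`, `nFiles`, and `missing` are introduced here for illustration only.

```matlab
% Sketch: report which of the expected numbered files cannot be found.
folder  = fullfile('DATA', 'WEBPAGE_SOURCE', 'train75_phish_data');  % path from the question
nFiles  = 150;                                                       % loop bound from the question
missing = [];                                                        % indices of files not found

for i = 1:nFiles
    filename = fullfile(folder, sprintf('%d.htm', i));
    if exist(filename, 'file') ~= 2      % exist returns 2 when the name is a file
        missing(end+1) = i;              %#ok<AGROW>
    end
end

fprintf('%d of %d files are missing\n', numel(missing), nFiles);
disp(missing);
```

If 21 shows up in `missing`, the problem is the file itself rather than the way the name is built.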


1 answer:

Answer 0 (score: 0)

I think you need to handle the exceptions that may occur in a more principled way. Try this:

data = importdata('DATA/URL/trainURL');
domain_URL = regexp(data,'\w*://[^/]*','match','once');

[sizeData b] = size(domain_URL);

nFiles = 150;
B7 = zeros(1,nFiles);   % preallocate so that skipped files keep a count of 0

for i = 1:nFiles
   A7_data = domain_URL{i};
   % fullfile inserts the platform's file separator between the folder names
   filename = fullfile('DATA','WEBPAGE_SOURCE','train75_phish_data',[int2str(i) '.htm']);
   if exist(filename,'file')
      fprintf('file %s exists, processing it\n',filename);
      data2 = fopen(filename,'r');
      CharData = fread(data2, '*char')';  % read the whole file as one row of characters
      fclose(data2);
      img_only = regexp(CharData, '<img.*?>', 'match');
      % true for every <img> tag that does NOT contain the domain
      feature7_data = cellfun(@isempty, strfind(img_only, A7_data));
      B7(i) = sum(feature7_data);
   else
      fprintf('file %s does not exist, skipping it!\n',filename);
   end
end

feature7 = zeros(1,nFiles);   % same length as B7, even if the last files are missing
feature7(B7>=10) = 1;
feature7(B7<10 & B7>5) = 0;
feature7(B7<=5) = -1;

feature7'
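
The exist check only confirms that the file is present. fopen can still fail for other reasons, and it signals failure by returning -1 rather than raising an error, so the error only surfaces later in fread. A minimal sketch of checking the return value with fopen's two-output form follows. The file name below is a hypothetical placeholder chosen so that the failure branch runs; it is not one of the files from the question.

```matlab
% Sketch: detect a failed fopen instead of passing an invalid identifier to fread.
filename = fullfile(tempdir, 'no_such_file_12345.htm');  % hypothetical name, assumed not to exist

[fid, msg] = fopen(filename, 'r');
if fid == -1
    % fopen returns -1 on failure; msg holds a system-dependent explanation
    fprintf('could not open %s: %s\n', filename, msg);
else
    CharData = fread(fid, '*char')';   % read the whole file as one row of characters
    fclose(fid);
end
```

Inside the loop, the same test could replace or complement the exist check, with the failure branch skipping to the next file.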
