如何使用增量.htm文件以正确的文件格式和路径读取我的文件?
path:DATA\WEBPAGE_SOURCE\train75_phish_data\1.htm
file:1.htm,2.htm,3.htm....etc
内部1.htm,2.htm,3.htm ....等是网页的官方代码
我尝试使用以下示例,但在i = 21时遇到错误。
data2=fopen(strcat('DATA\WEBPAGE_SOURCE\train75_phish_data\',int2str(i),'.htm'),'r')
我已经提到过这个,还是不行,有什么想法吗?
http://www.mathworks.com/help/matlab/ref/fopen.html
这是我的代码:
data = importdata('DATA/URL/trainURL')
domain_URL = regexp(data,'\w*://[^/]*','match','once')
[sizeData b] = size(domain_URL);
for i = 1:150
A7_data = domain_URL{i};
data2=fopen(strcat('DATA\WEBPAGE_SOURCE\train75_phish_data\',int2str(i),'.htm'),'r')
CharData = fread(data2, '*char')'; %read text file and store data in CharData
img_only = regexp(CharData, '<img.*?>', 'match');
feature7_data=(cellfun(@(n) isempty(n), strfind(img_only, A7_data)))
B7(i)=sum(feature7_data)
end
feature7(B7>=10)=1;
feature7(B7<10&B7>5)=0;
feature7(B7<=5)=-1;
feature7'
这是我的输出:
data = importdata('DATA/URL/trainURL') is a list of URL being saved inside
我无法循环i = 20的结果,当迭代= 21时它会出错,我想循环到150,它cnt读取&#39; data2&#39;对于&#39; i = 21&#39;
答案 0 :(得分:0)
我认为您需要处理可能以更原则的方式出现的可能异常。试试这个:
data = importdata('DATA/URL/trainURL')
domain_URL = regexp(data,'\w*://[^/]*','match','once')
[sizeData b] = size(domain_URL);
for i = 1:150
A7_data = domain_URL{i};
filename = fullfile('DATA\WEBPAGE_SOURCE\train75_phish_data\',strcat(int2str(i),'.htm'));
if (exist(filename,'file')),
disp(sprintf('file %s exists, processing it',filename));
data2=fopen(filename,'r');
CharData = fread(data2, '*char')'; %read text file and store data in CharData
fclose(data2);
img_only = regexp(CharData, '<img.*?>', 'match');
feature7_data=(cellfun(@(n) isempty(n), strfind(img_only, A7_data)))
B7(i)=sum(feature7_data)
else,
disp(sprintf('file %s does not exist, skipping it!',filename));
end
end
feature7(B7>=10)=1;
feature7(B7<10&B7>5)=0;
feature7(B7<=5)=-1;
feature7'
之后的那条线。