How can I detect the 3 in (>3<) instead of the 3 in (rank_value_3_months)?
"<span data-bind-domain="rank_value_3_months">3</span>"
rank(i) = str2double(regexp(CharData7,'>(\d)<','match','once'))
Below is the complete code for this part. I want to detect the number that follows in the pre-processed file (>number<):
%function [feature7] = f7(data)
for i = 1:1
    %start read html file
    data2=fopen(strcat('DATA\WHOIS\TR\',int2str(i),'.htm'),'r')
    CharData = fread(data2, '*char')'; %read text file and store data in CharData
    fclose(data2);
    %end read html file
    register_date = regexp(CharData, '<span data-bind-domain="rank_value_3_months">.*?/span>', 'match'); %checking
    %start write only http in image file
    fid = fopen(strcat('DATA\PRE-PROCESS_DATA\F23_TR\f23_TR_pdata_',int2str(i)),'w');
    for col = 1:numel(register_date)
        fprintf(fid,'%s\n',register_date{:,col});
    end
    fclose(fid);
    %end write only http in image file
    s = dir(strcat('DATA\PRE-PROCESS_DATA\F23_TR\','f23_TR_pdata_', int2str(i)));
    disp(s.bytes);
    if s.bytes ~= 0
        data7=fopen(strcat('DATA\PRE-PROCESS_DATA\F23_TR\f23_TR_pdata_',int2str(i),''),'r')
        CharData7 = fread(data7, '*char')'; %read text file and store data in CharData
        fclose(data7);
        rank(i) = str2double(regexp(CharData7,'>(\d)<','tokens','once') )
    else
    end
    if rank(i)~=0
        feature23(i)=-1;
    else
        feature23(i)=1;
    end
end
Answer 0 (score: 2)
Assuming CharData7 is a cell array, you can try this:
%// The find
%// - use 'tokens' to return just the part in parentheses
%// - use \s* to make spacing flexible (which is also valid XML/HTML)
tok = regexp(CharData7, '>\s*(\d)\s*<', 'tokens', 'once');
%// Re-format into flat cells
%// ('tokens' returns ALL tokens, which is therefore a cell, regardless
%// of the 'once' setting)
tok = [tok{:}];
%// and convert everything to double
%// (a separate variable is used so the numeric array rank is not overwritten by a cell)
rank(i) = str2double(tok)
So, as a nice, unintelligible one-liner:
rank(i) = str2double([builtin('_brace', regexp(CharData7,'>\s*(\d)\s*<','tokens','once'), :)]);
If CharData7 is just a string, you can skip the cell-flattening step:
rank(i) = str2double( regexp(CharData7,'>\s*(\d)\s*<','tokens','once') )
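
To see why 'tokens' helps where 'match' does not, here is a minimal sketch run on the sample line from the question. The variable names (html, whole, asMatch, tok, asToken, multi) are made up for illustration, and the last two lines assume the rank could have more than one digit, which the question does not state.

```matlab
% Sample line copied from the question
html = '<span data-bind-domain="rank_value_3_months">3</span>';

% 'match' returns the whole matched text, angle brackets included,
% so str2double cannot parse it
whole   = regexp(html, '>\s*(\d)\s*<', 'match', 'once');   % '>3<'
asMatch = str2double(whole);                                % NaN

% 'tokens' returns only the parenthesized group (inside a cell)
tok     = regexp(html, '>\s*(\d)\s*<', 'tokens', 'once');   % {'3'}
asToken = str2double(tok);                                  % 3

% Assumption: if the rank can have several digits, use \d+ instead of \d
tokMulti = regexp('<span>1234</span>', '>\s*(\d+)\s*<', 'tokens', 'once');
multi    = str2double(tokMulti);                            % 1234
```

The 3 inside rank_value_3_months is never picked up, because the pattern requires the digit to sit directly between > and < (optionally padded by whitespace).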