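The purpose of this function is to check whether two, three, or more URLs are hidden inside a single URL. If so, it should return 1; otherwise it should return 0. Examples: www.applee.com/www.samsunge.com, http://www.samsungds.http://comwww.samsung.com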
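To make the goal concrete, here is a minimal sketch of one possible heuristic, which is not the code used elsewhere in this post: count the places where a new host appears to start (a scheme such as http://, or a bare www. prefix) and flag the string when there is more than one. The pattern `pat` and the variable names are assumptions made for illustration.

```matlab
% Hypothetical heuristic (not the code from this post): count the places
% where a new host appears to start, i.e. a scheme or a bare 'www.' prefix.
pat = '(?:https?://(?:www\.)?|www\.)';

urls = {'www.applee.com/www.samsunge.com', ...
        'http://www.samsungds.http://comwww.samsung.com', ...
        'http://www.wwwwwwwwwws2.com/'};

for k = 1:numel(urls)
    % regexp with 'start' returns the start index of every match
    nStarts = numel(regexp(urls{k}, pat, 'start'));
    fprintf('%d host start(s), flagged = %d : %s\n', nStarts, nStarts > 1, urls{k});
end
```

With this pattern the two examples above yield 2 and 3 host starts, while a plain URL such as the third one yields 1. The heuristic ignores dot counts and would miss an embedded host that has neither a scheme nor a www. prefix.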
I have solved the importdata problem, but now I am having trouble with the data below. (I have modified the is_double_url.m file, but it gives me an error.)
http://encuestanavemotors.com.ar/doc/newgoogledoc2013/2013gdocs/
http://totalwhiteboard.com.au/.pp/0053d4ae3e2c78154d29d413c1236341/192.186.237.145/H/
http://www.wwwwwwwwwws2.com/
http://www.paypal.com.cy.cgi.bin.webscr.cmd.login.submit.dispatch.5885d80a1faee8d48a116ba977951b3435308b8c4.turningpoint.in/f044c94b4394939f4a1a75798875f78c/
http://www.celebramania.cl/web/cc/personal/cards/5d0d5c5af4f12c319d47872fabe11262/Pool=0/?cmd=_home&dispatch=5885d80a13c0db1f8e&ee=5cd428ee24c5037dda298a4762735a94
http://joannalindsay.com/wp-content/uploads/aloo/aaleor.php?bidderblocklogin&hc=1&hm=uk%601d72f%2Bj2b2vi%3C265bidderblocklogin&hc=1&hm=uk%601d72f%2Bj2b2vi%3C265bidderblocklogin&hc=1&hm=uk%601d72f%2Bj2b2vi%3C265
http://bluedominoes.com/~kosalbco/paypal.de/
The is_double_url.m file:
function out = is_double_url(url_path1)
f1 = strfind(url_path1,'www.');
if isempty(f1)
out = 0;
return;
end
f2 = strfind(url_path1,'/');
f3 = bsxfun(@minus,f2,f1');
count_dots = zeros(size(f3,1),1);
for k = 1:size(f3,1)
[x,y] = find(f3(k,:)>0,1);
str2 = url_path1(f1(k):f2(y));
if ~isempty(strfind(str2,'..'))
continue
end
count_dots(k) = nnz(strfind(str2,'.'));
end
out = ~any(count_dots(2:end)<2);
if any(strfind(url_path1,'://')>f2(1))
out = true;
end
return;
The f10.m file:
data = importdata('url');
[sizeData b] = size(data);
for i = 1:sizeData
feature10(i) = is_double_url(data{i});
end
Answer 0 (score: 1)
Code
function out = is_double_url(url_path1)
if url_path1(end)~='/'
url_path1(end+1)='/';
end
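url_path1 = regexprep(url_path1,'//','//www.'); % prefix every host that follows '//' with 'www.'
url_path1 = regexprep(url_path1,'//www\.www\.','//www.'); % undo a doubled prefix; dots escaped so they match literally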
f1 = strfind(url_path1,'www.');
if numel(f1)<2
out = false;
else
f2 = strfind(url_path1,'/');
f3 = bsxfun(@minus,f2,f1');
count_dots = zeros(size(f3,1),1);
for k = 1:size(f3,1)
[~,y] = find(f3(k,:)>0,1);
str2 = url_path1(f1(k):f2(y));
if ~isempty(strfind(str2,'..'))
continue
end
count_dots(k) = nnz(strfind(str2,'.'));
end
out = ~any(count_dots(2:end)<2);
if any(strfind(url_path1,'://')>f2(1))
out = true;
end
end
return;
Sample runs
is_double_url('http://www.farthingalescorsetmakingsupplies.com/files/files/www.apple.com/')
is_double_url('http://www.farthingalescorsetmakingsupplies.com/files/files/www.com/')
is_double_url('http://www.farthingalescorsetmakingsupplies.com/files/files/https://www.com/')
is_double_url('http://www.farthingalescorsetmakingsupplies.com/files/files/https://www.dfdsf.my/')
These return 1, 0, 1, and 1 respectively.
If you have a list of URLs in a text file, use this to check each URL:
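To see why the sample runs come out the way they do, the normalization and dot-counting steps of the function above can be traced by hand. The following is a minimal sketch of that trace for the first sample URL; the variable names (`step1`, `step2`, `hostStarts`, `slashes`, `segment`) are introduced here for illustration only, and the dots in the second pattern are escaped so they match literally.

```matlab
% Walk through the normalization steps on the first sample URL.
url = 'http://www.farthingalescorsetmakingsupplies.com/files/files/www.apple.com/';

% Step 1: prefix every '//' with 'www.' so hosts without 'www.' are also found.
step1 = regexprep(url, '//', '//www.');
% Step 2: collapse the doubled prefix that step 1 creates when 'www.' was already there.
step2 = regexprep(step1, '//www\.www\.', '//www.');

hostStarts = strfind(step2, 'www.');   % start index of each candidate host
slashes    = strfind(step2, '/');      % index of every '/'

for k = 1:numel(hostStarts)
    % Take the text from 'www.' up to the next '/' and count its dots.
    nextSlash = slashes(find(slashes > hostStarts(k), 1));
    segment = step2(hostStarts(k):nextSlash);
    fprintf('%s -> %d dot(s)\n', segment, numel(strfind(segment, '.')));
end
```

Both segments here contain two dots, so the URL is flagged. In the second sample run the trailing segment is www.com/, which has only one dot, and that is why the call returns 0.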
fid = fopen('text2.txt'); % 'text2.txt' holds the URLs, one per line
C = textscan(fid, '%s\n');
fclose(fid);
for k = 1:numel(C{1})
out(k) = is_double_url(C{1}{k}); % out stores the result for each URL
end
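The same loop can be tried end to end without any external files. The sketch below writes a few of the URLs from the question to a temporary file, reads them back with textscan, and preallocates the result vector. `checker` is a hypothetical placeholder so the script runs on its own; replace it with @is_double_url once that function is on the path.

```matlab
% Stand-in checker so the sketch runs on its own; swap in @is_double_url.
checker = @(u) numel(strfind(u, 'www.')) > 1;   % hypothetical placeholder

% Write a small sample file (temporary, hypothetical name), one URL per line.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', 'http://www.wwwwwwwwwws2.com/', ...
                     'http://bluedominoes.com/~kosalbco/paypal.de/', ...
                     'www.applee.com/www.samsunge.com');
fclose(fid);

% Read the URLs back; '%s' splits on whitespace, so each line is one entry.
fid = fopen(fname);
C = textscan(fid, '%s');
fclose(fid);
delete(fname);

urls = C{1};
out = false(numel(urls), 1);   % preallocate instead of growing inside the loop
for k = 1:numel(urls)
    out(k) = checker(urls{k});
end
disp(out.');
```

Preallocating `out` also avoids stale values if a variable with that name is already in the workspace from an earlier run.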