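How to check whether a claimed identity is present in a URL

Date: 2014-03-18 01:47:47

Tags: url matlab matlab-figure

The purpose of this function is to check whether 2, 3, ... URLs are hidden inside a single URL. If so, it returns 1; otherwise it returns 0. Examples: www.applee.com/www.samsunge.com, http://www.samsungds.http://comwww.samsung.com

I have solved the importdata problem, but now I am having trouble with the data below. (I have modified the 'is_double_url.m' file, but it returns an error.)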

http://encuestanavemotors.com.ar/doc/newgoogledoc2013/2013gdocs/
http://totalwhiteboard.com.au/.pp/0053d4ae3e2c78154d29d413c1236341/192.186.237.145/H/
http://www.wwwwwwwwwws2.com/
http://www.paypal.com.cy.cgi.bin.webscr.cmd.login.submit.dispatch.5885d80a1faee8d48a116ba977951b3435308b8c4.turningpoint.in/f044c94b4394939f4a1a75798875f78c/
http://www.celebramania.cl/web/cc/personal/cards/5d0d5c5af4f12c319d47872fabe11262/Pool=0/?cmd=_home&dispatch=5885d80a13c0db1f8e&ee=5cd428ee24c5037dda298a4762735a94
http://joannalindsay.com/wp-content/uploads/aloo/aaleor.php?bidderblocklogin&hc=1&hm=uk%601d72f%2Bj2b2vi%3C265bidderblocklogin&hc=1&hm=uk%601d72f%2Bj2b2vi%3C265bidderblocklogin&hc=1&hm=uk%601d72f%2Bj2b2vi%3C265
http://bluedominoes.com/~kosalbco/paypal.de/

is_double_url.m file

function out = is_double_url(url_path1)

f1 = strfind(url_path1,'www.');
if isempty(f1)
    out = 0;
    return;
end
f2 = strfind(url_path1,'/');
f3 = bsxfun(@minus,f2,f1');

count_dots = zeros(size(f3,1),1);
for k = 1:size(f3,1)
    [x,y] = find(f3(k,:)>0,1);
    str2 = url_path1(f1(k):f2(y));
    if ~isempty(strfind(str2,'..'))
        continue
    end
    count_dots(k) = nnz(strfind(str2,'.'));
end
out = ~any(count_dots(2:end)<2);

if any(strfind(url_path1,'://')>f2(1))
    out = true;
end

return;

f10.m file

data = importdata('url');
[sizeData b] = size(data);

for i = 1:sizeData
    feature10(i) = is_double_url(data{i});
end

1 Answer:

Answer 0 (score: 1)

Code

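function out = is_double_url(url_path1)
% IS_DOUBLE_URL  True if url_path1 appears to embed a second URL.

% Make sure the URL ends with a slash so every host segment is terminated.
if url_path1(end)~='/'
    url_path1(end+1)='/';
end

% Normalize so that every '//' is followed by exactly one 'www.'.
url_path1 = regexprep(url_path1,'//','//www.');
url_path1 = regexprep(url_path1,'//www\.www\.','//www.');

f1 = strfind(url_path1,'www.');   % start of each 'www.' occurrence
if numel(f1)<2
    % Fewer than two 'www.' occurrences: no second host to inspect.
    out = false;
else
    f2 = strfind(url_path1,'/');  % position of every slash
    f3 = bsxfun(@minus,f2,f1');   % slash positions relative to each 'www.'

    count_dots = zeros(size(f3,1),1);
    for k = 1:size(f3,1)
        % Segment from this 'www.' up to the first slash after it.
        [~,y] = find(f3(k,:)>0,1);
        str2 = url_path1(f1(k):f2(y));
        if ~isempty(strfind(str2,'..'))
            continue   % leave the count at 0 for segments containing '..'
        end
        count_dots(k) = nnz(strfind(str2,'.'));
    end
    % Every 'www.' segment after the first needs at least two dots.
    out = ~any(count_dots(2:end)<2);

    % A '://' appearing after the first slash also marks an embedded URL.
    if any(strfind(url_path1,'://')>f2(1))
        out = true;
    end
end

return;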
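
The `numel(f1)<2` guard matters because of how MATLAB treats empty arrays: `any` of an empty array is false, so negating it gives true. A minimal sketch of that edge case, with `count_dots` set by hand to a hypothetical single-host value rather than computed from a URL:

```matlab
% Hypothetical value: one 'www.' segment that contains two dots.
count_dots = 2;

% With a single segment there is nothing after index 1, so the slice is empty.
tail = count_dots(2:end);

% any() of an empty array is false, so the negation is true.
unguarded = ~any(tail < 2);

disp(isempty(tail))   % displays 1
disp(unguarded)       % displays 1
```

Without the guard, a URL with exactly one `www.` would therefore be reported as a double URL, which is presumably why the function returns false early in that case.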

Run

is_double_url('http://www.farthingalescorsetmakingsupplies.com/files/files/www.apple.com/')

is_double_url('http://www.farthingalescorsetmakingsupplies.com/files/files/www.com/')

is_double_url('http://www.farthingalescorsetmakingsupplies.com/files/files/https://www.com/')

is_double_url('http://www.farthingalescorsetmakingsupplies.com/files/files/https://www.dfdsf.my/')


Returns 1, 0, 1, 1 respectively.
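
The third call returns 1 even though `www.com/` contains only one dot, because the function also looks for a `://` after the first slash. The sketch below isolates the normalization and that scheme check on a made-up URL (`example.com` and `example.org` are placeholders, and the variable names are illustrative, not taken from the answer):

```matlab
% Placeholder URL with a second scheme embedded in the path (hypothetical).
url = 'http://example.com/path/https://www.example.org/';

% Step 1: insert 'www.' after every '//'.
step1 = regexprep(url, '//', '//www.');

% Step 2: collapse the doubled 'www.www.' that step 1 creates for hosts
% that already started with 'www.' (dots escaped so they match literally).
normalized = regexprep(step1, '//www\.www\.', '//www.');

% Scheme check: a '://' located after the first slash signals an embedded URL.
slashes = strfind(normalized, '/');
has_embedded_scheme = any(strfind(normalized, '://') > slashes(1));

disp(normalized)
disp(has_embedded_scheme)
```

After normalization every host starts with `www.`, so the dot-counting loop can treat URLs with and without an explicit `www.` the same way.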

If you have a list of URLs in a text file, use this to check each URL -

fid = fopen('text2.txt'); % 'text2.txt' holds one URL per line
C = textscan(fid, '%s\n');
fclose(fid);

out = false(1,numel(C{1})); % preallocate the result vector
for k = 1:numel(C{1})
    out(k) = is_double_url(C{1}{k}); % out(k) is the check result for URL k
end
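
The same per-URL loop can also be written with `cellfun`. The sketch below is self-contained, so it uses an in-memory list and a simplified stand-in predicate; both are assumptions for illustration, not the answer's function or data:

```matlab
% Hypothetical in-memory list standing in for the lines read from the file.
urls = {'http://www.example.com/a/www.example.org/'; ...
        'http://www.example.com/'};

% Stand-in predicate (an assumption): true when 'www.' occurs at least twice.
has_two_www = @(u) numel(strfind(u, 'www.')) >= 2;

% Apply the predicate to every cell; the result has the same shape as urls.
flags = cellfun(has_two_www, urls);

disp(flags.')
```

With `is_double_url.m` on the path, passing `@is_double_url` in place of the stand-in should work the same way, since that function returns a single logical value per URL.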