I wrote a regular expression to find strings like the following.
window.location="process.php?file=Generics/index.nsp";
location.href = "http://www.foo.com";
location.href="http://www.foo.com";
window.location = "http://www.foo.com";
window.location.href="http://www.foo.com";
location.replace ("http://www.foo.com");
location.replace( "http://www.foo.com" ) ;
location.assign ("http://www.foo.com");
window.location= var +"process.php?file=Generics/index.nsp";
window.location.href = path + "login" + ".html";
window.location.href = path + featureId + ".html";
window.location.href = "/" + "online-banking" + ".html";
window.location.href = path + featureName +".html";
window.location.href = link.page;
window.location.href = path + link.page + ".html";
I checked on a regex testing page that the expression works.
However, when I run the program, I do not get the correct result.
# -*- coding: utf-8 -*-
import re
url_reg= re.compile('(location\.(href|assign|replace)|window\.location(\.href|))\s*(=|\()+.*(;|$)')
test = '123123gjh123\
2135115123\
window.location="process.php?file=Generics/index.nsp";\
location.href = "http://www.foo.com";\
location.href="http://www.foo.com";\
window.location = "http://www.foo.com";\
window.location.href="http://www.foo.com";\
location.replace ("http://www.foo.com");\
location.replace( "http://www.foo.com" ) ; \
location.assign ("http://www.foo.com");\
window.location= var +"process.php?file=Generics/index.nsp";\
window.location.href = path + "login" + ".html";\
window.location.href = path + featureId + ".html";\
window.location.href = "/" + "online-banking" + ".html";\
window.location.href = path + featureName +".html";\
window.location.href = link.page;\
window.location.href = path + link.page + ".html";'
print url_reg.findall(test)
Result:
[('window.location', '', '', '=', '')]
I would like to get a result like the following.
['window.location="process.php?file=Generics/index.nsp"','location.href = "http://www.foo.com"','window.location = "http://www.foo.com"','window.location.href="http://www.foo.com"',..]
Please give me some advice.
Answer 0 (score: 1)
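Is this what you are after? I just added the r'' raw-string prefix
and an outer pair of parentheses, so that the whole match is captured as the first group.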
import re
url_reg= re.compile(r'((location\.(href|assign|replace)|window\.location(\.href)?)\s*(=|\()+.*(;|$))')
test = 'blahblah...'
print map(lambda x: x[0], url_reg.findall(test))
['window.location="process.php?file=Generics/index.nsp";location.href = "http://www.foo.com";location.href="http://www.foo.com";window.location = "http://www.foo.com";window.location.href="http://www.foo.com";location.replace ("http://www.foo.com");location.replace( "http://www.foo.com" ) ; location.assign ("http://www.foo.com");window.location= var +"process.php?file=Generics/index.nsp";window.location.href = path + "login" + ".html";window.location.href = path + featureId + ".html";window.location.href = "/" + "online-banking" + ".html";window.location.href = path + featureName +".html";window.location.href = link.page;window.location.href = path + link.page + ".html";']
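Note that the output above is still one long string rather than one entry per statement. The trailing backslashes in the question's `test` literal are line continuations, so the string contains no newline characters, and the greedy `.*` runs all the way to the end. A possible variation, not part of the original answer, is sketched below. It assumes Python 3, a shortened sample string, and that no URL contains a `;` of its own. It uses non-capturing groups so that `findall` returns whole matches instead of tuples, and `[^;]*` so that each match stops before the next semicolon.

```python
import re

# Shortened sample (an assumption for illustration): statements joined on a
# single line, as the backslash continuations in the question produce.
test = (
    'window.location="process.php?file=Generics/index.nsp";'
    'location.href = "http://www.foo.com";'
    'location.replace( "http://www.foo.com" ) ; '
    'window.location.href = path + link.page + ".html";'
)

# (?:...) groups do not capture, so findall() returns the full match text.
# [^;]* consumes everything up to, but not including, the next semicolon.
url_reg = re.compile(
    r'(?:location\.(?:href|assign|replace)|window\.location(?:\.href)?)'
    r'\s*[=(][^;]*'
)

# strip() removes whitespace left between the statement and its semicolon.
matches = [m.strip() for m in url_reg.findall(test)]

for m in matches:
    print(m)
```

If the input did contain real newlines, passing `re.MULTILINE` and keeping `$` in the pattern would be another option. A statement whose string literal itself contains a semicolon would be cut short by this sketch.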