Question

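I'm a brand-new programmer and I'm still struggling with regular expressions, despite trying various online regex testers. I'm working on a project in Eclipse that queries an OpenX ad server for text ads, and this is what comes back: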

var OX_abced445 = '';
OX_abced445 += "<"+"a href=\'http://the.server.url/openx/www/delivery/ck.php?oaparams=2__bannerid=29__zoneid=3__cb=e3efa8b703__oadest=http%3A%2F%2Fsomesite.com\'target=\'_blank\'>This is some sample text to test with!<"+"/a><"+"div id=\'beacon_e3efa8b703\'style=\'position: absolute; left: 0px; top: 0px; visibility:hidden;\'><"+"img src=\'http://the.server.url/openx/www/delivery/lg.php?bannerid=29&amp;campaignid=23&amp;zoneid=3&amp;loc=1&amp;cb=e3efa8b703\' width=\'0\'height=\'0\' alt=\'\' style=\'width: 0px; height: 0px;\' /><"+"/div>\n";
document.write(OX_abced445);

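I need to extract the first href URL but not the img src URL, so I think I need a regex that finds everything between href=\' and the closing \'. I also need to extract the link text, i.e. "This is some sample text to test with!", which sits between _blank\'> and <"+"/a>. I have found plenty of regexes that deal with extracting URLs and the like, but I have been struggling to get one to work in Eclipse for this particular case. Any help would be appreciated.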

Answer 1

Trying to use a regex to parse JavaScript that generates HTML is a very bad idea. Use an HTML parser instead, such as JSoup or Validator.nu for Java, or Nokogiri for Ruby. If you must use a regex:

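Plain regex:
^.*? href=\\'([^'\\]+)\\'[^>]*>([^<]*)<

or, in Java:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// [^'\\] excludes the backslash so the trailing \' stays out of group 1
Pattern p = Pattern.compile("^.*? href=\\\\'([^'\\\\]+)\\\\'[^>]*>([^<]*)<",
                            Pattern.MULTILINE);
Matcher m = p.matcher(hideousString);
if (m.find()) {
    // m.group(1) is the URL and m.group(2) is the link text
}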

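This captures the href URL in capture group 1 and the link text in capture group 2, but it will break quickly if the site changes its response format.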
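
To make the capture groups concrete, here is a minimal, self-contained sketch that runs a variant of the pattern against a shortened copy of the response from the question. The class name AdLinkExtractor and the method extract are hypothetical names chosen for illustration. The sketch drops the leading ^.*? because find() already scans forward to the first match, and it assumes the response keeps the href=\'...\' escaping shown above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AdLinkExtractor {
    // Regex: href=\\'([^'\\]+)\\'[^>]*>([^<]*)<
    // Group 1: everything between href=\' and the closing \'
    // Group 2: the text between the end of the opening tag and the next '<'
    private static final Pattern LINK = Pattern.compile(
            "href=\\\\'([^'\\\\]+)\\\\'[^>]*>([^<]*)<");

    /** Returns {url, text} for the first link, or null if nothing matches. */
    static String[] extract(String response) {
        Matcher m = LINK.matcher(response);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        // Shortened copy of the server response; the backslashes are part of the text.
        String response =
                "OX_abced445 += \"<\"+\"a href=\\'http://the.server.url/openx/www/delivery/ck.php"
                + "?oaparams=2__bannerid=29__zoneid=3__cb=e3efa8b703"
                + "__oadest=http%3A%2F%2Fsomesite.com\\'target=\\'_blank\\'>"
                + "This is some sample text to test with!<\"+\"/a><\"+\"img src=\\'"
                + "http://the.server.url/openx/www/delivery/lg.php?bannerid=29\\' /><\"+\"/div>\\n\";";

        String[] link = extract(response);
        if (link == null) {
            System.out.println("No link found");
        } else {
            System.out.println("URL:  " + link[0]);
            System.out.println("Text: " + link[1]);
        }
    }
}
```

The img src URL is never captured because the pattern only starts matching at href=. Returning null on a failed match lets the caller notice a changed response format instead of hitting an IllegalStateException from group().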
