Regular expression to detect the text inside <a></a> tags

Time: 2013-04-17 20:57:46

Tags: javascript regex

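I have a regular expression that is meant to find the text between HTML anchor tags in a string. The text is comparable to a tweet, so a sample string would be: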

   http://google.com is great, but http://www.stackoverflow.com may be my only hope. www.yahoo.com is out of the question.

Run through my working regex function:

function processTweetLinks(text) {
    var replacePattern1, replacePattern2;

    console.log(text);

    // URLs starting with http://, https://, or ftp://
    replacePattern1 = /(\b(https?|ftp):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/gim;
    text = text.replace(replacePattern1, '<a class="individualMessageBioWhateverLink" href="$1" target="_blank">$1</a>');

    // URLs starting with "www." (without // before it, or it'd re-link the ones done above).
    replacePattern2 = /(^|[^\/])(www\.[\S]+(\b|$))/gim;
    text = text.replace(replacePattern2, '$1<a class="individualMessageBioWhateverLink" href="http://$2" target="_blank">$2</a>');

    console.log(text);
    return text;
}

The output is:

  <a class="individualMessageBioWhateverLink" href="http://google.com" target="_blank">http://google.com</a> is great, but <a class="individualMessageBioWhateverLink" href="http://www.stackoverflow.com" target="_blank">http://www.stackoverflow.com</a> may be my only hope. <a class="individualMessageBioWhateverLink" href="http://www.yahoo.com" target="_blank">www.yahoo.com</a> is out of the question.

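I want another line or two that grabs the content between the <a></a> tags, finds the beginning of each link (e.g., http://, https://, http://www., https://www.), and removes it, leaving the shortest text that the user can still recognize as a link ("google.com"). I have a regex that seems to sift through the text correctly and finds this content between the <a></a> tags rather than in the href. Here is the expression:

[expression missing in the source]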
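Since the asker's own expression is not available, here is a minimal sketch of one way to do what the question describes: rewrite only the text between the opening and closing anchor tags and leave the href alone. The helper names stripLinkPrefix and shortenAnchorText are hypothetical, and the sketch assumes the anchors are the simple, non-nested ones that processTweetLinks generates.

```javascript
// Hypothetical helper (not from the original post): removes a leading
// scheme (http://, https://, ftp://) plus an optional "www.", or a bare
// leading "www.", from a link label.
function stripLinkPrefix(label) {
  return label.replace(/^(?:https?|ftp):\/\/(?:www\.)?|^www\./i, '');
}

// Hypothetical helper: rewrites only the text between <a ...> and </a>.
// The opening tag (including its href) is captured and re-emitted unchanged.
// Assumes the label contains no nested markup.
function shortenAnchorText(html) {
  return html.replace(/(<a\b[^>]*>)([^<]*)(<\/a>)/gi, function (match, open, label, close) {
    return open + stripLinkPrefix(label) + close;
  });
}
```

Calling shortenAnchorText on the string returned by processTweetLinks would be the "one more line" the question asks for. A replacement callback is used here so the prefix is stripped from the label alone rather than from the whole match.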

1 Answer:

Answer 0 (score: 0)

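Using DOM parsing makes this simple. If you want to strip http:// and the like from the result, you can then apply any regular expression to it: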
$('a').each(function(){
  alert(this.href);
});

See http://jsfiddle.net/y6LUg/
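To connect the DOM approach back to the question, the following is a rough sketch of how the visible label of each anchor could be shortened while its href stays intact. The names stripLinkPrefix and shortenAnchorLabels are hypothetical and not part of the answer. The function accepts any array-like collection of elements that expose textContent, so it does not depend on jQuery.

```javascript
// Hypothetical helper: removes a leading scheme and/or "www." from a label.
function stripLinkPrefix(label) {
  return label.replace(/^(?:https?|ftp):\/\/(?:www\.)?|^www\./i, '');
}

// Hypothetical helper: shortens the visible text of each anchor in place.
// Only textContent is modified; the href property is never touched.
function shortenAnchorLabels(anchors) {
  Array.prototype.forEach.call(anchors, function (anchor) {
    anchor.textContent = stripLinkPrefix(anchor.textContent);
  });
}

// Browser usage (assumes the anchors are already in the page):
//   shortenAnchorLabels(document.querySelectorAll('a.individualMessageBioWhateverLink'));
// jQuery equivalent, in the style of the answer:
//   $('a').each(function () { $(this).text(stripLinkPrefix($(this).text())); });
```

Working on the parsed elements avoids having to tell the label apart from the href with a regex, because the two are already separate properties.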