Question

I have a Google script that fetches content from a URL. I use regular expressions to find the content I need to scrape, for example:

var htmlSubCategory = UrlFetchApp.fetch(url).getContentText();
var regexpFindingAllLinks = /<div class="small-12 medium-5 large-4 columns"><a href="\/(.*?)\//g;
var linksProducts = regexpFindingAllLinks.exec(htmlSubCategory);

I am having trouble writing another regular expression to find the titles of certain items. The source looks like this:

<p class="heading"><span class="highlight-ico"></span><a href="/url-1/" title="some title for URL 1">Title I need to grab</a></p>
<p class="heading"><span class="highlight-ico"></span><a href="/url-2/" title="some title for URL 2">Title I need to grab</a></p>

Basically, I need a regular expression that matches:

<p class="heading"><span class="highlight-ico"></span><a href="(can be any content)" title="(can be any content)">(grab this content)</a></p>

Second, I would like a regular expression that captures only reference numbers such as X12345678, where X is a letter followed by 8 digits.

I am new to these scripts, so any help would be appreciated.

Answer 1

You shouldn't use regex to parse HTML, but if you can't do it any other way, use:

/<p class="heading"><span class="highlight-ico"><\/span><a href="[^"]*" title="[^"]*">((?:(?!<\/a>).)*)<\/a><\/p>/
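To collect every title rather than only the first, one option is to add the g flag to the pattern above and call exec() in a loop. The following is a minimal sketch, not a definitive implementation: the function name extractTitles and the sample markup are made up for illustration, and in a real script the input would be the string returned by UrlFetchApp.fetch(url).getContentText() as in the question.

```javascript
// Hypothetical helper: returns the link text of every "heading" paragraph.
function extractTitles(html) {
  // Same pattern as in the answer, plus the g flag so exec() advances through the input.
  var titleRegex = /<p class="heading"><span class="highlight-ico"><\/span><a href="[^"]*" title="[^"]*">((?:(?!<\/a>).)*)<\/a><\/p>/g;
  var titles = [];
  var match;
  while ((match = titleRegex.exec(html)) !== null) {
    // Capture group 1 holds the text between the opening <a ...> tag and </a>.
    titles.push(match[1]);
  }
  return titles;
}

// Sample input modelled on the markup in the question (assumed, not real page data).
var sampleHtml =
  '<p class="heading"><span class="highlight-ico"></span><a href="/url-1/" title="some title for URL 1">First title</a></p>\n' +
  '<p class="heading"><span class="highlight-ico"></span><a href="/url-2/" title="some title for URL 2">Second title</a></p>';

var titles = extractTitles(sampleHtml);
```

Because the pattern spells out the exact attribute order and spacing, any change in the page's markup will make it stop matching, which is the usual reason for the warning about parsing HTML with regex.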

For the second question (matching the reference number), use:

/X\d{8}/
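A short sketch of how this pattern might be applied to pull all reference numbers out of a string. The helper name extractReferences and the sample text are hypothetical, and the \b word boundaries and the g flag are additions not present in the pattern above; they are there so that longer tokens such as X123456789 are skipped and all occurrences are returned. Note that X here is a literal letter X; if the prefix can be any letter, replacing X with [A-Za-z] would be one way to handle it, assuming that is what the question intends.

```javascript
// Hypothetical helper: returns every reference number found in the text.
function extractReferences(text) {
  // Literal X followed by exactly eight digits; \b stops matches inside longer tokens.
  var refRegex = /\bX\d{8}\b/g;
  // String.prototype.match returns null when nothing matches, so fall back to [].
  return text.match(refRegex) || [];
}

// Sample text made up for illustration.
var refs = extractReferences("Order X12345678 shipped, X87654321 pending, X123 ignored.");
```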
