Question

I'm trying to extract all the URL strings from a long XML file. The URLs I need sit between the loc elements, and I want to discard everything else.


For example, I want to pull out this:

  <loc>https://www.website.com/1</loc>

from input that looks like this:

  <url>
   <loc>https://www.website.com/1</loc>
   <lastmod>2017-04-01T08:18:42+00:00</lastmod>
   <changefreq>daily</changefreq>
   <priority>1.0000</priority>
  </url>

  <url>
   <loc>https://www.website.com/2</loc>
   <lastmod>2017-04-01T08:18:42+00:00</lastmod>
   <changefreq>daily</changefreq>
   <priority>1.0000</priority>
  </url>

  <url>
   <loc>https://www.website.com/3</loc>
   <lastmod>2017-04-01T08:18:42+00:00</lastmod>
   <changefreq>daily</changefreq>
   <priority>1.0000</priority>
  </url>

Any ideas? Thanks in advance.

Answer 1

// str is assumed to hold the XML text; the lazy .*? stops at the first </loc>
var regex = /https.*?(?=<\/loc>)/g;
var urls = str.match(regex);

This returns an array containing all the matches, or null if nothing matches.
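If the file might also contain plain http URLs, or whitespace inside the loc elements, a capture group may be a bit more forgiving than matching on the https prefix. Here is a minimal sketch of that idea, assuming the XML is already loaded into a string. The sample input and the helper name extractLocs are made up for illustration and are not from the question.

```javascript
// Hypothetical sample input, shaped like the sitemap in the question.
const xml = [
  "<url>",
  " <loc>https://www.website.com/1</loc>",
  " <lastmod>2017-04-01T08:18:42+00:00</lastmod>",
  "</url>",
  "<url>",
  " <loc>https://www.website.com/2</loc>",
  " <lastmod>2017-04-01T08:18:42+00:00</lastmod>",
  "</url>",
].join("\n");

// extractLocs is an illustrative helper name, not part of any library.
function extractLocs(text) {
  // Capture the text between <loc> and </loc>, trimming surrounding whitespace.
  // [^<]* cannot run past the closing tag, so several loc elements on one line are fine.
  const pattern = /<loc>\s*([^<]*?)\s*<\/loc>/g;
  // matchAll needs the g flag; m[1] is the first capture group of each match.
  return Array.from(text.matchAll(pattern), (m) => m[1]);
}

const urls = extractLocs(xml);
console.log(urls);
```

Unlike str.match, this always gives back an array (empty when nothing matches), so the result can be looped over without a null check. For anything beyond a simple sitemap, a real XML parser is probably the safer route.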
