Question

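First, before you say anything: I have to do it this way because the RSS is malformed and I can't correct it. When I try RSS and XML parsers they fail, and I only have front-end access. I'm very close, but I can't figure out why this doesn't match.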

Feed (one long single-line string): http://pastebin.com/5dJhXCvf

First attempt:

<title>(.+)</title>

I thought this worked well against my test string:

<title>&quot;cterrorism task force&quot; location:oregon - Google News</title>

But the problem is that it matches everything as a single match, for example:

<title>&quot;cterrorism task force&quot; location:oregon - Google News</title><title>&quot;cterrorism task force&quot; location:oregon - Google News</title>

Both exec() and match() return this as a single result item in my array.

So I tried:

<title>([\w\d\s\=\%\_\`\~\+\!\@\#\$\%\^\&\*\(\)\:\'\"\[\]\{\}\|\,\.\/]+)</title>

But this returns nothing... any ideas?

Answer 1

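Try the non-greedy version, <title>(.+?)<\/title>. A plain .+ is greedy, so it runs on to the last </title> in the string; .+? stops at the first one it can. In JavaScript you also need the g flag to get every match rather than only the first. You can test these expressions online here.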
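
To make the difference concrete, here is a minimal sketch in JavaScript. The feed string is a made-up, shortened stand-in for the real feed, not the actual pastebin content.

```javascript
// Two <title> elements on one line, shaped like the feed in the question.
// (Hypothetical sample data, not the real feed.)
const feed =
  '<title>First headline</title><link>http://example.com/1</link>' +
  '<title>Second headline</title>';

// Greedy: .+ runs to the LAST </title>, so there is one oversized match.
const greedy = feed.match(/<title>(.+)<\/title>/g);

// Lazy: .+? stops at the FIRST </title> it can, so each element matches.
const lazy = feed.match(/<title>(.+?)<\/title>/g);

console.log(greedy.length); // 1
console.log(lazy);
// [ '<title>First headline</title>', '<title>Second headline</title>' ]
```

Note that with the g flag, match() returns the full matches, tags included, not the captured groups.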

Answer 2

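The RSS you posted is well-formed XML, but it is not valid RSS (according to the W3C Feed Validator). Since it is well-formed, your best option is still to use an XML parser rather than regular expressions. In fact, most RSS parsers should cope as well: RSS is notorious for validation problems (partly because of poor early specifications), so any RSS parser worth using should have no trouble with the kind of validation problems the W3C validator is reporting.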

By the way, this looks like a Google News feed. You can get valid Atom by changing the output parameter from "rss" to "atom". For example:

http://news.google.com/news?pz=1&cf=all&ned=us&hl=en&topic=h&num=3&output=atom

Google's feed-generating services are generally better at producing Atom than RSS. That said, you may also want to report the invalid RSS to Google.

Answer 3

Try a lazy quantifier:

<title>([^<]+?)</title>
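
As a rough sketch of how this could be used from JavaScript: the negated class [^<] cannot cross a tag boundary, so each title matches on its own, and the lazy ? becomes optional. The sample string and the captureAll helper are made up for illustration. The sketch also runs the explicit character class from the question, which contains neither ';' nor '-', both of which appear in these titles.

```javascript
// Sample shaped like the titles in the question (hypothetical, shortened).
const sample =
  '<title>&quot;task force&quot; location:oregon - Google News</title>' +
  '<title>Striking a fair balance - OregonLive.com</title>';

// Hypothetical helper: collect capture group 1 from every match of a
// global (g-flagged) pattern.
function captureAll(pattern, text) {
  return Array.from(text.matchAll(pattern), (m) => m[1]);
}

// [^<]+ cannot cross a tag boundary, so each title is matched separately.
const titles = captureAll(/<title>([^<]+)<\/title>/g, sample);

// The explicit class from the question lacks ';' and '-', so nothing matches.
const explicit = captureAll(
  /<title>([\w\d\s\=\%\_\`\~\+\!\@\#\$\%\^\&\*\(\)\:\'\"\[\]\{\}\|\,\.\/]+)<\/title>/g,
  sample
);

console.log(titles);
console.log(explicit.length); // 0
```

matchAll() needs the g flag and yields one result per match, each carrying its capture groups, which is why it is used here instead of match().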

Answer 4

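Try making the expression ungreedy by adding the U flag:

"/<title>(.+)<\/title>/U"

This tells it to find the smallest possible match rather than the largest one available. Note that the U modifier is PCRE syntax (as used, for example, by PHP's preg functions); JavaScript regular expressions have no such flag, so there you would use a lazy quantifier such as .+? instead.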

Answer 5

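Many parsers can handle minor deviations from the spec. Any binding to the excellent libxml2 library should be able to handle poorly formed XML, and bindings exist for many languages. For example, the following Ruby snippet parses it just fine: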

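require 'nokogiri'

# Read the saved feed; Nokogiri's default options let libxml2 recover from minor errors.
xml = File.read('rss.txt')
doc = Nokogiri::XML(xml)

# Print the text of every <title> element in the document.
doc.xpath('//title').each do |title|
  puts title.inner_text
end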

Result:

"joint terrorism task force" location:oregon - Google News
"joint terrorism task force" location:oregon - Google News
Federal and FBI Joint Terrorism Task Force are still flawed - OregonLive.com
Striking a fair balance - OregonLive.com
Blame the terrorists, not the FBI - Portland Tribune
Why Oregon? Why not?: Terrorism can strike anywhere - The Register-Guard
INDIVIDUAL TRAVEL UNDER ATTACK - NewsWithViews.com
The other terrorism-and pondering Portland - BlueOregon
Fla. dance troupe causes scare at Lincoln Tunnel - Northwest Cable News

Edit: Based on your comment, I see you are using jQuery. You should be able to use jQuery's XML parser to extract the titles (and other parts, if needed).
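
A minimal front-end sketch of that idea, assuming the feed is already available as a string. It uses the browser's built-in DOMParser; jQuery's $.parseXML should give you an equivalent XML document if you prefer to stay in jQuery. The names titlesFromXml and feedText are made up for illustration.

```javascript
// Parse an XML string with the browser's built-in DOMParser and return the
// text of every <title> element. Throws if the parser reports an error.
// (titlesFromXml is a hypothetical helper name.)
function titlesFromXml(xmlText) {
  const doc = new DOMParser().parseFromString(xmlText, 'application/xml');
  // Browsers signal a parse failure with a <parsererror> element.
  if (doc.getElementsByTagName('parsererror').length > 0) {
    throw new Error('XML could not be parsed');
  }
  return Array.from(doc.getElementsByTagName('title'), (el) => el.textContent);
}

// Usage in a browser, where feedText (assumed) holds the feed as a string:
//   const titles = titlesFromXml(feedText);
```

This only works if the feed is well-formed XML; if the browser parser rejects it, the regex approaches above remain the fallback.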

How do I match an XML tag multiple times using a regular expression?
