Question

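I can't figure out how to (easily) keep link (2) from replacing the beginning of link (1). I'd appreciate an answer in Ruby, but if you can just work out the logic, that's fine too.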

The output should be:

 message = "For Last Minute rentals, please go to:
    <span class='external_link' href-web='http://www.mydomain.com/thepage'>http://www.mydomain.com/thepage</span> (1)

    For more information about our events, please visit our website: 
    <span class='external_link' href-web='http://www.mydomain.com'>http://www.mydomain.com</span> (2)"

But instead it is:

    message = "For Last Minute rentals, please go to:
    <span class='external_link' href-web='<span class='external_link' href-web='http://www.mydomain.com'>http://www.mydomain.com</span>/thepage'><span class='external_link' href-web='http://www.mydomain.com'>http://www.mydomain.com</span>/thepage</span> (1)

    For more information about our events, please visit our website: 
    <span class='external_link' href-web='http://www.mydomain.com'>http://www.mydomain.com</span> (2)"

Here is the code (edit: removed the spans):

     message = "For Last Minute rentals, please go to:
    http://www.mydomain.com/thepage

    For more information about our events, please visit our website: 
    http://www.mydomain.com"

   links_found = URI.extract(message, ['http', 'https'])

   for link_found in links_found          
     message.gsub!(link_found,"<span class='external_link' href-web='#{link_found}'>#{link_found}</span>")
   end

Thoughts?

Answer 1

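I'm guessing your problem is with URI.extract. When it goes through message, it pulls every instance of "http", which for the first line would be the "http" both inside and outside the <span>.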

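To clarify further, links_found will be an array containing <span...href-web:... and http...</span>. Since you are just passing link_found to gsub as the pattern to match, it replaces every occurrence of each object in the links_found[] array.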
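Whatever the exact cause, the visible symptom is that the shorter URL gets matched again inside the markup already generated for the longer one. One way around that, as a minimal sketch only: replace all links in a single gsub pass, using an alternation built from the extracted URLs with the longest first. The helper name wrap_links is hypothetical (it is not in the original post), and the sketch assumes URI.extract returns the bare URLs as it does for the message in the question.

```ruby
require 'uri'

# Hypothetical helper (not from the original post): wraps every http/https URL
# found in +message+ in a span, using one substitution pass.
def wrap_links(message)
  # Longest URLs first, so a full URL is tried before any shorter URL that is its prefix.
  links = URI.extract(message, ['http', 'https']).uniq.sort_by { |link| -link.length }

  # Regexp.union escapes each string and joins them into one alternation.
  pattern = Regexp.union(links)

  # A single gsub pass never rescans text it has already replaced,
  # so the generated markup cannot be matched a second time.
  message.gsub(pattern) do |link|
    "<span class='external_link' href-web='#{link}'>#{link}</span>"
  end
end

message = "For Last Minute rentals, please go to:
http://www.mydomain.com/thepage

For more information about our events, please visit our website:
http://www.mydomain.com"

puts wrap_links(message)
```

The difference from the loop in the question is that the loop calls gsub! once per link on text that earlier iterations have already modified, while here every link is handled in one scan of the original text.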

Answer 2

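First, rule one: when dealing with HTML or XML, don't bother with string manipulation or regular expressions for anything but the most trivial tasks. Doing otherwise is a sure path to madness.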

Instead, save your sanity and look for a real parser. For Ruby, I strongly recommend you look no further than Nokogiri; it just works.

Consider the following code:

require 'nokogiri'

message = "For Last Minute rentals, please go to:
<span class='external_link' href-web='http://www.mydomain.com/thepage'>http://www.mydomain.com/thepage</span> (1)

For more information about our events, please visit our website: 
<span class='external_link' href-web='http://www.mydomain.com'>http://www.mydomain.com</span> (2)"

doc = Nokogiri::HTML(message)

external_spans = doc.search('span.external_link')

url1 = external_spans[0]['href-web'] # => "http://www.mydomain.com/thepage"
text1 = external_spans[0].text       # => "http://www.mydomain.com/thepage"
url2 = external_spans[1]['href-web'] # => "http://www.mydomain.com"
text2 = external_spans[1].text       # => "http://www.mydomain.com"

url1 and text1 come from span 1, and url2 and text2 come from span 2.

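I'm not sure what you want to do with them because, after a quick look, I can't see any difference between your source and your desired output. Still, once you have them, you're pretty much free to do anything. A parser like Nokogiri lets you retrieve information from an HTML or XML DOM, replace it, move things around, and even splice in new things.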

Parse & replace multiple links, but not ones contained in another link
