Replace text inside HTML tags, except inside HTML attributes (regex)

Date: 2019-05-28 18:28:54

Tags: html swift regex replace swiftsoup

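I'm trying to find regex matches of some text in an HTML string and replace them with special markup. In the example string below, I want to find the word swiftsoup and replace it with <b>swiftsoup</b>, but skip every match that sits inside an attribute, such as id="swiftsoup" and the occurrence in the href URL.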

// example string
<p>swiftsoup is awesome, but I don't know how to solve with <a id="swiftsoup" href="https://github.com/scinfu/swiftsoup">swiftsoup</a> or other. Love swiftsoup even so.</p>

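The SwiftSoup code below doesn't work, of course: ownText() and text() are not mutating functions (they return a String), so the result of replacingOccurrences(of:with:) is thrown away and the compiler warns that it is unused: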

import SwiftSoup

let h = #"<p>swiftsoup is awesome, but I don't know how to solve with <a id="swiftsoup" href="https://github.com/scinfu/swiftsoup">swiftsoup</a> or other. Love swiftsoup even so.</p>"#

let p = try! SwiftSoup.parse(h).select("p").first()!

p.ownText().replacingOccurrences(of: "swiftsoup", with: "<b>swiftsoup</b>")
           ^~~~~~
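To illustrate the point in plain Swift (no SwiftSoup involved, variable names are only illustrative): ownText() hands back a String, and String is a value type, so replacingOccurrences(of:with:) builds a new string and leaves the original alone. Nothing flows back into the parsed document unless the new value is explicitly written back.

```swift
import Foundation

// Stand-in for the String that ownText() returns.
let ownText = "swiftsoup is awesome"

// replacingOccurrences returns a new String; ownText itself is unchanged.
let marked = ownText.replacingOccurrences(of: "swiftsoup", with: "<b>swiftsoup</b>")

print(ownText)  // swiftsoup is awesome
print(marked)   // <b>swiftsoup</b> is awesome
```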

Perhaps a regex applied to html() could help, but I don't know how to leave the matches inside attribute values untouched:

import Foundation

extension String {
    // Wraps every "swiftsoup" in <b>...</b> in a single pass.
    // Still wrong for my case: matches inside attribute values
    // (id="swiftsoup", the href URL) get wrapped as well.
    func markUpSwiftSoup() -> String {
        guard let regex = try? NSRegularExpression(pattern: "swiftsoup") else {
            return self
        }
        let range = NSRange(startIndex..., in: self)
        // "$0" is the whole match, so each hit is wrapped exactly once
        // (a replacingOccurrences call per match would re-wrap earlier hits).
        return regex.stringByReplacingMatches(in: self, options: [], range: range, withTemplate: "<b>$0</b>")
    }
}

let pHTML = try! p.html()
try! p.html(pHTML.markUpSwiftSoup())

The result I'm trying to get is:

<p><b>swiftsoup</b> is awesome, but I don't know how to solve with <a id="swiftsoup" href="https://github.com/scinfu/swiftsoup"><b>swiftsoup</b></a> or other. Love <b>swiftsoup</b> even so.</p>
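A minimal regex-only sketch of one way to reach that output, under loud assumptions: the markup is well-formed, no attribute value contains a literal ">", and there are no comments, CDATA sections, or script/style blocks. The helper boldOutsideTags(_:word:) is hypothetical, not a SwiftSoup API. The idea is an alternation whose first branch consumes a whole tag, attributes included, so the word can only be matched in text content.

```swift
import Foundation

// Hypothetical helper: wraps `word` in <b>...</b> only in text content.
// Assumes no ">" inside attribute values and no comments/script/style.
func boldOutsideTags(_ html: String, word: String) -> String {
    // Branch 1 swallows an entire tag; branch 2 (group 1) captures the word.
    let pattern = "<[^>]*>|(" + NSRegularExpression.escapedPattern(for: word) + ")"
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return html }

    let ns = NSString(string: html)
    var result = ""
    var last = 0
    for match in regex.matches(in: html, options: [], range: NSRange(location: 0, length: ns.length)) {
        // Copy the text between the previous match and this one unchanged.
        result += ns.substring(with: NSRange(location: last, length: match.range.location - last))
        let matched = ns.substring(with: match.range)
        if match.range(at: 1).location != NSNotFound {
            // Group 1 matched: the word is in text content, so mark it up.
            result += "<b>\(matched)</b>"
        } else {
            // A tag with its attributes: keep it verbatim.
            result += matched
        }
        last = match.range.location + match.range.length
    }
    // Copy whatever follows the last match.
    result += ns.substring(from: last)
    return result
}
```

In the setup above, this would be applied to the string from try! p.html() and written back with p.html(_:), in the same place markUpSwiftSoup() is called. Because the tag branch consumes attributes before the word branch can see them, matches inside id or href never reach the replacement step; the trade-off is that the sketch inherits every limitation of treating HTML as flat text.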

Thanks!

0 answers