Why doesn't my regular expression capture the right group on this Japanese web page?

Date: 2017-01-08 22:56:09

Tags: swift swift3 nsregularexpression

I want to get the content of the og:image property from this Japanese web page, which is encoded as UTF-8.

Expected result:

http://www.macotakara.jp//blog/archives/001/201701/5871de9fb4929.jpg

But I get:

jp//blog/archives/001/201701/5871bd1be125c.jpg" />

I suspect the problem is related to how the range is used.

You can try the regex here: https://regex101.com/r/F29INt/1

The HTML snippet is:

<meta name="description" content="CES2017において、OtterBoxが、様々なモジュールを装着出来るモジュール式iPhoneケース「uniVERSE」の展示を行っていました。 背面にあるスライド式「uniVERSEケースシステム」を使用して、背面の下半分を変更す..." />
<meta property="og:image" name="og:image" content="http://www.macotakara.jp//blog/archives/001/201701/5871de9fb4929.jpg" />
<meta name="twitter:image" 

My regex class is as follows:

public class Regex {
    let regex: NSRegularExpression
    let pattern: String

    public init(_ pattern: String) {
        self.pattern = pattern
        regex = try! NSRegularExpression(pattern: pattern, options: [.caseInsensitive])
    }

    public func matches(_ input: String) -> [NSTextCheckingResult] {
        let matches = regex.matches(in: input, options: [], range:NSRange(location:0, length:input.characters.count))
        return matches
    }
}

The code that uses it:

let pattern = "<meta[^>]+property=[\"']\(property)[\"'][^>]+content=[\"']([^\"']*)[\"'][^>]*>"
let regex = Regex(pattern)
let matches = regex.matches(html)

for match in matches {
    // range at index 0: full match
    // range at index 1: first capture group
    var text = ""
    text += "+++StoryPreviewCache.getMetaPropertyContent(): with pattern=\(pattern) for prop=\(property)"
    for j in 1..<match.numberOfRanges {
        text += "+++StoryPreviewCache.getMetaPropertyContent(): Groups \(j), range=\(match.rangeAt(j)), is \(html[match.rangeAt(j)])"
    }
    // text is declared inside the loop, so it must be printed inside the loop
    print(text)
}

I get:

+++StoryPreviewCache.getMetaPropertyContent():
with pattern=<meta[^>]+property=["']og:image["'][^>]+content=["']([^"']*)["'][^>]*> 
for prop=og:image
+++StoryPreviewCache.getMetaPropertyContent(): 
Groups 1, 
range=__C._NSRange, 
is jp//blog/archives/001/201701/5871bd1be125c.jpg" />

1 answer:

Answer 0 (score: 0)

Following the question Martin R suggested, I wrote this extension:

extension NSTextCheckingResult {
    public func capture(group:Int, in text:String) -> String {
        let range = self.rangeAt(group)
        let content = (text as NSString).substring(with: range)
        return content as String
    }
}
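In Swift 4 and later the extension can skip the `NSString` bridge: `Range(_:in:)` converts a UTF-16-based `NSRange` into a `Range<String.Index>` and returns `nil` when it cannot, for example for an optional group that did not take part in the match (location `NSNotFound`), where the `NSString`-based version above would raise a range exception instead. A sketch; the name `capturedString(at:in:)` is hypothetical, and `range(at:)` is the Swift 4 spelling of `rangeAt(_:)`:

```swift
import Foundation

extension NSTextCheckingResult {
    /// Text of capture group `group`, or nil when the group index is out of bounds,
    /// the group did not participate in the match (location == NSNotFound),
    /// or the UTF-16 range cannot be converted to a range of `text`.
    func capturedString(at group: Int, in text: String) -> String? {
        guard group < numberOfRanges,
              let range = Range(self.range(at: group), in: text) else {
            return nil
        }
        return String(text[range])
    }
}

// Usage: an emoji and a CRLF precede the match, so UTF-16 offsets and
// Character offsets differ; group 2 is optional and does not take part.
let html = "🙂\r\n<meta content=\"http://example.com/a.jpg\" />"
let regex = try! NSRegularExpression(pattern: "content=\"([^\"]*)\"(x)?")
let fullRange = NSRange(html.startIndex..., in: html)

let match = regex.firstMatch(in: html, options: [], range: fullRange)
let url = match?.capturedString(at: 1, in: html)      // "http://example.com/a.jpg"
let missing = match?.capturedString(at: 2, in: html)  // nil
print(url ?? "nil", missing ?? "nil")
```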

I also changed the matches method of my Regex class as follows:

public func matches(_ input: String) -> [NSTextCheckingResult] {
    let nsString = input as NSString
    let matches = regex.matches(in: input, range: NSRange(location: 0, length: nsString.length))
    // former code as follows
    //let matches = regex.matches(in: input, options: [], range:NSRange(location:0, length:input.characters.count))
    return matches
}
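Why this change helps: `NSRegularExpression` expects the search range, and reports each capture range, in UTF-16 code units, which is what `NSString.length` measures. `input.characters.count` (`input.count` in Swift 4 and later) instead counts `Character`s, i.e. extended grapheme clusters. Ordinary kana and kanji are one unit in both measures, but a CRLF line break, an emoji, or a base character plus a combining mark is one `Character` and several UTF-16 units, so the two counts drift apart. A minimal sketch in current Swift syntax; the sample string is hypothetical:

```swift
import Foundation

// Hypothetical sample: "\r\n" and "🙂" are each a single Character in Swift,
// but "\r\n" is two UTF-16 code units and "🙂" (outside the BMP) is a surrogate pair.
let sample = "a\r\nb🙂"

let characterCount = sample.count            // grapheme clusters: "a", "\r\n", "b", "🙂"
let utf16Count = sample.utf16.count          // UTF-16 code units: a, \r, \n, b, 2 for the emoji
let nsLength = (sample as NSString).length   // NSString also counts UTF-16 code units

print(characterCount, utf16Count, nsLength)  // 4 6 6
```

With offsets like these, an `NSRange` built from `characters.count` stops short of the end of the text, and reading a UTF-16 range back as `Character` offsets (which the question's `html[...]` subscript, not shown there, appears to do) shifts the extracted text forward, consistent with the truncated `jp//blog/...` result.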

Now I use it like this:

for match in matches {
    var text = ""
    text += "+++StoryPreviewCache.getMetaPropertyContent(): with pattern=\(pattern) for prop=\(property)"
    for j in 1..<match.numberOfRanges {
        text += "+++StoryPreviewCache.getMetaPropertyContent(): Groups \(j), is \(match.capture(group:j, in: html))"
    }
    print(text)
}
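Putting it together, a self-contained sketch of the corrected `Regex` wrapper in Swift 4+ syntax. It keeps the answer's approach of measuring the input with `NSString.length` and reading groups back with `NSString.substring(with:)`, so both directions use UTF-16 offsets. The `metaContent(property:in:)` helper is hypothetical, the sample page is the question's `og:image` line with an emoji and a CRLF prepended for illustration, and `NSRegularExpression.escapedPattern(for:)` is an addition not in the question that makes the property name match literally:

```swift
import Foundation

public class Regex {
    let regex: NSRegularExpression
    let pattern: String

    public init(_ pattern: String) {
        self.pattern = pattern
        regex = try! NSRegularExpression(pattern: pattern, options: [.caseInsensitive])
    }

    public func matches(_ input: String) -> [NSTextCheckingResult] {
        // NSRange lengths are UTF-16 code units, so measure the input the same way.
        let length = (input as NSString).length
        return regex.matches(in: input, options: [], range: NSRange(location: 0, length: length))
    }
}

// Hypothetical helper: returns the content attribute of <meta property="...">, if any.
func metaContent(property: String, in html: String) -> String? {
    // Escape the property so characters such as "." are not treated as regex syntax.
    let escaped = NSRegularExpression.escapedPattern(for: property)
    let pattern = "<meta[^>]+property=[\"']\(escaped)[\"'][^>]+content=[\"']([^\"']*)[\"'][^>]*>"
    guard let match = Regex(pattern).matches(html).first, match.numberOfRanges > 1 else {
        return nil
    }
    // The capture range is in UTF-16 units, and so is NSString, so the substring lines up.
    return (html as NSString).substring(with: match.range(at: 1))
}

let page = "🙂\r\n<meta property=\"og:image\" name=\"og:image\" content=\"http://www.macotakara.jp//blog/archives/001/201701/5871de9fb4929.jpg\" />"
print(metaContent(property: "og:image", in: page) ?? "no match")
```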