Retrieve an array of links with a regular expression - Swift

Time: 2015-09-22 04:07:39

Tags: ios swift

I am trying to parse an HTML page that contains these values:

<a href="somesite.html?id=123">...</a>
<a href="somesite.html?id=456">...</a>
<a href="somesite.html?id=789">...</a>
<a href="anothersite.html">...</a>

How can I parse the HTML string to get an array containing only the somesite.html links:

["somesite.html?id=123", "somesite.html?id=456", "somesite.html?id=789"]

Edit

Using Zhiguo Wang's answer as a base, I can't seem to get just the somesite.html id values... the third item in the array contains extra characters:

let htmlString = "<a href=\"somesite.html?id=123\">...</a>" +
"<a href=\"somesite.html?id=456\">...</a>" +
"<a href=\"somesite.html?id=789\">...</a>" +
"<a href=\"anothersite.html\">...</a>\""
let seperateComponent = "<a href=\"somesite.html?id="

let linkExp = "[\\w\\W]*\">"

This returns:

["123", "456", "789\\">...</a><a href=\\"anothersite.html"]

Expected value: ["123", "456", "789"]

... Hmm. Changing linkExp to the following fixes the problem. What does \W stand for in regex?

let linkExp = "[\\w]*\">"
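
On the \W question: \w matches a word character (letters, digits, underscore) and \W matches any character that is not one, so the class [\w\W] matches every character. Because * is greedy, [\w\W]*"> runs on to the last "> in the piece, which is where the extra characters come from. The sketch below is written for current Swift rather than the Swift 1.2 of this thread, and firstMatch(of:in:) is a hypothetical helper introduced only to compare the patterns:

```swift
import Foundation

// Returns the first match of `pattern` in `text`, or nil if there is none.
// Hypothetical helper for this comparison, not part of the original code.
func firstMatch(of pattern: String, in text: String) -> String? {
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return nil }
    let range = NSRange(text.startIndex..., in: text)
    guard let match = regex.firstMatch(in: text, options: [], range: range),
          let matchRange = Range(match.range, in: text) else { return nil }
    return String(text[matchRange])
}

// The last piece left after splitting on the separator used in the question.
let piece = "789\">...</a><a href=\"anothersite.html\">...</a>\""

// Greedy "any character": runs to the last "> in the piece.
let greedy = firstMatch(of: "[\\w\\W]*\">", in: piece)
// Word characters only: stops at the first "> because the quote is not a word character.
let wordOnly = firstMatch(of: "[\\w]*\">", in: piece)
// Lazy "any character": also stops at the first ">.
let lazy = firstMatch(of: "[\\w\\W]*?\">", in: piece)
```

Either the word-only class or the lazy quantifier *? keeps the match from running past the first closing quote; the word-only class is stricter, since it would also reject ids containing punctuation.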

..The length was wrong. Switched to NSString to get the proper length.
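
The length matters because NSRange counts UTF-16 code units, while lengthOfBytesUsingEncoding(NSUTF8StringEncoding) counts UTF-8 bytes; the two agree only for pure ASCII text. A small sketch of the difference in current Swift syntax, where the sample string is made up for illustration:

```swift
import Foundation

// "\u{E9}" is "é": one UTF-16 code unit, but two bytes in UTF-8.
let text = "id=\u{E9}"

// UTF-8 byte count, which is what the answer's code passes to NSMakeRange.
let byteCount = text.lengthOfBytes(using: .utf8)
// UTF-16 code unit count, which is what NSRange actually expects.
let utf16Count = (text as NSString).length
// Building the range from String indices avoids counting by hand.
let fullRange = NSRange(text.startIndex..., in: text)
```

With a range built from the byte count, any non-ASCII character in the page makes the range longer than the string, which can raise an out-of-bounds exception at match time.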

Edit 2

If this string appears before the somesite links, the array also contains origin:

<meta name=\"referrer\" content=\"origin\">
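
One way to keep unrelated tags such as this meta element out of the result is to skip the split step and match the wanted href directly, reading the id from a capture group. This is a minimal sketch in current Swift, assuming the ids are always digits; somesiteIDs(in:) is a hypothetical name used only here:

```swift
import Foundation

// Collects the id of every href of the form somesite.html?id=<digits>.
// Hypothetical helper; the digits-only id is an assumption, not something the page guarantees.
func somesiteIDs(in html: String) -> [String] {
    let pattern = "href=\"somesite\\.html\\?id=(\\d+)\""
    guard let regex = try? NSRegularExpression(pattern: pattern, options: [.caseInsensitive]) else {
        return []
    }
    let range = NSRange(html.startIndex..., in: html)
    return regex.matches(in: html, options: [], range: range).compactMap { match in
        // Capture group 1 holds the digits after "id=".
        Range(match.range(at: 1), in: html).map { String(html[$0]) }
    }
}

let html = "<meta name=\"referrer\" content=\"origin\">" +
    "<a href=\"somesite.html?id=123\">...</a>" +
    "<a href=\"somesite.html?id=456\">...</a>" +
    "<a href=\"somesite.html?id=789\">...</a>" +
    "<a href=\"anothersite.html\">...</a>"

let ids = somesiteIDs(in: html)
```

Moving the opening parenthesis to just before somesite would capture the whole link (somesite.html?id=123) instead of only the id.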

3 answers:

Answer 0: (score: 1)

Talk is cheap, show me the code

    let htmlString = "<a href=\"somesite.html?id=123\">...</a><a href=\"somesite.html?id=456\">...</a><a href=\"somesite.html?id=789\">...</a>"
    let seperateComponent = "<a href=\""

    let linkExp = "[\\w\\W]*\">"
    let linkRegExp = NSRegularExpression(pattern:linkExp, options: NSRegularExpressionOptions.CaseInsensitive, error: nil)
    let seperatedArray = htmlString.componentsSeparatedByString(seperateComponent)
    var resultArray = [String]()

    if seperatedArray.count > 1 {
        for seperatedString in seperatedArray {
            // NSRange counts UTF-16 code units, so take the length from NSString
            // rather than the UTF-8 byte count, which differs for non-ASCII text.
            let nsString = seperatedString as NSString
            if nsString.length > 3 {
                let myRange = linkRegExp!.rangeOfFirstMatchInString(seperatedString, options: NSMatchingOptions.ReportCompletion, range: NSMakeRange(0, nsString.length))
                if myRange.location != NSNotFound {
                    let matchString = nsString.substringWithRange(myRange) as NSString
                    // Drop the trailing "> that the pattern matched.
                    let linkString = matchString.substringToIndex(matchString.length - 2)

                    resultArray.append(linkString)
                }
            }
        }
    }

    println(resultArray)

This code was run on Xcode 6.4 and the result is correct. "I need at least 10 reputation to post images", so a screenshot of the result is not posted here.

Answer 1: (score: 0)

I think regular expressions can be fiddly when parsing HTML. There are better ways to parse HTML on iOS. Here is a tutorial. TFHpple and NDHpple are your friends.

Here is a related SO thread

Answer 2: (score: 0)

Here is the improved code

    let htmlString = "<a href=\"somesite.html?id=123\">...</a>" +
        "<a href=\"somesite.html?id=456\">...</a>" +
        "<a href=\"somesite.html?id=789\">...</a>" +
    "<a href=\"anothersite.html\">...</a>\""
    let seperateComponent = "<a href=\""

    let linkExp = "[\\w\\W]*\">"
    let linkRegExp = NSRegularExpression(pattern:linkExp, options: NSRegularExpressionOptions.CaseInsensitive, error: nil)
    let seperatedArray = htmlString.componentsSeparatedByString(seperateComponent)
    var resultArray = [String]()

    if seperatedArray.count > 1 {
        for seperatedString in seperatedArray {
            // NSRange counts UTF-16 code units, so use NSString.length, not the UTF-8 byte count.
            let nsString = seperatedString as NSString
            if nsString.length > 3 {
                let myRange = linkRegExp!.rangeOfFirstMatchInString(seperatedString, options: NSMatchingOptions.ReportCompletion, range: NSMakeRange(0, nsString.length))
                if myRange.location != NSNotFound {
                    let matchString = nsString.substringWithRange(myRange)

                    let linkWished = "somesite.html?id="

                    // Keep only the links that point at somesite.html.
                    if matchString.componentsSeparatedByString(linkWished).count > 1 {
                        // Strip the leading "somesite.html?id=", then the trailing ">.
                        let idAndSuffix = (matchString as NSString).substringFromIndex((linkWished as NSString).length) as NSString
                        let linkString = idAndSuffix.substringToIndex(idAndSuffix.length - 2)

                        resultArray.append(linkString)
                    }
                }
            }
        }
    }

    println(resultArray)
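
The code above uses the Swift 1.2 API that shipped with Xcode 6.4. For readers on a current toolchain, the same split-then-filter idea might look roughly like the sketch below. Note that it swaps the regular expression for a plain search for the closing quote, so it is a variation on the approach rather than a line-for-line port:

```swift
import Foundation

let htmlString = "<a href=\"somesite.html?id=123\">...</a>" +
    "<a href=\"somesite.html?id=456\">...</a>" +
    "<a href=\"somesite.html?id=789\">...</a>" +
    "<a href=\"anothersite.html\">...</a>"
let linkWished = "somesite.html?id="

var resultArray = [String]()
// Every piece after the first begins right after an opening <a href=".
for piece in htmlString.components(separatedBy: "<a href=\"").dropFirst() {
    // The href value ends at the next double quote.
    guard let closingQuote = piece.firstIndex(of: "\"") else { continue }
    let link = String(piece[..<closingQuote])
    // Keep only somesite.html links and strip the prefix to leave the id.
    if link.hasPrefix(linkWished) {
        resultArray.append(String(link.dropFirst(linkWished.count)))
    }
}
print(resultArray)
```

Working with String indices throughout means no NSRange or manual length arithmetic is needed.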