I am trying to parse an HTML page that contains these values:
<a href="somesite.html?id=123">...</a>
<a href="somesite.html?id=456">...</a>
<a href="somesite.html?id=789">...</a>
<a href="anothersite.html">...</a>
How can I parse the HTML string to get an array containing only the somesite.html links:
["somesite.html?id=123", "somesite.html?id=456", "somesite.html?id=789"]
Edit
Using Zhiguo Wang's answer as a base, I can't seem to get the somesite.html id values. The third item in the array contains extra characters:
let htmlString = "<a href=\"somesite.html?id=123\">...</a>" +
"<a href=\"somesite.html?id=456\">...</a>" +
"<a href=\"somesite.html?id=789\">...</a>" +
"<a href=\"anothersite.html\">...</a>\""
let seperateComponent = "<a href=\"somesite.html?id="
let linkExp = "[\\w\\W]*\">"
This returns:
["123", "456", "789\">...</a><a href=\"anothersite.html"]
Expected value: ["123", "456", "789"]
Hmm. Changing linkExp to the following fixes the problem. What does \W stand for in a regex?
let linkExp = "[\\w]*\">"
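On the \W question: in a regex, \w matches a word character (letters, digits, underscore) and \W matches any non-word character, so the class [\w\W] matches every character, quotes and angle brackets included. Combined with a greedy *, the match runs to the last "> in the component instead of the first. The sketch below compares the two patterns on the third component from the question. It uses current Swift syntax rather than the Swift 1.2 of the thread, and `firstMatchString` is a hypothetical helper written only for this illustration.

```swift
import Foundation

// Hypothetical helper: returns the first substring of `text` matched by `pattern`,
// or nil if the pattern is invalid or does not match.
func firstMatchString(pattern: String, in text: String) -> String? {
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return nil }
    let fullRange = NSRange(text.startIndex..., in: text)
    guard let match = regex.firstMatch(in: text, options: [], range: fullRange),
          let range = Range(match.range, in: text) else { return nil }
    return String(text[range])
}

// The third component from the question, after splitting on the separator.
let component = "789\">...</a><a href=\"anothersite.html\">...</a>\""

// [\w\W]* matches any character, so the greedy match extends to the last ">.
let greedy = firstMatchString(pattern: "[\\w\\W]*\">", in: component)

// [\w]* matches word characters only, so the match stops at the first ">.
let wordOnly = firstMatchString(pattern: "[\\w]*\">", in: component)

print(greedy ?? "no match")
print(wordOnly ?? "no match")
```

With this input the first pattern yields the whole run up to anothersite.html">, while the second yields only 789">.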
The length was also wrong. I switched to NSString to get the proper length.
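The length mismatch comes from the units being counted: NSRange is measured in UTF-16 code units, which is what NSString's length reports, while lengthOfBytesUsingEncoding with UTF-8 counts bytes. The two agree for pure ASCII text and diverge as soon as a non-ASCII character appears. A minimal sketch in current Swift syntax, with a sample string chosen only for illustration:

```swift
import Foundation

// "héllo", with the é written as an escape so the example does not depend on file encoding.
let sample = "h\u{E9}llo"

// UTF-8 byte count: é takes two bytes.
let utf8ByteCount = sample.lengthOfBytes(using: .utf8)

// UTF-16 length, the unit NSRange and NSRegularExpression expect.
let utf16Length = NSString(string: sample).length

// A range covering the whole string must use the UTF-16 length.
let fullRange = NSRange(location: 0, length: utf16Length)

print(utf8ByteCount, utf16Length, fullRange.length)
```

Passing the byte count as the range length would describe a range longer than the string whenever the two numbers differ.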
Edit 2
If the following string appears before the first somesite link, the array also contains "origin":
<meta name=\"referrer\" content=\"origin\">
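A likely cause: splitting on the separator returns everything before the first link as element 0, and the pattern [\w]*"> then matches origin"> inside the meta tag. One way around it, sketched here in current Swift syntax, is to skip element 0 and read each id up to the closing quote. The variable names are illustrative and not taken from the answers.

```swift
import Foundation

let html = "<meta name=\"referrer\" content=\"origin\">" +
    "<a href=\"somesite.html?id=123\">...</a>" +
    "<a href=\"somesite.html?id=456\">...</a>"
let separator = "<a href=\"somesite.html?id="

// components(separatedBy:) returns the text before the first separator as element 0,
// so the <meta> tag ends up in the first component.
let components = html.components(separatedBy: separator)

// Skip element 0; every remaining component starts right after the separator,
// that is, with the id, which ends at the next double quote.
let ids = components.dropFirst().map { component in
    String(component.prefix(while: { $0 != "\"" }))
}

print(ids)
```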
Answer 0 (score: 1)
let htmlString = "<a href=\"somesite.html?id=123\">...</a><a href=\"somesite.html?id=456\">...</a><a href=\"somesite.html?id=789\">...</a>"
let seperateComponent = "<a href=\""
let linkExp = "[\\w\\W]*\">"
let linkRegExp = NSRegularExpression(pattern:linkExp, options: NSRegularExpressionOptions.CaseInsensitive, error: nil)
let seperatedArray = htmlString.componentsSeparatedByString(seperateComponent)
var resultArray = [String]()
// Swift 1.2 (Xcode 6.4) syntax.
if seperatedArray.count > 1 {
    for seperatedString in seperatedArray {
        // NSRange counts UTF-16 units, so use NSString's length rather than the UTF-8 byte count.
        let componentLength = (seperatedString as NSString).length
        if componentLength > 3 {
            let myRange = linkRegExp!.rangeOfFirstMatchInString(seperatedString, options: NSMatchingOptions.ReportCompletion, range: NSMakeRange(0, componentLength))
            if myRange.location != NSNotFound {
                let matchString = (seperatedString as NSString).substringWithRange(myRange)
                // Drop the trailing "> (2 characters) from the match.
                let linkString = (matchString as NSString).substringToIndex((matchString as NSString).length - 2)
                resultArray.append(linkString)
            }
        }
    }
}
println(resultArray)
This code was run in Xcode 6.4 and the result is correct. I need at least 10 reputation to post images, so the screenshot of the result is not posted here.
Answer 1 (score: 0)
Answer 2 (score: 0)
Here is the improved code:
let htmlString = "<a href=\"somesite.html?id=123\">...</a>" +
"<a href=\"somesite.html?id=456\">...</a>" +
"<a href=\"somesite.html?id=789\">...</a>" +
"<a href=\"anothersite.html\">...</a>\""
let seperateComponent = "<a href=\""
let linkExp = "[\\w\\W]*\">"
let linkRegExp = NSRegularExpression(pattern:linkExp, options: NSRegularExpressionOptions.CaseInsensitive, error: nil)
let seperatedArray = htmlString.componentsSeparatedByString(seperateComponent)
var resultArray = [String]()
if seperatedArray.count > 1 {
    for seperatedString in seperatedArray {
        // NSRange counts UTF-16 units, so use NSString's length rather than the UTF-8 byte count.
        let componentLength = (seperatedString as NSString).length
        if componentLength > 3 {
            let myRange = linkRegExp!.rangeOfFirstMatchInString(seperatedString, options: NSMatchingOptions.ReportCompletion, range: NSMakeRange(0, componentLength))
            if myRange.location != NSNotFound {
                let matchString = (seperatedString as NSString).substringWithRange(myRange)
                let linkWished = "somesite.html?id="
                // Keep only links that point at the wanted page.
                if matchString.componentsSeparatedByString(linkWished).count > 1 {
                    // Strip the page prefix, then the trailing "> (2 characters), leaving the id.
                    var linkString = (matchString as NSString).substringFromIndex((linkWished as NSString).length)
                    linkString = (linkString as NSString).substringToIndex((linkString as NSString).length - 2)
                    resultArray.append(linkString)
                }
            }
        }
    }
}
println(resultArray)
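For readers on a current Swift toolchain, the split-then-match approach above can be collapsed into one regex with a capture group, which also sidesteps the greedy-match, length, and leading-text issues raised in the question. This is a sketch under assumptions, not the answerer's method: `hrefs(in:page:)` is a hypothetical helper, and it assumes the id consists of word characters only, as in the sample data.

```swift
import Foundation

// Hypothetical helper: collects href values of the form `<page>?id=<word characters>`.
func hrefs(in html: String, page: String) -> [String] {
    // Escape the page name so that "." is matched literally.
    let escapedPage = NSRegularExpression.escapedPattern(for: page)
    let pattern = "href=\"(" + escapedPage + "\\?id=\\w+)\""
    guard let regex = try? NSRegularExpression(pattern: pattern, options: [.caseInsensitive]) else {
        return []
    }
    // NSRange built from the string itself, so it is measured in UTF-16 units.
    let fullRange = NSRange(html.startIndex..., in: html)
    return regex.matches(in: html, options: [], range: fullRange).compactMap { match in
        // Capture group 1 holds the href value without the surrounding quotes.
        Range(match.range(at: 1), in: html).map { String(html[$0]) }
    }
}

let sampleHTML = "<meta name=\"referrer\" content=\"origin\">" +
    "<a href=\"somesite.html?id=123\">...</a>" +
    "<a href=\"somesite.html?id=456\">...</a>" +
    "<a href=\"somesite.html?id=789\">...</a>" +
    "<a href=\"anothersite.html\">...</a>"

let links = hrefs(in: sampleHTML, page: "somesite.html")
print(links)
```

To get only the ids, as in the edited question, the parentheses in the pattern can be moved so that the capture group wraps just \w+.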