I need to extract data from an HTML string that a website returns in response to a POST request; I am using the SwiftSoup library to parse it. I selected the list with a CSS selector:
let iconsList: Element = try doc.select("ul.icons-list").first()!
which returns the following HTML:
<ul class="icons-list">
<li><strong>Label 1:</strong> Value 1 (Some text) </li>
<li><strong>Label 2:</strong> Value 2</li>
<li><strong>Label 3:</strong> Value 3</li>
<li><strong>Label 4:</strong> Value 4 </li>
<li><strong>Label 5:</strong> Value 5</li>
</ul>
Now I need to extract the labels and values and store them in an array, or possibly in separate variables. I tried a regular expression as shown below (it does not work, perhaps because the regex is wrong):
let result = "This <strong>Needs to be removed</strong> is my string"
let regex = try! NSRegularExpression(pattern: "<strong>(.*)</strong>", options: .caseInsensitive)
var newStr = regex.stringByReplacingMatches(in: result, options: [], range: NSRange(0..<result.utf16.count), withTemplate: "")
print(newStr)
I also tried a SwiftSoup selector:
var labelFirst = try doc.select("ul.icons-list li:nth-child(1)")
But that returns HTML as well, so I would need a regular expression in both cases. How should I do this?
Another question: when I select the icons-list class with SwiftSoup's ".select", how do I handle an exception if one is thrown? My current code for this does not work. And what if I want to handle several try statements in the same block?
Answer 0 (score: 0)
I was able to figure it out myself. Here is how I did it:
import Foundation

let res = "<ul class=\"icons-list\"><li><strong>Label 1:</strong> Value 1 (Some text) </li></ul>"

extension String {
    // Returns the capture groups of the first match of `pattern`, or [] if nothing matches.
    func capturedGroups(withRegex pattern: String) -> [String] {
        var results = [String]()
        // An invalid pattern yields an empty result instead of a crash.
        guard let regex = try? NSRegularExpression(pattern: pattern, options: []) else {
            return results
        }
        // NSRange counts UTF-16 code units, so use utf16.count rather than the character count.
        let matches = regex.matches(in: self, options: [], range: NSRange(location: 0, length: self.utf16.count))
        guard let match = matches.first else { return results }
        let lastRangeIndex = match.numberOfRanges - 1
        guard lastRangeIndex >= 1 else { return results }
        // Range 0 is the whole match; capture groups start at index 1.
        for i in 1...lastRangeIndex {
            let capturedGroupRange = match.range(at: i)
            let matchedString = (self as NSString).substring(with: capturedGroupRange)
            results.append(matchedString)
        }
        return results
    }
}

let label1 = res.capturedGroups(withRegex: "<strong>(.*)</strong>")
let value1 = res.capturedGroups(withRegex: "</strong>(.*)</li>")
// The captured label keeps its colon and the value keeps its surrounding spaces.
print("\(label1[0])\(value1[0])")
// Prints "Label 1: Value 1 (Some text) " (with a trailing space)
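One limitation of capturedGroups is that it only looks at the first match, and the greedy (.*) would run across several list items if the string contained more than one. A possible variation, offered only as a sketch: the helper name labelValuePairs and the pattern are my own assumptions, not something from SwiftSoup, and a regex like this only suits markup as regular as the sample list.

```swift
import Foundation

// Hypothetical helper (not part of SwiftSoup or Foundation): collects one
// (label, value) pair per <li><strong>Label:</strong> Value</li> item.
func labelValuePairs(in html: String) -> [(label: String, value: String)] {
    // Lazy groups (.*?) stop at the nearest closing tag, so each <li> is
    // matched on its own; the optional ":" keeps the colon out of the label.
    let pattern = "<li>\\s*<strong>(.*?):?</strong>(.*?)</li>"
    guard let regex = try? NSRegularExpression(
        pattern: pattern,
        options: [.caseInsensitive, .dotMatchesLineSeparators]
    ) else { return [] }

    // Build the NSRange from the string itself so UTF-16 offsets are correct.
    let fullRange = NSRange(html.startIndex..., in: html)
    return regex.matches(in: html, options: [], range: fullRange).compactMap {
        match -> (label: String, value: String)? in
        guard let labelRange = Range(match.range(at: 1), in: html),
              let valueRange = Range(match.range(at: 2), in: html) else { return nil }
        let label = html[labelRange].trimmingCharacters(in: .whitespacesAndNewlines)
        let value = html[valueRange].trimmingCharacters(in: .whitespacesAndNewlines)
        return (label: label, value: value)
    }
}

let sample = """
<ul class="icons-list">
<li><strong>Label 1:</strong> Value 1 (Some text) </li>
<li><strong>Label 2:</strong> Value 2</li>
</ul>
"""
for pair in labelValuePairs(in: sample) {
    print("\(pair.label): \(pair.value)")
}
```

Returning an array of tuples keeps every label next to its value, which covers the "store them in an array" part of the question.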
If anyone can suggest a better approach or improve my function, I would still appreciate it!
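For what it is worth, the regex may not be needed at all, and the same code can answer the second question about several try statements in one block. The following is a rough sketch, assuming SwiftSoup's parse, select, array, text and ownText calls and the Exception.Error catch pattern shown in its README; the function name iconListPairs is hypothetical. Note that a selector matching nothing gives an empty result rather than throwing, so force-unwrapping first() would crash instead of reaching a catch clause.

```swift
import Foundation
import SwiftSoup

// Hypothetical helper name; only the SwiftSoup calls inside are library API.
func iconListPairs(from html: String) -> [(label: String, value: String)] {
    do {
        // All `try` calls share this single do block: the first one that
        // throws skips the remaining statements and jumps to a catch clause.
        let doc = try SwiftSoup.parse(html)
        let items = try doc.select("ul.icons-list li")

        var pairs: [(label: String, value: String)] = []
        for li in items.array() {
            // Text of the <strong> child only, without any markup.
            let label = try li.select("strong").text()
            // ownText() is the text directly inside <li>, excluding <strong>.
            let value = li.ownText().trimmingCharacters(in: .whitespaces)
            pairs.append((label: label, value: value))
        }
        return pairs
    } catch Exception.Error(_, let message) {
        // SwiftSoup's own error case, as used in its README.
        print("SwiftSoup error: \(message)")
        return []
    } catch {
        // Any other error thrown inside the do block.
        print("Unexpected error: \(error)")
        return []
    }
}
```

Using text() and ownText() returns plain strings instead of HTML, which is why no regular expression is involved in this version.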