I want to extract a value from a string that has unique start and end markers. In my case, the marker is the em tag:
"Fully <em>Furni<\/em>shed |Downtown and Canal Views",
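Expected result:
Furnished
Answer 0 (score: 3)
I guess you want to remove the tags.
If the backslash is only an artifact of escaping, the pattern is very simple: basically <em> with an optional slash, /?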
let trimmedString = string.replacingOccurrences(of: "</?em>", with: "", options: .regularExpression)
To account for the backslash as well:
let trimmedString = string.replacingOccurrences(of: "<\\\\?/?em>", with: "", options: .regularExpression)
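The doubled escaping is easy to misread, so here is a minimal sketch that runs the second pattern against both forms of the closing tag. The helper name strippingEmTags is hypothetical and only used for illustration.

```swift
import Foundation

// Hypothetical helper (the name is illustrative): removes <em>, </em> and <\/em>.
func strippingEmTags(_ input: String) -> String {
    // The Swift literal "<\\\\?/?em>" reaches the regex engine as: <\\?/?em>
    //   \\?  an optional literal backslash
    //   /?   an optional slash
    return input.replacingOccurrences(of: "<\\\\?/?em>", with: "", options: .regularExpression)
}

let escapedInput = "Fully <em>Furni<\\/em>shed |Downtown and Canal Views" // closing tag is <\/em>
let plainInput = "Fully <em>Furni</em>shed |Downtown and Canal Views"     // closing tag is </em>

print(strippingEmTags(escapedInput)) // Fully Furnished |Downtown and Canal Views
print(strippingEmTags(plainInput))   // Fully Furnished |Downtown and Canal Views
```

Each backslash is escaped once for the Swift string literal and once more for the regex, which is why a single optional backslash needs four of them in the source.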
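If you only want to extract Furnished, you have to use capture groups: one for the string between the tags, and one for everything after the closing tag up to the next whitespace character.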
let string = "Fully <em>Furni<\\/em>shed |Downtown and Canal Views"
let pattern = "<em>(.*)<\\\\?/em>(\\S+)"
do {
    let regex = try NSRegularExpression(pattern: pattern)
    if let match = regex.firstMatch(in: string, range: NSRange(string.startIndex..., in: string)) {
        let part1 = string[Range(match.range(at: 1), in: string)!]
        let part2 = string[Range(match.range(at: 2), in: string)!]
        print(String(part1 + part2))
    }
} catch { print(error) }
Answer 1 (score: 2)
Regular expression:
If you want to achieve this with a regular expression, you can use Valexa's answer:
import Foundation

public extension String {
    /// Returns the capture groups of the first match of `pattern`, or an empty array.
    func capturedGroups(withRegex pattern: String) -> [String] {
        guard let regex = try? NSRegularExpression(pattern: pattern) else { return [] }
        // Build the NSRange from String indices so that it is expressed in UTF-16 units.
        let fullRange = NSRange(startIndex..., in: self)
        guard let match = regex.firstMatch(in: self, range: fullRange) else { return [] }
        var results = [String]()
        // Range 0 is the whole match; capture groups start at 1.
        for i in 1..<match.numberOfRanges {
            // Skip groups that did not take part in the match.
            guard let range = Range(match.range(at: i), in: self) else { continue }
            results.append(String(self[range]))
        }
        return results
    }
}
Like this:
let text = "Fully <em>Furni</em>shed |Downtown and Canal Views"
print(text.capturedGroups(withRegex: "<em>([a-zA-Z]+)</em>"))
Result:
["Furni"]
NSAttributedString:
If you want highlighting, only need to strip the tags, or cannot use the first solution for any other reason, you can also use NSAttributedString:
extension String {
    var attributedStringAsHTML: NSAttributedString? {
        do {
            return try NSAttributedString(data: Data(utf8),
                                          options: [
                                            .documentType: NSAttributedString.DocumentType.html,
                                            .characterEncoding: String.Encoding.utf8.rawValue],
                                          documentAttributes: nil)
        } catch {
            print("error: ", error)
            return nil
        }
    }
}
func getTextSections(_ text: String) -> [String] {
    guard let attributedText = text.attributedStringAsHTML else {
        return []
    }
    var sections: [String] = []
    let range = NSMakeRange(0, attributedText.length)
    // we don't need to enumerate any special attribute here,
    // but for example, if you want to just extract links you can use `NSAttributedString.Key.link` instead
    let attribute: NSAttributedString.Key = .init(rawValue: "")
    attributedText.enumerateAttribute(attribute,
                                      in: range,
                                      options: .longestEffectiveRangeNotRequired) { attribute, range, pointer in
        let text = attributedText.attributedSubstring(from: range).string
        sections.append(text)
    }
    return sections
}
let text = "Fully <em>Furni</em>shed |Downtown and Canal Views"
print(getTextSections(text))
Result:
["Fully ", "Furni", "shed |Downtown and Canal Views"]
Answer 2 (score: 1)
Not a regular expression, but this gets all the fragments inside the tags, e.g. [Furni, sma]:
let text = "Fully <em>Furni<\\/em>shed <em>sma<\\/em>shed |Downtown and Canal Views"
let emphasizedParts = text.components(separatedBy: "<em>").filter { $0.contains("<\\/em>") }.compactMap { $0.components(separatedBy: "<\\/em>").first }
For the whole words, e.g. [Furnished, smashed]:
let emphasizedParts = text.components(separatedBy: " ").filter { $0.contains("<em>")}.map { $0.replacingOccurrences(of: "<\\/em>", with: "").replacingOccurrences(of: "<em>", with: "") }
Answer 3 (score: 1)
Given the following string:
let str = "Fully <em>Furni<\\/em>shed |Downtown and Canal Views"
and the corresponding NSRange:
let range = NSRange(location: 0, length: (str as NSString).length)
Let's construct a regular expression that matches the letters that are between <em> and </em>, or preceded by </em>:
let regex = try NSRegularExpression(pattern: "(?<=<em>)\\w+(?=<\\\\/em>)|(?<=<\\\\/em>)\\w+")
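What it does:
- matches one or more word characters: \\w+
- that are preceded by <em>: (?<=<em>) (positive lookbehind)
- and followed by <\/em>: (?=<\\\\/em>) (positive lookahead)
- or: |
- matches one or more word characters: \\w+
- that are preceded by <\/em>: (?<=<\\\\/em>) (positive lookbehind)

Let's get the matches: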
let matches = regex.matches(in: str, range: range)
We can convert them into strings:
let strings: [String] = matches.compactMap { match in
    // NSRange counts UTF-16 units, so convert it to a Range<String.Index> instead of offsetting by characters
    guard let range = Range(match.range, in: str) else { return nil }
    return String(str[range])
}
Now we can concatenate each string at an even index with the string at the following odd index:
let evenStride = stride(from: strings.startIndex,
                        to: strings.index(strings.endIndex, offsetBy: -1),
                        by: 2)
let result = evenStride.map { strings[$0] + strings[strings.index($0, offsetBy: 1)] }
print(result) //["Furnished"]
We can test it with another string:
let str2 = "<em>Furni<\\/em>shed <em>balc<\\/em>ony <em>gard<\\/em>en"
The result will be:
["Furnished", "balcony", "garden"]
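The even/odd pairing relies on every emphasized fragment being followed by a suffix. As an alternative that is not part of this answer, here is a sketch that uses two capture groups per match, so each match already carries its own prefix and (possibly empty) suffix. The function name emphasizedWords is hypothetical.

```swift
import Foundation

// Hypothetical helper (the name is illustrative): returns each word that starts with
// an <em>...<\/em> or <em>...</em> fragment, with the tags removed.
func emphasizedWords(in text: String) -> [String] {
    // Pattern as the regex engine sees it: <em>(\w+)<\\?/em>(\w*)
    //   group 1: the emphasized fragment
    //   group 2: the rest of the word after the closing tag (may be empty)
    guard let regex = try? NSRegularExpression(pattern: "<em>(\\w+)<\\\\?/em>(\\w*)") else {
        return []
    }
    let fullRange = NSRange(text.startIndex..., in: text)
    return regex.matches(in: text, range: fullRange).compactMap { match in
        guard let prefix = Range(match.range(at: 1), in: text),
              let suffix = Range(match.range(at: 2), in: text) else { return nil }
        return String(text[prefix]) + String(text[suffix])
    }
}

let sample = "<em>Furni<\\/em>shed <em>balc<\\/em>ony <em>gard<\\/em>en"
print(emphasizedWords(in: sample)) // ["Furnished", "balcony", "garden"]
```

With this shape, an input such as "<em>Fully<\\/em> furnished" yields ["Fully"] and does not shift the pairing of later matches.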
Answer 4 (score: 0)
Here is a basic implementation in PHP (yes, I know you asked about Swift, but this is to demonstrate the regular expression part):
<?php
$in = "Fully <em>Furni</em>shed |Downtown and Canal Views";
$m = preg_match("/<([^>]+)>([^>]+)<\/\\1>([^ ]+|$)/i", $in, $t);
$s = $t[2] . $t[3];
echo $s;
Output:
ZC-MGMT-04:~ jv$ php -q regex.php
Furnished
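Obviously, the important part is the regular expression, which matches any tag, finds the corresponding closing tag, and then captures the remainder of the word after it.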
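Since the question is about Swift, the same pattern can be sketched with NSRegularExpression. This is only a rough translation of the PHP snippet and assumes the plain </em> closing tag used in that snippet; the variable names are illustrative.

```swift
import Foundation

let input = "Fully <em>Furni</em>shed |Downtown and Canal Views"
// Pattern as the regex engine sees it: <([^>]+)>([^>]+)</\1>([^ ]+|$)
//   group 1: the tag name, referenced again by \1 in the closing tag
//   group 2: the text inside the tag
//   group 3: the rest of the word after the closing tag, or end of string
let pattern = "<([^>]+)>([^>]+)</\\1>([^ ]+|$)"

var extracted: String? = nil
if let regex = try? NSRegularExpression(pattern: pattern, options: [.caseInsensitive]),
   let match = regex.firstMatch(in: input, range: NSRange(input.startIndex..., in: input)),
   let inner = Range(match.range(at: 2), in: input),
   let rest = Range(match.range(at: 3), in: input) {
    extracted = String(input[inner]) + String(input[rest])
}
print(extracted ?? "no match") // Furnished
```

The .caseInsensitive option plays the role of the /i flag in the PHP version.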
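Answer 5 (score: 0)
If you just want to extract the text between the <em> and <\/em> tags (note that this is not a normal HTML tag pair, which would have been <em> and </em>), we can simply match this pattern and replace it with the value of capture group 1. We don't need to worry about what surrounds the matched text; we just replace each match with whatever was captured between the tags, which may well be an empty string, since the OP did not mention any constraint on that. The regex that matches this pattern is: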
<em>(.*?)<\\\/em>
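Technically, to be more robust about optional whitespace, which can appear anywhere inside the tags (as I saw someone point out in the comments on another answer), this regex can be used: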
<\s*em\s*>(.*?)<\s*\\\/em\s*>
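and replace it with \1 or $1, depending on where you are doing this. Whether the tags enclose an empty string or some actual text doesn't matter, as shown in my demo on regex101.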
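In Swift, the replacement could look roughly like the sketch below. It assumes Foundation's replacingOccurrences(of:with:options:) with the .regularExpression option, where the template for the first capture group is written $1; the variable names are illustrative.

```swift
import Foundation

let raw = "Fully <em>Furni<\\/em>shed |Downtown and Canal Views"
// Pattern as the regex engine sees it: <\s*em\s*>(.*?)<\s*\\/em\s*>
// (every backslash is doubled once more for the Swift string literal)
let tagPattern = "<\\s*em\\s*>(.*?)<\\s*\\\\/em\\s*>"
// "$1" inserts whatever capture group 1 matched, i.e. the text between the tags.
let cleaned = raw.replacingOccurrences(of: tagPattern, with: "$1", options: .regularExpression)
print(cleaned) // Fully Furnished |Downtown and Canal Views
```

An empty tag pair such as "<em><\\/em>" is simply removed, because group 1 is then empty.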
Let me know if this meets your requirements, or if you have any further ones.