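I want to remove all attributes from the text of an HTML string. I have found many solutions, but the regex-based ones stop working when the CSS styles are malformed. That is my problem: the HTML I get from the API is badly formatted. It can look like this: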
<p style="\"text-align:" justify;="" \"=""><span style="\"font-size:" 13px;="" font-family:="" arial;="" text-decoration-skip-ink:="" none;\"=""><b><span style="font-size: 18px;">Angkor Wat</span></b> is a temple complex in Cambodia and the largest religious monument in the world, on a site measuring 162.6 hectares (1,626,000 m2; 402 acres). It was originally constructed as a Hindu temple dedicated to the god Vishnu for the Khmer Empire, gradually transforming into a Buddhist temple towards the end of the 12th century. It was built by the Khmer King Suryavarman II in the early 12th century in Yaśodharapura, the capital of the Khmer Empire, as his state temple and eventual mausoleum. Breaking from the Shaiva tradition of previous kings, Angkor Wat was instead dedicated to Vishnu. As the best-preserved temple at the site, it is the only one to have remained a significant religious centre since its foundation. The temple is at the top of the high classical style of Khmer architecture. It has become a symbol of Cambodia, appearing on its national flag, and it is the country\'s prime attraction for visitors.</span></p><p style="\"text-align:" justify;="" \"=""><span style="\"font-size:" 13px;="" font-family:="" arial;="" text-decoration-skip-ink:="" none;\"="">Angkor Wat combines two basic plans of Khmer temple architecture: the temple-mountain and the later galleried temple. It is designed to represent Mount Meru, home of the devas in Hindu mythology: within a moat and an outer wall 3.6 kilometres (2.2 mi) long are three rectangular galleries, each raised above the next. At the centre of the temple stands a quincunx of towers. Unlike most Angkorian temples, Angkor Wat is oriented to the west; scholars are divided as to the significance of this. The temple is admired for the grandeur and harmony of the architecture, its extensive bas-reliefs, and for the numerous devatas adorning its walls.</span></p>
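You can test this string by copying the whole text and pasting it into this website. I'm looking for the right regex to remove all of the CSS styles.
I want a regex that works like the Useful HTML Cleaner Website.
This is before cleaning the HTML:
This is after cleaning the HTML:
Such websites remove all HTML attributes and don't care whether the attributes are malformed.
I found many regexes online that remove HTML attributes, but none of them work in my case. Here are some of them: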
<[^>]+((style|class)="[^"]*")[^>]*>
<\s*([a-z][a-z0-9]*)\s.*?>
style=\"([^\"]*)\"
style="(.*?)"
<\\s*([a-z][a-z0-9]*)\\s.*?>
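EDIT: Here is a useful snippet from Tobi that removes styles:
import Foundation
let regex = try! NSRegularExpression(pattern: "style=\"([^\"]*)\"", options: .caseInsensitive)
// NSRange is measured in UTF-16 code units (String.characters is deprecated)
let range = NSRange(location: 0, length: html.utf16.count)
let modString = regex.stringByReplacingMatches(in: html, options: [], range: range, withTemplate: "")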
The result of this regex still looks like this:
<p text-align:" justify;="" \"=""><span font-size:" 13px;="" font-family:="" arial;="" text-decoration-skip-ink:="" none;\"=""><b><span >Angkor Wat</span></b> is a temple complex in Cambodia and the largest religious monument in the world, on a site measuring 162.6 hectares (1,626,000 m2; 402 acres). It was originally constructed as a Hindu temple dedicated to the god Vishnu for the Khmer Empire, gradually transforming into a Buddhist temple towards the end of the 12th century. It was built by the Khmer King Suryavarman II in the early 12th century in Yaśodharapura, the capital of the Khmer Empire, as his state temple and eventual mausoleum. Breaking from the Shaiva tradition of previous kings, Angkor Wat was instead dedicated to Vishnu. As the best-preserved temple at the site, it is the only one to have remained a significant religious centre since its foundation. The temple is at the top of the high classical style of Khmer architecture. It has become a symbol of Cambodia, appearing on its national flag, and it is the country\'s prime attraction for visitors.</span></p><p text-align:" justify;="" \"=""><span font-size:" 13px;="" font-family:="" arial;="" text-decoration-skip-ink:="" none;\"="">Angkor Wat combines two basic plans of Khmer temple architecture: the temple-mountain and the later galleried temple. It is designed to represent Mount Meru, home of the devas in Hindu mythology: within a moat and an outer wall 3.6 kilometres (2.2 mi) long are three rectangular galleries, each raised above the next. At the centre of the temple stands a quincunx of towers. Unlike most Angkorian temples, Angkor Wat is oriented to the west; scholars are divided as to the significance of this. The temple is admired for the grandeur and harmony of the architecture, its extensive bas-reliefs, and for the numerous devatas adorning its walls.</span></p>
Please use this Website to test my string.
This regex can only remove styles written in the well-formed style="" format.
Answer 0 (score: 3)
You can use SwiftSoup to solve this. Here is my code:
import SwiftSoup

do {
    let doc: Document = try SwiftSoup.parse(html)
    // Visit every element in the parsed tree
    for el in try doc.getAllElements() {
        guard let attrs = el.getAttributes() else { continue }
        // Copy the keys first so the attribute list isn't mutated while it is being iterated
        let keys = attrs.map { $0.getKey() }
        for key in keys {
            try el.removeAttr(key)
        }
    }
    print(try doc.body()?.html() ?? "")
} catch Exception.Error(let type, let message) {
    print(type, message)
} catch {
    print("error")
}
Here is the result:
<p><span><b><span>Angkor Wat</span></b> is a temple complex in Cambodia and the largest religious monument in the world, on a site measuring 162.6 hectares (1,626,000 m2; 402 acres). It was originally constructed as a Hindu temple dedicated to the god Vishnu for the Khmer Empire, gradually transforming into a Buddhist temple towards the end of the 12th century. It was built by the Khmer King Suryavarman II in the early 12th century in Yaśodharapura, the capital of the Khmer Empire, as his state temple and eventual mausoleum. Breaking from the Shaiva tradition of previous kings, Angkor Wat was instead dedicated to Vishnu. As the best-preserved temple at the site, it is the only one to have remained a significant religious centre since its foundation. The temple is at the top of the high classical style of Khmer architecture. It has become a symbol of Cambodia, appearing on its national flag, and it is the country\'s prime attraction for visitors.</span></p>\n<p><span>Angkor Wat combines two basic plans of Khmer temple architecture: the temple-mountain and the later galleried temple. It is designed to represent Mount Meru, home of the devas in Hindu mythology: within a moat and an outer wall 3.6 kilometres (2.2 mi) long are three rectangular galleries, each raised above the next. At the centre of the temple stands a quincunx of towers. Unlike most Angkorian temples, Angkor Wat is oriented to the west; scholars are divided as to the significance of this. The temple is admired for the grandeur and harmony of the architecture, its extensive bas-reliefs, and for the numerous devatas adorning its walls.</span></p>
Hope this helps :)
Answer 1 (score: 2)
If you're looking for a one-line regex:
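import Foundation
let regex = try! NSRegularExpression(pattern: "(?<=<\\w{1,40})\\s[^>]+(?=>)", options: .caseInsensitive)
// NSRange is measured in UTF-16 code units, so use utf16.count rather than count
let range = NSRange(location: 0, length: html.utf16.count)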
let htmlWithoutInlineAttributes = regex.stringByReplacingMatches(in: html, options: [], range: range, withTemplate: "")
print(htmlWithoutInlineAttributes)
Assuming html looks like this:
// A raw string literal (#"..."#) keeps the stray quotes and backslashes from the API unescaped
let html = #"<p style ="\"text-align:" justify;="" \"=""><span style="\"font-size:" 13px;="" font-family:="" arial;="" text-decoration-skip-ink:="" none;\"=""><b><span style="font-size: 18px;">Angkor Wat</span></b> is a temple complex in Cambodia and the largest religious monument in the world, on a site measuring 162.6 hectares (1,626,000 m2; 402 acres). It was originally constructed as a Hindu temple dedicated to the god Vishnu for the Khmer Empire, gradually transforming into a Buddhist temple towards the end of the 12th century. It was built by the Khmer King Suryavarman II in the early 12th century in Yaśodharapura, the capital of the Khmer Empire, as his state temple and eventual mausoleum. Breaking from the Shaiva tradition of previous kings, Angkor Wat was instead dedicated to Vishnu. As the best-preserved temple at the site, it is the only one to have remained a significant religious centre since its foundation. The temple is at the top of the high classical style of Khmer architecture. It has become a symbol of Cambodia, appearing on its national flag, and it is the country\'s prime attraction for visitors.</span></p><p style="\"text-align:" justify;="" \"=""><span style="\"font-size:" 13px;="" font-family:="" arial;="" text-decoration-skip-ink:="" none;\"="">Angkor Wat combines two basic plans of Khmer temple architecture: the temple-mountain and the later galleried temple. It is designed to represent Mount Meru, home of the devas in Hindu mythology: within a moat and an outer wall 3.6 kilometres (2.2 mi) long are three rectangular galleries, each raised above the next. At the centre of the temple stands a quincunx of towers. Unlike most Angkorian temples, Angkor Wat is oriented to the west; scholars are divided as to the significance of this. The temple is admired for the grandeur and harmony of the architecture, its extensive bas-reliefs, and for the numerous devatas adorning its walls.</span></p>"#
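Here is an explanation of the regex:
(?<=<\\w{1,40}) is a positive lookbehind for "<" followed by 1 to 40 word characters (the tag name). It could have been written as (?<=<[a-z]+), but ICU, the regex engine behind NSRegularExpression, only accepts lookbehinds of bounded length, so an explicit upper limit is needed.
\\s matches the whitespace after the tag name; tags without any attributes don't need to be matched.
[^>]+ matches any character other than >, but at least one (again, there is no need to match tags without attributes). This can be unreliable on real-world, unsanitized HTML documents, which may contain stray > characters anywhere.
(?=>) is a positive lookahead for the closing >, so the > itself is kept.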
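The one-liner above can be wrapped in a small reusable helper. This is a minimal sketch, assuming Foundation's NSRegularExpression is available; the function name `removingInlineAttributes(from:)` is hypothetical. It uses the same pattern, and builds the NSRange from the UTF-16 length because NSRegularExpression counts UTF-16 code units, not Characters.

```swift
import Foundation

/// Hypothetical helper: removes everything between an opening tag's name and its closing ">".
/// Like the one-liner, it assumes attribute values never contain ">".
func removingInlineAttributes(from html: String) -> String {
    // Bounded lookbehind, because ICU rejects unbounded lookbehind such as <\w+
    let pattern = "(?<=<\\w{1,40})\\s[^>]+(?=>)"
    guard let regex = try? NSRegularExpression(pattern: pattern, options: .caseInsensitive) else {
        return html
    }
    // NSRange is measured in UTF-16 code units
    let range = NSRange(location: 0, length: html.utf16.count)
    return regex.stringByReplacingMatches(in: html, options: [], range: range, withTemplate: "")
}

// A shortened piece of the malformed API output, kept verbatim via a raw string literal
let sample = #"<p style="\"text-align:" justify;="" \"=""><b><span style="font-size: 18px;">Angkor Wat</span></b></p>"#
print(removingInlineAttributes(from: sample))
```

Tags that already have no attributes (such as `<b>` and every closing tag) are left untouched, because the pattern requires whitespace right after the tag name.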
Answer 2 (score: 1)
First, you need to unescape the HTML; then you can try the following regex to strip out all the HTML tags.
Here html2 is your HTML:
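import Foundation
// Strip the literal sequence ` \ "" ` (space, backslash, space, two quotes, space) before matching
let escapedString = html2.replacingOccurrences(of: " \\ \"\" ", with: "")
// <[^>]*> matches whole tags, so this removes all markup and leaves plain text
let regex = try! NSRegularExpression(pattern: "<[^>]*>", options: .caseInsensitive)
let range = NSRange(location: 0, length: escapedString.utf16.count)
let modString = regex.stringByReplacingMatches(in: escapedString, options: [], range: range, withTemplate: "")
print(modString)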
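The "unescape first" step can also be done explicitly. A minimal sketch, assuming the only escapes in the API string are backslash-escaped quotes (`\"` and `\'`, which is what the sample shows); the function name `plainText(fromEscapedHTML:)` is hypothetical. As in the answer, `<[^>]*>` removes whole tags rather than just their attributes, so the result is plain text.

```swift
import Foundation

/// Hypothetical helper: undo backslash-escaped quotes, then drop every tag.
/// Assumes \" and \' are the only escape sequences present.
func plainText(fromEscapedHTML html: String) -> String {
    // "\\\"" is the two-character sequence \" and "\\'" is \'
    let unescaped = html
        .replacingOccurrences(of: "\\\"", with: "\"")
        .replacingOccurrences(of: "\\'", with: "'")
    guard let regex = try? NSRegularExpression(pattern: "<[^>]*>") else { return unescaped }
    // NSRange is measured in UTF-16 code units
    let range = NSRange(location: 0, length: unescaped.utf16.count)
    return regex.stringByReplacingMatches(in: unescaped, options: [], range: range, withTemplate: "")
}

let escaped = #"<p style="\"text-align:" justify;="" \"=""><b>Angkor Wat</b> is the country\'s prime attraction.</p>"#
print(plainText(fromEscapedHTML: escaped))
```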
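Answer 3 (score: 1)
This isn't that easy, because your HTML is completely broken. I'd suggest asking your API designers why the API outputs such broken HTML.
Anyway, if you need to handle this kind of HTML-like content with a regex, you will probably need to detect the opening tags and remove everything in them except the tag name.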
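One way to do that is to capture the tag name and replace the whole opening tag with `<name>`. A minimal sketch, assuming tag names start with a letter and that no `>` appears inside attribute values; the function name `keepingOnlyTagNames(in:)` is hypothetical. Closing tags such as `</p>` don't match, because `/` is not a letter, so they are left alone.

```swift
import Foundation

/// Hypothetical helper: rewrites every opening tag to just "<tagname>".
/// Assumes no ">" appears inside attribute values (not true of all real-world HTML).
func keepingOnlyTagNames(in html: String) -> String {
    // Group 1 captures the tag name; [^>]* swallows whatever follows it, however malformed
    let pattern = "<([A-Za-z][A-Za-z0-9]*)[^>]*>"
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return html }
    // NSRange is measured in UTF-16 code units
    let range = NSRange(location: 0, length: html.utf16.count)
    // "$1" in the template re-inserts the captured tag name
    return regex.stringByReplacingMatches(in: html, options: [], range: range, withTemplate: "<$1>")
}

let broken = #"<p style="\"text-align:" justify;="" \"=""><span style="\"font-size:" 13px;="" none;\"=""><b>Angkor Wat</b></span></p>"#
print(keepingOnlyTagNames(in: broken))
```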
That said, da vamp's answer seems much better.