I have text like this:
This is a test text. <span> with bold </span> and with <span> italic </span> and so on and so forth.
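Currently I am using this regular expression to identify all the HTML tags: <[^>]*>
I then replace every tag with an empty string, and the result looks like this: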
This is a test text. with bold and with italic and so on and so forth.
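For reference, the stripping step can be sketched roughly as below. The class and method names (`TagStripper`, `stripTags`) are only illustrative and are not part of my real code; the regex is the one quoted above.

```java
public class TagStripper {
    // Removes everything that looks like a tag. This is a plain regex
    // replacement, not an HTML parser, so it trusts the input to be simple.
    static String stripTags(String html) {
        return html.replaceAll("<[^>]*>", "");
    }

    public static void main(String[] args) {
        String html = "This is a test text. <span> with bold </span> and with "
                + "<span> italic </span> and so on and so forth.";
        System.out.println(stripTags(html));
    }
}
```

Note that only the tags are removed; the whitespace that surrounded them stays in the stripped text.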
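In the stripped text above, I want to find a piece of text, for example "italic", insert a special tag around it, and then rebuild the original text. So the result would be: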
This is a test text. <span> with bold </span> and with <span> <span class='special'>italic</span> </span> and so on and so forth.
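I am writing code that uses matcher.start() and matcher.end() to build a list of all the HTML tags, and I am thinking of rebuilding the text from that list. Is there a better way? How would you solve it?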
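A minimal sketch of the tag list I have in mind, assuming each tag is stored as a {start, end} pair of offsets into the original string. The names `TagSpans` and `findTagSpans` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagSpans {
    // Returns one {start, end} pair per tag; end is exclusive,
    // exactly as reported by Matcher.end().
    static List<int[]> findTagSpans(String html) {
        List<int[]> spans = new ArrayList<>();
        Matcher m = Pattern.compile("<[^>]*>").matcher(html);
        while (m.find()) {
            spans.add(new int[] {m.start(), m.end()});
        }
        return spans;
    }

    public static void main(String[] args) {
        for (int[] span : findTagSpans("with <span> italic </span> and")) {
            System.out.println(span[0] + ".." + span[1]);
        }
    }
}
```

These offsets refer to the original string, so they would still have to be translated into positions in the stripped text before the tags can be put back around a match.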
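EDIT
The reason I search the text after stripping the HTML is that the HTML can interfere with the text I am looking for. For example, it could look like this: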
This is a test text. <span> with bold </span> and with <span> it</span>al<span>ic </span> and so on and so forth.
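To make the problem concrete, here is a small sketch (the names `SplitWordDemo`, `foundInRaw` and `foundAfterStripping` are made up for illustration) showing that a plain search on the raw markup misses the word, while the same search on the stripped text finds it:

```java
public class SplitWordDemo {
    // Searches the markup as-is, tags included.
    static boolean foundInRaw(String html, String word) {
        return html.contains(word);
    }

    // Strips the tags first, then searches the remaining text.
    static boolean foundAfterStripping(String html, String word) {
        return html.replaceAll("<[^>]*>", "").contains(word);
    }

    public static void main(String[] args) {
        String html = "This is a test text. <span> with bold </span> and with "
                + "<span> it</span>al<span>ic </span> and so on and so forth.";
        System.out.println("raw:      " + foundInRaw(html, "italic"));
        System.out.println("stripped: " + foundAfterStripping(html, "italic"));
    }
}
```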
EDIT2
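This is not a duplicate question, as has been suggested. Imagine a scenario where you need to highlight text in HTML that you see on screen, doing nothing more than wrapping the text you selected in a simple span with a yellow background color. Now suppose that text is "italic", but in the markup it appears as <span>ita</span>l<span>ic</span>. My question is: how do you find this word and then add a span around it?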
EDIT3 Final edit to simplify the problem statement. I hope this makes it clear. This is the input:
This is a test text with <span>it<span>al<span>ic</span> and etc.
This is the expected output:
This is a test text with <span class='highlight'><span>it<span>al<span>ic</span></span> and etc.
Answer 0 (score: 1)
This does what you are looking for, but it does not detect or prevent the generation of malformed HTML.
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HtmlHighlighter {

    // The input with every tag removed; all searching happens on this text.
    private final String inputWithoutTags;
    // The removed tags, in document order, with positions in the tag-free text.
    private final List<Tag> tags;

    private static class Tag {
        private final String text;
        // Offset in inputWithoutTags at which this tag has to be re-inserted.
        private final int startPos;

        private Tag(final String text, final int startPos) {
            this.text = text;
            this.startPos = startPos;
        }
    }

    public HtmlHighlighter(final String input, final String tagRegex) {
        final Pattern p = Pattern.compile(tagRegex);
        tags = new ArrayList<>();
        final Matcher m = p.matcher(input);
        final StringBuffer sb = new StringBuffer();
        int cursor = 0;              // position in the original input
        int cursorExcludingTags = 0; // same position, not counting tag characters
        while (m.find()) {
            // Advance by the length of the plain text between the previous tag and this one.
            cursorExcludingTags += m.start() - cursor;
            tags.add(new Tag(input.substring(m.start(), m.end()), cursorExcludingTags));
            cursor = m.end();
            // Copy the text before the tag and drop the tag itself.
            m.appendReplacement(sb, "");
        }
        m.appendTail(sb);
        inputWithoutTags = sb.toString();
    }

    public String highlightText(String regexToFind, String openingTag, String closingTag) {
        final List<Tag> allTags = getAllTags(regexToFind, openingTag, closingTag);
        return combineTags(allTags);
    }

    // Returns the original tags plus an opening/closing pair around every match.
    private List<Tag> getAllTags(final String regexToFind, final String openingTag, final String closingTag) {
        final List<Tag> ret = new ArrayList<>(tags);
        final Pattern p = Pattern.compile(regexToFind);
        final Matcher m = p.matcher(inputWithoutTags);
        while (m.find()) {
            addTag(new Tag(openingTag, m.start()), true, ret);
            addTag(new Tag(closingTag, m.end()), false, ret);
        }
        return ret;
    }

    // Inserts the tag so the list stays ordered by position. When beforeIgnored
    // is true the tag goes before existing tags at the same position (opening
    // tag); otherwise it goes after them (closing tag).
    private void addTag(final Tag tag, final boolean beforeIgnored, final List<Tag> allTags) {
        for (int i = 0; i < allTags.size(); i++) {
            if (allTags.get(i).startPos >= tag.startPos && beforeIgnored) {
                allTags.add(i, tag);
                return;
            }
            if (allTags.get(i).startPos > tag.startPos) {
                allTags.add(i, tag);
                return;
            }
        }
        allTags.add(allTags.size(), tag);
    }

    // Re-inserts the tags from last to first, so earlier positions stay valid.
    private String combineTags(final List<Tag> allTags) {
        final StringBuilder sb = new StringBuilder(inputWithoutTags);
        for (int i = allTags.size() - 1; i >= 0; i--) {
            final Tag tag = allTags.get(i);
            sb.insert(tag.startPos, tag.text);
        }
        return sb.toString();
    }

    public static void main(String... args) {
        final HtmlHighlighter highlighter = new HtmlHighlighter("This is a test text with <span>it<span>al<span>ic</span> and etc.", "\\<.*?\\>");
        System.out.println(highlighter.highlightText("italic", "<span class='highlight'>", "</span>"));
    }
}