Question

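I want to replace every " in the file that my Java code is reading, so that all the " characters are removed and I can write out only the information I want. For example, the file contains: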

<span class="positive">This is the text i want</span>

How do I remove "positive"?

Here is my code:

public static void writeTXT(String j) throws IOException {

    j = j.replaceAll(">", "");
    j = j.replaceAll("<", "");
    for (int i = 0; i < REPLACE.length; i++) {
        j = j.replace(REPLACE[i], "");
    }
    // the rest of the method (writing j out) was not included in the post
}

public final static String[] REPLACE = {
    "onth Change <span class=\"stay\">",
    "/span/li"
};

Answer 1

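What you are trying to do is parse HTML with regular expressions. Only Chuck Norris can parse HTML with regular expressions.

If you want to get this substring, you need to either write your own parser that analyzes the string character by character, or use an existing parser to parse the HTML.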
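As a rough illustration of the "character by character" option, here is a minimal sketch. The class and method names (`TagStripper`, `stripTags`) are made up for this example and do not come from the question. It assumes the input is simple markup like the sample line: it does not handle comments, entities, or a `>` inside an attribute value, so it is no substitute for a real HTML parser.

```java
// Hypothetical helper (not from the original post): copies only the
// characters that are outside of <...> tags.
final class TagStripper {

    static String stripTags(String html) {
        StringBuilder text = new StringBuilder();
        boolean insideTag = false;
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (c == '<') {
                insideTag = true;        // a tag starts: stop copying
            } else if (c == '>') {
                insideTag = false;       // the tag ended: resume copying
            } else if (!insideTag) {
                text.append(c);          // plain text between tags
            }
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String line = "<span class=\"positive\">This is the text i want</span>";
        System.out.println(stripTags(line)); // This is the text i want
    }
}
```

Because the attribute `class="positive"` sits inside the tag, it is skipped together with the tag, so no separate step is needed to remove the quotes.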

Answer 2

Although you really should use an XML parser to extract the text from HTML, the following code removes everything between double quotes:

    String html = "<span class=\"positive\">This is the text i want</span>";
    System.out.println( html.replaceAll("\"[^\"]*\"", "\"\"" ));
    // <span class="">This is the text i want</span>
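For comparison, the parser route could look roughly like the sketch below, which uses the XML parser that ships with the JDK. The class and method names (`SpanTextExtractor`, `textOf`) are invented for this example. It assumes the fragment is well-formed XML with a single root element, which is true for the sample line but not for HTML in general; real-world HTML usually needs a dedicated HTML parser.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper (not from the original post): returns the text
// content of the root element of a well-formed XML fragment.
final class SpanTextExtractor {

    static String textOf(String xmlFragment) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc =
                    builder.parse(new InputSource(new StringReader(xmlFragment)));
            // getTextContent() concatenates the text nodes and ignores attributes
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            // Assumption of this sketch: malformed input is reported as a bad argument
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String html = "<span class=\"positive\">This is the text i want</span>";
        System.out.println(textOf(html)); // This is the text i want
    }
}
```

Unlike the regex above, this returns only the element text, so the tag and its `class` attribute never need to be stripped by hand.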

How to replace " " " with ""
