Question

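So I have a txt file containing the source of an HTML page (no CSS, no HTML5 declaration, no JS, just HTML tags). I have to output the indices of the lines that contain at least one closing HTML tag, and it should use regular expressions. I know how to find closing tags, but not how to index them. My first idea was to split the source on the newline character "\n", but then I would have to compile a matcher for every line. Is there another way? Thanks!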

Answer 1

Or use a Scanner:

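import java.io.BufferedReader;
import java.io.FileReader;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern p = Pattern.compile("</[^>]+>");
Scanner s = new Scanner(new BufferedReader(new FileReader("input.txt")));

// Use hasNextLine()/nextLine(): hasNext()/next() would iterate over
// whitespace-separated tokens, so lineNum would count tokens, not lines.
for (int lineNum = 1; s.hasNextLine(); lineNum++) {
    Matcher m = p.matcher(s.nextLine());
    if (m.find()) {
        System.out.println(lineNum);
    }
}
s.close();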

Answer 2

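Here is an example that reads each line of the file and prints a message if the line contains a closing tag. I use a BufferedReader to read the file line by line (as described in this question) and then check which lines match the pattern.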

Update 1

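As the comment above says, you should not use regular expressions to parse your file. If you want to parse it, you can use JSoup, for example. However, if you only want to do what you describe in the question, a regular expression is fine.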

package main;

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GetClosedTagsOfFile {
    public static void main(String[] args) throws IOException {
        // Compile the pattern once, outside the loop.
        Pattern p = Pattern.compile("</[^>]+>");

        // Open the file. try-with-resources closes the reader (and the
        // underlying stream) even if reading fails.
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream("test.html")))) {
            // Read the file line by line, counting lines from 1.
            String strLine;
            int i = 0;
            while ((strLine = br.readLine()) != null) {
                i++;

                // Check if there is a closing tag.
                Matcher m = p.matcher(strLine);
                if (m.find()) {
                    System.out.println("Line " + i + " contains a closing tag.");
                }
            }
        }
    }
}

Answer 3

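Splitting the source is one option, and you do not have to compile anything per line: compile the Pattern once and only create a Matcher for each line. Another option is to run the match over the whole text and then count the newlines between the matches.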
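
The second option could be sketched roughly as follows. This is only an illustration, assuming the whole file has already been read into a String; the class name ClosingTagLines and the method closingTagLines are made up for this example, and the pattern is the one used in the other answers. Because [^>]+ can also match a newline, a tag that spans several lines is attributed to the line on which it starts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ClosingTagLines {
    // Same pattern as in the other answers, compiled once.
    private static final Pattern CLOSING_TAG = Pattern.compile("</[^>]+>");

    /** Returns the 1-based numbers of the lines on which a closing tag starts. */
    public static List<Integer> closingTagLines(String source) {
        List<Integer> lines = new ArrayList<>();
        Matcher m = CLOSING_TAG.matcher(source);
        int line = 1;     // line number at the current scan position
        int scanned = 0;  // index up to which newlines have been counted
        while (m.find()) {
            // Count the newlines between the previous match and this one.
            for (int i = scanned; i < m.start(); i++) {
                if (source.charAt(i) == '\n') {
                    line++;
                }
            }
            scanned = m.start();
            // Record each line only once, even if it holds several tags.
            if (lines.isEmpty() || lines.get(lines.size() - 1) != line) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Hypothetical sample input; closing tags sit on lines 3 and 5.
        String html = "<html>\n<body>\n<p>text</p>\n<br>\n</body></html>\n";
        System.out.println(closingTagLines(html)); // prints [3, 5]
    }
}
```

Each character is visited at most once while counting, so this stays linear in the size of the input. The trade-off compared with reading line by line is that the whole file has to be held in memory.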

Output the lines that contain a closing HTML tag
