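I have a simple problem that I'm struggling to figure out. I want to read an HTML file line by line, but skip the HEAD tag. So I figured I could start reading the text once I'm past the HEAD tag.
So far I have: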
BufferedReader reader = new BufferedReader(new InputStreamReader(socket.getInputStream()));
StringBuilder string = new StringBuilder();
String line;
while ((line = reader.readLine()) != null) {
    if (line.startsWith("<html>"))
        string.append(line + "\n");
}
I want to keep the HTML code in memory without the HEAD information.
Example:
<HTML>
<HEAD>
<TITLE>Your Title Here</TITLE>
</HEAD>
<BODY BGCOLOR="FFFFFF">
<CENTER><IMG SRC="clouds.jpg" ALIGN="BOTTOM"> </CENTER>
<a href="http://somegreatsite.com">Link Name</a>is a link to another nifty site
<H1>This is a Header</H1>
<H2>This is a Medium Header</H2>
Send me mail at <a href="mailto:support@yourcompany.com">support@yourcompany.com</a>.
</BODY>
I want to save everything except the HEAD tag information.
Answer 0 (score: 1)
How about something like this:
boolean htmlFound = false; // Have we found an open html tag?
StringBuilder string = new StringBuilder(); // Back to your code...
String line;
while ((line = reader.readLine()) != null) {
    if (!htmlFound) { // Have we found it yet?
        if (line.toLowerCase().startsWith("<html")) { // Check if this line opens a html tag...
            htmlFound = true; // yes? Excellent!
        } else {
            continue; // Skip over this line...
        }
    }
    System.out.println("This is each line: " + line);
    string.append(line + "\n");
}
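Note that the loop above starts collecting at the opening html tag but still keeps the HEAD block. A minimal sketch of one way to drop it as well is to add a second flag that is set while the reader is inside HEAD. The class name `HeadSkipper` and the method `stripHead` are hypothetical names chosen for illustration, and the sketch assumes the opening and closing HEAD tags each begin a line, as in the example from the question. For arbitrary HTML, a real parser is more reliable than line matching.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.Locale;

// Hypothetical helper class, not part of the original answer.
class HeadSkipper {

    // Returns every line from the opening <html> tag onward,
    // leaving out the lines that belong to the <head>...</head> block.
    static String stripHead(BufferedReader reader) throws IOException {
        StringBuilder out = new StringBuilder();
        boolean htmlFound = false; // have we reached the <html> tag yet?
        boolean inHead = false;    // are we currently inside <head>...</head>?
        String line;
        while ((line = reader.readLine()) != null) {
            String lower = line.trim().toLowerCase(Locale.ROOT);
            if (!htmlFound) {
                if (!lower.startsWith("<html")) {
                    continue; // e.g. HTTP response headers before the document
                }
                htmlFound = true;
            }
            // Assumption: the <head> tag starts its line; <header> is not <head>.
            if (!inHead && lower.startsWith("<head") && !lower.startsWith("<header")) {
                inHead = true;
            }
            if (inHead) {
                if (lower.contains("</head>")) {
                    inHead = false; // the closing line itself is skipped too
                }
                continue;
            }
            out.append(line).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        String html = "<HTML>\n<HEAD>\n<TITLE>Your Title Here</TITLE>\n</HEAD>\n"
                + "<BODY BGCOLOR=\"FFFFFF\">\n<H1>This is a Header</H1>\n</BODY>\n";
        System.out.print(stripHead(new BufferedReader(new StringReader(html))));
    }
}
```

In the question's setup, the reader passed to `stripHead` would be the `BufferedReader` wrapped around `socket.getInputStream()`.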