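I need to parse the body of the HTML below into the output shown further down.
Some tags must survive in the output: it may contain {p, i, b, br} tags. All other tags must be removed, leaving only their text.
Here is my input: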
<!DOCTYPE HTML>
<html>
<head>
<title>Introduction</title>
</head>
<body>
<article id="mobi_content">
<h1 class="mobi-page-title">Introduction</h1>
<section id="dataSectionInstanceId-431331" class="body-text">This book is about creating a great career. <p>You might be saying to yourself, "I don't want to talk about a career, much less a great career. Right now I just need a job. I need to eat!" <p>Well, if you're looking, we're going to show you how to get that great job now. That's the first, short-term step. <p>But the day will come when you'll want to do more than just eat. And beyond that day will come another day when you look back at your life and take measure of your entire professional contribution to the world. <p>This book is about today and tomorrow. It's about getting a great job now and enjoying a great career for life. <p>When we say a person has had a great career, what do we mean? That he or she made a lot of money? moved spectacularly up the corporate ladder? became famous or renowned in his or her profession? What about the familiar comment from every movie star on every talk show: "I can't believe I get paid for doing this!" Are only a few people entitled to feel that way, but not the rest of us? <p>And what about you? Are you looking forward to a great career? Would you describe your current career as "great"? When you get to the end of your productive life, will you be looking back on a mediocre career? a good career? a great career? And how will you know? <p>Furthermore, just how do you create a great career for yourself? <p>As coauthors of this book, we are fascinated by these provocative questions. We have been associated in our work for many years as avid students of what it takes to build a great life and career. And we bring two different sets of experiences to the issue, so occasionally, we will speak to you directly in our own voices. We'll share with you our discoveries and provide tools and insights that will help you find answers for yourself. Whether you're looking for a job or want to make the job you have more meaningful, this book is for you.
</section>
</article>
</body>
</html>
The expected output is:
This book is about creating a great career.
<P>You might be saying to yourself, "I don't want to talk about a career, much less a great career. Right now I just need a job. I need to eat!"
<P>Well, if you're looking, we're going to show you how to get that great job now. That's the first, short-term step.
<P>But the day will come when you'll want to do more than just eat. And beyond that day will come another day when you look back at your life and take measure of your entire professional contribution to the world.
<P>This book is about today and tomorrow. It's about getting a great job now and enjoying a great career for life.
<P>When we say a person has had a great career, what do we mean? That he or she made a lot of money? moved spectacularly up the corporate ladder? became famous or renowned in his or her profession? What about the familiar comment from every movie star on every talk show: "I can't believe I get paid for doing this!" Are only a few people entitled to feel that way, but not the rest of us?
<P>And what about you? Are you looking forward to a great career? Would you describe your current career as "great"? When you get to the end of your productive life, will you be looking back on a mediocre career? a good career? a great career? And how will you know?
<P>Furthermore, just how do you create a great career for yourself?
<P>As coauthors of this book, we are fascinated by these provocative questions. We have been associated in our work for many years as avid students of what it takes to build a great life and career. And we bring two different sets of experiences to the issue, so occasionally, we will speak to you directly in our own voices. We'll share with you our discoveries and provide tools and insights that will help you find answers for yourself. Whether you're looking for a job or want to make the job you have more meaningful, this book is for you.
My code:
doc.body().traverse(new NodeVisitor() {
@Override
public void head(Node node, int depth) {
String name = node.nodeName();
String paraText = "";
if (node instanceof TextNode) {
TextNode tn = ((TextNode) node);
if (node.nodeName().equals("p")) {
//finalHtml+="<p>"+tn.text()+"</p>";
} else {
finalHtml += tn.text();
}
} else if (node instanceof Node) {
if (node.nodeName() == "p") {
System.out.println("fnbdnv"+node.toString());
}
if (node.nodeName() == "h1") {
// finalHtml+="<p>"+node.toString()+"<p>";
} else if (node.nodeName() == "div") {
node.removeAttr("class");
finalHtml += node.toString();
} else if (node.nodeName() == "seection") {
finalHtml += node.toString();
} else if (node.nodeName() == "<b>") {
finalHtml += node.toString();
} else if (node.nodeName() == "<i>") {
finalHtml += "<i>" + node.toString() + "</i>";
}
}
}
@Override
public void tail(Node node, int depth) {
// Do Nothing
}
});
Answer 0 (score: 0)
A regular expression may work better in this case.
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
public class Main {
public static void main(String[] args) {
try {
String html = "<!DOCTYPE HTML>" +
"<html>" +
"<head>" +
"<title>Introduction</title>" +
"</head>" +
"<body>" +
"<article id=\"mobi_content\">" +
"<h1 class=\"mobi-page-title\">Introduction</h1>" +
"<section id=\"dataSectionInstanceId-431331\" class=\"body-text\">This <i>book</i> is about creating a great career. <p>You might be saying to yourself, \"I don't want to talk about a career, much less a great career. Right now I just need a job. I need to eat!\" <p>Well, if you're looking, we're going to show you how to get that great job now. That's the first, short-term step. <p>But the day will come when you'll want to do more than just eat. And beyond that day will come another day when you look back at your life and take measure of your entire professional contribution to the world. <p>This book is about today and tomorrow. It's about getting a great job now and enjoying a great career for life. <p>When we say a person has had a great career, what do we mean? That he or she made a lot of money? moved spectacularly up the corporate ladder? became famous or renowned in his or her profession? What about the familiar comment from every movie star on every talk show: \"I can't believe I get paid for doing this!\" Are only a few people entitled to feel that way, but not the rest of us? <p>And what about you? Are you looking forward to a great career? Would you describe your current career as \"great\"? When you get to the end of your productive life, will you be looking back on a mediocre career? a good career? a great career? And how will you know? <p>Furthermore, just how do you create a great career for yourself? <p>As coauthors of this book, we are fascinated by these provocative questions. We have been associated in our work for many years as avid students of what it takes to build a great life and career. And we bring two different sets of experiences to the issue, so occasionally, we will speak to you directly in our own voices. We'll share with you our discoveries and provide tools and insights that will help you find answers for yourself. Whether you're looking for a job or want to make the job you have more meaningful, this book is for you." +
"</section>" +
"</article>" +
"</body>" +
"</html>";
Document doc = Jsoup.parse(html);
System.out.println(removeTags(doc.body().toString()));
} catch (Exception e) {
e.printStackTrace();
}
}
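public static String removeTags(String source) {
// The negative lookahead skips <p>, <i>, <b> and <br> tags; every other tag is replaced with a space.
return source.replaceAll("(?!(</?p>|</?i>|</?b>|<br/?>))(</?.*?>)", " ");
}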
}
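To see what the pattern in `removeTags` does on its own, here is a minimal sketch that uses only `java.util.regex` and no jsoup. The class name `KeepTagsDemo` and the sample strings are made up for illustration; the pattern itself is the one from the answer above.

```java
import java.util.regex.Pattern;

final class KeepTagsDemo {
    // Tags to keep: <p>, <i>, <b> (opening or closing) and <br> or <br/>.
    // The negative lookahead refuses to start a match at one of those tags;
    // any other <...> run is matched by the second group and replaced.
    private static final Pattern OTHER_TAGS =
            Pattern.compile("(?!(</?p>|</?i>|</?b>|<br/?>))(</?.*?>)");

    // Replaces every tag that is not on the keep list with a single space.
    static String removeTags(String source) {
        return OTHER_TAGS.matcher(source).replaceAll(" ");
    }
}
```

Calling `KeepTagsDemo.removeTags("<div>a<p>b</p><span>c</span></div>")` leaves `<p>b</p>` in place and turns the `div` and `span` tags into spaces. Two limitations follow from the pattern as written: a kept tag must match exactly, so `<p class="c">` is stripped like any other tag, and a literal `<` in the text can make the match run on to the next `>` and swallow a tag that should have been kept.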
Update
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
public class Main {
public static void main(String[] args) {
try {
String html = "<!DOCTYPE HTML>" +
"<html>" +
"<head>" +
"<title>Introduction</title>" +
"</head>" +
"<body> <article id=\"mobi_content\"> <h1 class=\"mobi-page-title\">\"Build Your Village\" Tool</h1> <section id=\"dataSectionInstanceId-431408\" class=\"body-text\"><p class=\"nonindent\">Your great career depends not only on you,</p> <p class=\"nonindent\">Sample deposits in the Emotional Bank Account:</p> <ul class=\"bullet\"> <li><p class=\"nonindent\">Congratulate the person on a job well done.</p></li> <li><p class=\"nonindent\">Send birthday greetings.</p></li></section></article></body>" +
"</html>";
Document doc = Jsoup.parse(html);
System.out.println(removeTags(doc.body().toString()));
} catch (Exception e) {
e.printStackTrace();
}
}
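public static String removeTags(String source) {
// Unlike the first version, this keeps <p> tags that carry attributes (e.g. <p class="nonindent">).
// Note that a bare <p> no longer matches the lookahead, so it is removed.
return source.replaceAll("(?!(</p>|<p .*?>|</?i>|</?b>|<br/?>))(</?.*?>)", " ");
}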
}
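The updated pattern keeps `<p class="...">` but only when the tag has attributes. If the goal is the output format from the question (plain `<p>` with no attributes), one possible variation is sketched below. This is an assumption about what is wanted, not part of the answer's code: the class name `KeepTagsWithAttributesDemo` and both patterns are hypothetical, and it uses only `java.util.regex`.

```java
import java.util.regex.Pattern;

final class KeepTagsWithAttributesDemo {
    // Any tag except p, i, b, br. The \b stops "<p" from also covering <pre>
    // and "<b" from covering <body>; [^>]* allows attributes on kept tags.
    private static final Pattern OTHER_TAGS =
            Pattern.compile("(?!</?(?:p|i|b|br)\\b[^>]*>)</?[^>]*>");

    // A kept opening tag with optional attributes and optional self-closing slash.
    private static final Pattern KEPT_OPENING_TAG =
            Pattern.compile("<(p|i|b|br)\\b[^>]*?(/?)>");

    static String removeTags(String source) {
        // Step 1: replace every tag that is not on the keep list with a space.
        String stripped = OTHER_TAGS.matcher(source).replaceAll(" ");
        // Step 2: rewrite kept opening tags without their attributes.
        return KEPT_OPENING_TAG.matcher(stripped).replaceAll("<$1$2>").trim();
    }
}
```

With this sketch, both `<p>` and `<p class="nonindent">` survive and come out as `<p>`, while `ul`, `li`, `section` and similar tags become spaces. Like the original, it is still a regex over markup, so a stray `<` or `>` in the text can confuse it.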
Update 2
import java.util.ListIterator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Attribute;
import org.jsoup.nodes.Attributes;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
public class Main {
public static void main(String[] args) {
try {
Pattern pattern = Pattern.compile("/(((?!/).)*)[.]");
String html = "<!DOCTYPE HTML>" +
"<html>" +
"<head>" +
"<title>Introduction</title>" +
"</head>" +
"<body> <article id=\"mobi_content\"> <h1 class=\"mobi-page-title\">\"Build Your Village\" Tool</h1> <section id=\"dataSectionInstanceId-431408\" class=\"body-text\"><p class=\"nonindent\">Your great career depends not only on you,</p> <p class=\"center\"><img src=\"mpla/multimedia/Cove_9781936111107_epub_005_r1.png\" id=\"mobi_image_12776\" class=\"inline-img\" alt=\"PNG\"/></p><p class=\"nonindent\">Sample deposits in the Emotional Bank Account:</p> <ul class=\"bullet\"> <li><p class=\"nonindent\">Congratulate the person on a job well done.</p></li> <li><p class=\"nonindent\">Send birthday greetings.</p></li></section></article></body>" +
"</html>";
Document doc = Jsoup.parse(html);
Elements imgs = doc.select("img");
System.out.println(imgs);
ListIterator<Element> iter = imgs.listIterator();
while(iter.hasNext()) {
Element img = iter.next();
String src = img.attr("src");
Matcher matcher = pattern.matcher(src);
if (matcher.find()) {
img.tagName("graphic").text(matcher.group(1));
removeAttr(img);
}
}
System.out.println(removeTags(doc.body().toString()));
} catch (Exception e) {
e.printStackTrace();
}
}
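public static void removeAttr(Element e) {
// Copy the keys first: removing attributes while iterating over the live
// Attributes object can skip entries or throw, depending on the jsoup version.
java.util.List<String> keys = new java.util.ArrayList<>();
for (Attribute a : e.attributes()) {
keys.add(a.getKey());
}
for (String key : keys) {
e.removeAttr(key);
}
}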
public static String removeTags(String source) {
return source.replaceAll("(?!(</p>|<p .*?>|</?graphic>|</?i>|</?b>|<br/?>))(</?.*?>)", " ").trim();
}
}
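The pattern `/(((?!/).)*)[.]` in Update 2 is what turns the `src` path into the text of the `graphic` element. A small standalone sketch of just that step is shown below, using only `java.util.regex`; the class name `ImageNameDemo`, the method `baseName` and the `Optional` return type are illustrative choices, not part of the answer.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ImageNameDemo {
    // A slash, then a run of non-slash characters (group 1), then a dot.
    // The run is greedy, so within one path segment it stops at the last dot.
    private static final Pattern BASE_NAME = Pattern.compile("/(((?!/).)*)[.]");

    // Returns group 1 of the first match, or empty if the pattern never matches.
    static Optional<String> baseName(String src) {
        Matcher m = BASE_NAME.matcher(src);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }
}
```

For the `src` used above, `baseName("mpla/multimedia/Cove_9781936111107_epub_005_r1.png")` yields `Cove_9781936111107_epub_005_r1`. Two edge cases are worth knowing: a path without any slash (for example `image.png`) does not match at all, and because `find()` takes the first slash-delimited segment that contains a dot, a dotted directory name such as `x/dir.v2/pic.png` yields `dir` rather than `pic`.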