I want to scrape comments from a website. I am having trouble getting the p tag inside a class with jsoup. The sample HTML is below:
<html>
<head>
<title>My webpage</title>
</head>
<body>
<div class="container">
<div class="comment">
<p>This is comment</p>
</div>
</div>
</body>
</html>
Here is my Java code:
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
// The class declaration was missing from the post; the name Main is assumed.
public class Main {
    public static void main(String args[]){
        Document doc = null;
        try {
            doc = Jsoup.connect("https://homeshopping.pk/products/Amazon-Fire-Phone-%284G%2C-32GB%2C-Black%29-Price-in-Pakistan.html").get();
            System.out.println("Connect successfully");
            org.jsoup.select.Elements element = doc.select("div.post-message");
            System.out.println(element.get(0).text());
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }
}
Answer 0 (score: 2)
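The comment section of the page you are trying to fetch is not plain HTML content. The comments are loaded into the DOM by JavaScript after the initial page load. jsoup is an HTML parser and does not execute JavaScript, so you cannot get this page's comments through jsoup alone. To get this kind of content you need an embedded browser component. Take a look at this answer: Is there a way to embed a browser in Java?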
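As a rough illustration of why the original code fails, the sketch below parses a made-up stand-in for the initial server response (INITIAL_HTML, the comments id, and load-comments.js are all hypothetical, not taken from the real page). Because the script never runs, the selector matches nothing, and calling get(0) on the empty result would throw an IndexOutOfBoundsException. The helper checks for an empty result first.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class EmptySelectionCheck {
    // Hypothetical stand-in for the HTML the server returns first: the comment
    // container is empty and would only be filled by a script in a real browser.
    static final String INITIAL_HTML =
        "<html><body>"
        + "<div id=\"comments\"></div>"
        + "<script src=\"load-comments.js\"></script>"
        + "</body></html>";

    // Returns the text of the first element matching cssQuery,
    // or null when the parsed HTML contains no match.
    static String firstTextOrNull(String html, String cssQuery) {
        Document doc = Jsoup.parse(html);
        Elements matches = doc.select(cssQuery);
        return matches.isEmpty() ? null : matches.get(0).text();
    }

    public static void main(String[] args) {
        String text = firstTextOrNull(INITIAL_HTML, "div.post-message");
        if (text == null) {
            System.out.println("No div.post-message in the parsed HTML");
        } else {
            System.out.println(text);
        }
    }
}
```

Guarding with isEmpty() does not make the comments appear; it only turns the crash into a clear signal that the content is not in the HTML jsoup received.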
The following code works for the specific HTML string you provided.
Try this:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;
public class Test {
public static void main(String[] arg)
{
Document doc = null;
try {
doc = Jsoup.parse("<html> "
+ "<head> "
+ "<title>My webpage</title> "
+ "</head> <body> <div class=\"container\"> "
+ "<div class=\"comment\"> "
+ "<p>This is comment</p> "
+ " </div> </div> </body></html> ");
Elements element = doc.select(".container").select(".comment");
System.out.println(element.get(0).select("p").text());
}
catch (Exception e)
{
e.printStackTrace();
}
}
}
To connect to the URL:
doc = Jsoup.connect("https://homeshopping.pk/products/Amazon-Fire-Phone-%284G%2C-32GB%2C-Black%29-Price-in-Pakistan.html").timeout(60*1000).userAgent("Mozilla").get();
Answer 1 (score: 1)
To expand on Arijit's solution, if there are multiple <div> tags with the comment class, you can try:
import java.util.Iterator;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

Document doc = null;
try
{
    doc = Jsoup.parse("<html> " + "<head> " + "<title>My webpage</title> "
            + "</head> <body> <div class=\"container\"> " + "<div class=\"comment foo\"> "
            + "<p>This is comment</p> " + " </div> </div> </body></html> ");
    // The second argument is a regex, so "comment foo" also matches.
    Elements comments = doc.getElementsByAttributeValueMatching("class", "comment");
    Iterator<Element> iter = comments.iterator();
    while(iter.hasNext())
    {
        Element e = iter.next();
        System.out.println(e.getElementsByTag("p").text());
    }
}
catch (Exception e)
{
    e.printStackTrace();
}
If other tags share the comment class, you can use e.tagName() to check that the element is a <div>.
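A minimal sketch of that tagName() check, assuming a made-up document in which a <span> also carries the comment class (SAMPLE_HTML and the class name DivCommentFilter are illustrative only). It uses getElementsByClass rather than the regex match above, so only elements whose class list contains the exact token comment are considered.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class DivCommentFilter {
    // Hypothetical sample: two div comments and one span sharing the class.
    static final String SAMPLE_HTML =
        "<div class=\"container\">"
        + "<div class=\"comment foo\"><p>First</p></div>"
        + "<span class=\"comment\">Not a div</span>"
        + "<div class=\"comment\"><p>Second</p></div>"
        + "</div>";

    // Collects the <p> text of every element that has the class "comment"
    // and is a <div>; other tags with the same class are skipped.
    static List<String> divCommentTexts(String html) {
        Document doc = Jsoup.parse(html);
        List<String> texts = new ArrayList<>();
        for (Element e : doc.getElementsByClass("comment")) {
            if (e.tagName().equals("div")) {
                texts.add(e.getElementsByTag("p").text());
            }
        }
        return texts;
    }

    public static void main(String[] args) {
        for (String text : divCommentTexts(SAMPLE_HTML)) {
            System.out.println(text);
        }
    }
}
```

The same filtering could probably be pushed into the selector itself with doc.select("div.comment"), which leaves the explicit tagName() check useful mainly when the elements come from a broader query.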
Answer 2 (score: 0)
If your goal is to print This is comment, you can try something like this:
org.jsoup.select.Elements element = doc.select("div.container").select("div.comment");
System.out.println(element.get(0).text());
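The two chained select calls can also be written as one CSS query. The sketch below is one possible variant, assuming the sample HTML from the question; the class name CombinedSelector and the helper firstCommentText are made up for illustration. It uses first(), which returns null on an empty result, so a missing comment does not throw.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

public class CombinedSelector {
    // The sample HTML from the question, as a single string.
    static final String SAMPLE_HTML =
        "<html><head><title>My webpage</title></head><body>"
        + "<div class=\"container\"><div class=\"comment\">"
        + "<p>This is comment</p>"
        + "</div></div></body></html>";

    // Returns the text of the first <p> that is a direct child of a
    // div.comment nested inside a div.container, or null if there is none.
    static String firstCommentText(String html) {
        Element p = Jsoup.parse(html)
            .select("div.container div.comment > p")
            .first();
        return p == null ? null : p.text();
    }

    public static void main(String[] args) {
        System.out.println(firstCommentText(SAMPLE_HTML));
    }
}
```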