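I am trying to use JSoup to extract the reviews of a product from the Moto X review page, but it throws a NullPointerException. I also want the full text that is shown after clicking a review's "Read more" link.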
import java.io.*;
import org.jsoup.*;
import org.jsoup.nodes.*;
import org.jsoup.select.*;
public class JSoupEx
{
    public static void main(String[] args) throws IOException
    {
        Document doc = Jsoup.connect("https://www.flipkart.com/moto-x-play-with-turbo-charger-white-16-gb/product-reviews/itmefzwvdejejvth?pid=MOBEFM5HAFRNSJJA").get();
        Element ele = doc.select("div[class=qwjRop] > div").first();
        System.out.println(ele.text());
    }
}
Any solutions?
Answer 0: (score: 1)
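JSoup only parses HTML; it does not run JavaScript. The content you are looking for is added to the page by JavaScript, so Jsoup never sees it. The selector therefore matches nothing, first() returns null, and ele.text() throws the NullPointerException.
You would need something like Selenium to get what you want from the rendered page. For this particular site, though, a quick look at its network activity shows that everything you are after is fetched from the backend through an API call. You can call that API yourself, which makes the content easier to get at without parsing the page with Jsoup.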
Answer 1: (score: 1)
As Gherkin suggested, the Network tab of the browser's developer tools shows a request that returns the reviews (in JSON format) as its response:
https://www.flipkart.com/api/3/product/reviews?productId=MOBEFM5HAFRNSJJA&count=15&ratings=ALL&reviewerType=ALL&sortOrder=MOST_HELPFUL&start=0
With a JSON parser such as JSON.simple, we can extract information like the review author, the helpfulness count, and the review text.
Example code
// Imports needed by this snippet (jsoup and JSON.simple):
import org.json.simple.JSONArray;
import org.json.simple.JSONObject;
import org.json.simple.parser.JSONParser;
import org.jsoup.Connection.Response;
import org.jsoup.Jsoup;

// The statements below belong inside a method, e.g. main.
String userAgent = "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.143 Safari/537.36";
String reviewApiCall = "https://www.flipkart.com/api/3/product/reviews?productId=MOBEFM5HAFRNSJJA&count=15&ratings=ALL&reviewerType=ALL&sortOrder=MOST_HELPFUL&start=";
String xUserAgent = userAgent + " FKUA/website/41/website/Desktop";
String referer = "https://www.flipkart.com/moto-x-play-with-turbo-charger-white-16-gb/product-reviews/itmefzwvdejejvth?pid=MOBEFM5HAFRNSJJA";
String host = "www.flipkart.com";
int numberOfPages = 2; // first two pages of results will be fetched
try {
    // loop over multiple review pages
    for (int i = 0; i < numberOfPages; i++) {
        // query reviews; each page holds 15 reviews, so page i starts at offset i*15
        Response response = Jsoup.connect(reviewApiCall + (i * 15)).userAgent(userAgent).referrer(referer).timeout(5000)
                .header("x-user-agent", xUserAgent).header("host", host).ignoreContentType(true).execute();
        System.out.println("Response in JSON format:\n\t" + response.body() + "\n");
        // parse the JSON response
        JSONObject jsonObject = (JSONObject) new JSONParser().parse(response.body());
        jsonObject = (JSONObject) jsonObject.get("RESPONSE");
        JSONArray jsonArray = (JSONArray) jsonObject.get("data");
        for (Object object : jsonArray) {
            // each array entry wraps the actual review in its "value" field
            jsonObject = (JSONObject) object;
            jsonObject = (JSONObject) jsonObject.get("value");
            System.out.println("Author: " + jsonObject.get("author") + "\thelpful: "
                    + jsonObject.get("helpfulCount") + "\n\t"
                    + jsonObject.get("text").toString().replace("\n", "\n\t") + "\n");
        }
    }
} catch (Exception e) {
    e.printStackTrace();
}
Output
Response in JSON format:
{"CACHE_INVALIDATION_TTL":"132568825671","REQUEST":null,"REQUEST-ID": [...] }
Author: Flipkart Customer helpful: 140
A great phone at an affordable price with
-an outstanding camera
-great battery life
-an excellent display
-premium looks
the flipkart delivery was also fast and perfect.
Author: Vaibhav Yadav helpful: 518
I m writing this review after using 2 months..
First of all ..I must say this is one of the best product ..camera quality is best in natural lights or daytime..but in low light and in the night..camera quality is not so good but it's ok..
It has good battery backup ..last one day on 3g usage ..while using 4g ..it lasts for about 10-12 hour..
Turbo charges is good..although ..my charger is not working..
Only problem in this phone is ..while charging..this phone heats a lot..this may b becoz of turbo charger..if u r using other charger than it does not heat..
Author: KAPIL CHOPRA helpful: 9
[...]
Note: the output is truncated ([...])
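
The paging in the example code (count=15 in the URL and start set to i*15 in the loop) can be pulled out into a small helper. The sketch below uses only the standard library and just builds the request URLs; it does not contact the site. The class name ReviewPageUrls and its methods are hypothetical names chosen for illustration, and it assumes, as the loop above suggests, that start is a zero-based offset into the review list rather than a page number.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of jsoup or of the site's API.
class ReviewPageUrls {
    // Review endpoint as quoted in the answer.
    static final String BASE = "https://www.flipkart.com/api/3/product/reviews";

    // Builds the URL for one page of reviews.
    // Assumption: "start" is a zero-based offset, so page p begins at p * pageSize.
    static String pageUrl(String productId, int pageSize, int pageIndex) {
        if (pageSize <= 0 || pageIndex < 0) {
            throw new IllegalArgumentException("pageSize must be > 0 and pageIndex >= 0");
        }
        return BASE + "?productId=" + productId
                + "&count=" + pageSize
                + "&ratings=ALL&reviewerType=ALL&sortOrder=MOST_HELPFUL"
                + "&start=" + (pageIndex * pageSize);
    }

    // Builds the URLs for the first `pages` pages, in order.
    static List<String> pageUrls(String productId, int pageSize, int pages) {
        List<String> urls = new ArrayList<>();
        for (int p = 0; p < pages; p++) {
            urls.add(pageUrl(productId, pageSize, p));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Prints the two URLs that the loop in the example code requests.
        for (String url : pageUrls("MOBEFM5HAFRNSJJA", 15, 2)) {
            System.out.println(url);
        }
    }
}
```

Each returned URL can be passed to Jsoup.connect(...) in place of reviewApiCall+(i*15), which keeps the page size in one place if you want to change it.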