Jsoup returns different output than the web browser

Date: 2016-09-18 08:21:29

Tags: java json string jsoup

I want to parse this API:

https://data.studentedge.com.au/api/comments/getpage?page=1&sort=Oldest&url=%2Fforums%2Fdetails%2Fany-surfers-out-there

When I open it in a web browser (with or without JavaScript enabled), it returns:

{"Items":[{"CommentBody":"<p>I ride a 5'9 and am from the mid north coast</p>\r\n\r\n","MemberName":"Jack F","AvatarUrl":"https://studentedgeapplication.blob.core.windows.net/profiles/21450f07-ddcc-4f19-8cba-296f22e84ee1.jpeg","PostDate":"2016-09-03T01:38:38+00:00","CommentId":"f1c50066-69b3-4a92-bc0c-a676001b174f","ParentId":null,"PosterId":"28936bc3-f705-45d6-8f94-a5b0004585c6","Status":"Approved","CurrentMemberComment":false,"UpvoteCount":0,"MemberHasUpvoted":false,"PageUrl":"/forums/details/any-surfers-out-there","IsModerator":false},{"CommentBody":"<p>I surf everyday on Google Chrome - SA here ;)</p>\r\n\r\n","MemberName":"Bryan A","AvatarUrl":"https://studentedgeapplication.blob.core.windows.net/profiles/02a713ee-2ca1-4029-85f8-314878386621.png","PostDate":"2016-09-09T10:36:47+00:00","CommentId":"689460a2-4b02-4ca7-851c-a67c00aee6ab","ParentId":null,"PosterId":"5192fcf7-703b-4f78-b6fd-a3a000427119","Status":"Approved","CurrentMemberComment":false,"UpvoteCount":1,"MemberHasUpvoted":false,"PageUrl":"/forums/details/any-surfers-out-there","IsModerator":false},{"CommentBody":"<p>Same... Chrome's the only thing I surf....</p>\r\n<p>My mate goes 5'10&quot; and also snowboards...</p>\r\n\r\n","MemberName":"Sandy S","AvatarUrl":"https://studentedgeapplication.blob.core.windows.net/profiles/6542e863-3f04-496d-b4aa-d6adcb16ca39.jpg","PostDate":"2016-09-09T10:51:40+00:00","CommentId":"9479d9f2-845a-48a5-8d28-a67c00b2fcd7","ParentId":"689460a2-4b02-4ca7-851c-a67c00aee6ab","PosterId":"165dc3d0-9e3d-484f-b5be-a3a100cfc691","Status":"Approved","CurrentMemberComment":false,"UpvoteCount":0,"MemberHasUpvoted":false,"PageUrl":"/forums/details/any-surfers-out-there","IsModerator":false}],"PageNumber":1,"Order":"Oldest"}

That is perfectly valid JSON. But when I fetch it with Jsoup, it returns:

<html> <head></head> <body>  {"Items":[{"CommentBody":"  <p>I ride a 5'9 and am from the mid north coast</p>\r\n\r\n","MemberName":"Jack F","AvatarUrl":"https://studentedgeapplication.blob.core.windows.net/profiles/21450f07-ddcc-4f19-8cba-296f22e84ee1.jpeg","PostDate":"2016-09-03T01:38:38+00:00","CommentId":"f1c50066-69b3-4a92-bc0c-a676001b174f","ParentId":null,"PosterId":"28936bc3-f705-45d6-8f94-a5b0004585c6","Status":"Approved","CurrentMemberComment":false,"UpvoteCount":0,"MemberHasUpvoted":false,"PageUrl":"/forums/details/any-surfers-out-there","IsModerator":false},{"CommentBody":"  <p>I surf everyday on Google Chrome - SA here ;)</p>\r\n\r\n","MemberName":"Bryan A","AvatarUrl":"https://studentedgeapplication.blob.core.windows.net/profiles/02a713ee-2ca1-4029-85f8-314878386621.png","PostDate":"2016-09-09T10:36:47+00:00","CommentId":"689460a2-4b02-4ca7-851c-a67c00aee6ab","ParentId":null,"PosterId":"5192fcf7-703b-4f78-b6fd-a3a000427119","Status":"Approved","CurrentMemberComment":false,"UpvoteCount":1,"MemberHasUpvoted":false,"PageUrl":"/forums/details/any-surfers-out-there","IsModerator":false},{"CommentBody":"  <p>Same... Chrome's the only thing I surf....</p>\r\n  <p>My mate goes 5'10" and also snowboards...</p>\r\n\r\n","MemberName":"Sandy S","AvatarUrl":"https://studentedgeapplication.blob.core.windows.net/profiles/6542e863-3f04-496d-b4aa-d6adcb16ca39.jpg","PostDate":"2016-09-09T10:51:40+00:00","CommentId":"9479d9f2-845a-48a5-8d28-a67c00b2fcd7","ParentId":"689460a2-4b02-4ca7-851c-a67c00aee6ab","PosterId":"165dc3d0-9e3d-484f-b5be-a3a100cfc691","Status":"Approved","CurrentMemberComment":false,"UpvoteCount":0,"MemberHasUpvoted":false,"PageUrl":"/forums/details/any-surfers-out-there","IsModerator":false}],"PageNumber":1,"Order":"Oldest"} </body></html>

Jsoup code:

    Document doc = Jsoup.connect(baseUrl + keyword)
            .followRedirects(true)
            .ignoreContentType(true)
            .userAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:43.0) Gecko/20100101 Firefox/43.0")
            .header("Accept-Encoding", "gzip, deflate")
            .header("Accept-Language", "en-US,en;q=0.5")
            .header("Host", "data.studentedge.com.au")
            .header("Origin", "https://studentedge.com.au")
            .header("Referer", "https://studentedge.com.au/forums/details/any-surfers-out-there")
            .get();
    String result = doc.html();

Note: if I use doc.text() instead, it somehow corrupts the JSON.
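The question builds the request URL as baseUrl + keyword, but those two values are not shown. For reference, here is a minimal sketch of producing the URL from the top of the question with the JDK's URLEncoder (the Charset overload needs Java 10+). The class name CommentsUrl and its build method are hypothetical; only the endpoint and the page, sort and url parameter names are taken from that URL.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that rebuilds the comments API URL from the question.
// Endpoint and parameter names are copied from that URL; nothing else about
// the API is assumed.
class CommentsUrl {
    static final String ENDPOINT =
            "https://data.studentedge.com.au/api/comments/getpage";

    static String build(int page, String sort, String forumPath) {
        // URLEncoder turns '/' into %2F, which matches the url=... value
        // in the question (and a space would become '+').
        return ENDPOINT
                + "?page=" + page
                + "&sort=" + URLEncoder.encode(sort, StandardCharsets.UTF_8)
                + "&url=" + URLEncoder.encode(forumPath, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(build(1, "Oldest", "/forums/details/any-surfers-out-there"));
    }
}
```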

1 answer:

Answer 0 (score: 3)

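Use execute().body() to get the raw data instead of get(). get() parses the response as an HTML document, so the <p> tags inside the JSON strings become real elements and &quot; is decoded to a plain ". doc.html() then re-serializes (and pretty-prints) that DOM, which is where the extra spaces and the unescaped quote come from, while doc.text() drops the tags altogether. execute() returns the Connection.Response, and its body() is the response text without any parsing: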

    String result = Jsoup.connect(baseUrl + keyword)
            .followRedirects(true)
            .ignoreContentType(true)
            .userAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:43.0) Gecko/20100101 Firefox/43.0")
            .header("Accept-Encoding", "gzip, deflate")
            .header("Accept-Language", "en-US,en;q=0.5")
            .header("Host", "data.studentedge.com.au")
            .header("Origin", "https://studentedge.com.au")
            .header("Referer", "https://studentedge.com.au/forums/details/any-surfers-out-there")
            .execute().body();
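If the endpoint is only ever used as a JSON API, an HTML library is not strictly needed. As an alternative to the Jsoup fix (a different technique, not what the answer uses), here is a sketch with the JDK 11+ java.net.http.HttpClient, which also returns the body as a String without any HTML parsing. The class and method names are hypothetical. Host is left out because this client sets it itself and rejects it as a restricted header, and Accept-Encoding is left out because HttpClient does not decompress gzip responses automatically.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical alternative to Jsoup for a pure JSON endpoint (JDK 11+).
class RawJsonFetch {
    // Builds the GET request. Host is omitted (the JDK client sets it and
    // treats it as restricted), and so is Accept-Encoding, because
    // HttpClient does not decompress gzip bodies on its own.
    static HttpRequest buildRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Accept", "application/json")
                .header("Referer",
                        "https://studentedge.com.au/forums/details/any-surfers-out-there")
                .GET()
                .build();
    }

    // Sends the request and returns the body exactly as the server sent it;
    // no HTML parser is involved, so the JSON stays intact.
    static String fetch(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpResponse<String> response =
                client.send(buildRequest(url), HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

Usage would be String json = RawJsonFetch.fetch(url), with url being the endpoint URL from the question; the resulting string can then be handed to whichever JSON library you already use.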