Java - How do I find JavaScript-generated values in an .html file and extract them?

Time: 2016-10-14 09:45:23

Tags: java html extraction

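I am developing a Java program that has to retrieve several values from an HTML document with embedded JavaScript. I already use a headless browser to save the page source.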

The format is as follows:

<script type="text/javascript">window._sharedData = {"country_code": "NL", "language_code": "en", "gatekeepers": {"cc": true, "sms": true}, "show_app_install": false, "static_root": "//instagramstatic-a.akamaihd.net/h1", "platform": "web", "hostname": "www.instagram.com", "entry_data": {"FeedPage": [{"feed": {"media": {"page_info": {"has_previous_page": false, "start_cursor": "1360768377836321975", "end_cursor": "KKsBARIAAAIgABgAEAAQAAgACAAIAAgA__79_______-______________3_____vd7_3_3___f7X_-__9f__________fv______7________f______9ke1HYuLdyUdv2FnF1__3___bn__1u___9_-___9d_W_____7_v____3__-__1f_r__b_3v_eb-_5__4___3_d_-f____X_7X77__t3HrzO1EPGYwQAFoKj3qj4VRgTMTM2MDc3MDMwMDk1MDU3ODc0NgA=", "has_next_page": true}, "nodes": [{"code": "BLia6FGhLy3", "date": 1476436253, "dimensions": {"width": 1080, "height": 1080}, "usertags": {"nodes": []}, "comments_disabled": false, "comments": {"count": 13, "page_info": {"has_previous_page": true, "start_cursor": "AQAP5jWcl57NiC9cWj78rqzBLGrlhK3cSQEitgUE9A89SCRi9mgAZR7bcVYwduyBwfEo40rs-p9UPlpv-yqs0dol-ZZN1kzswGiFDaPtau3mkyzNarvfF3lXl9YQBcX3XMg", "end_cursor": null, "has_next_page": false}, "nodes": [{"text": "OMG DE ERGSTE CLOWN VIDEOS STAAN NU OP @streetbroeders WTF DIT GAAT ECHT TE VER\ud83d\ude28\ud83d\ude28\ud83d\ude28\ud83d\ude28\ud83d\ude31\ud83d\ude28\ud83d\ude31\ud83d\ude31\ud83d\ude28\ud83d\ude31\ud83d\ude28\ud83d\ude31\ud83d\ude28\ud83d\ude28\ud83d\ude28\ud83d\ude31\ud83d\ude28\ud83d\ude31\ud83d\ude31\ud83d\ude28\ud83d\ude31\ud83d\ude28\ud83d\ude31\ud83d\ude28\ud83d\ude31\ud83d\ude28", "created_at": 1476436572.0, "id": "17854953877072546", "user": {"username": "dimitriiiiii8", "profile_pic_url": "https://scontent-ams3-1.cdninstagram.com/t51.2885-19/s150x150/14350622_962082440562323_5463667153242685440_a.jpg", "id": "3585919439"}}, {"text": "Hij vlogt nou ook \ud83d\ude02 @puckthoonen", "created_at": 1476436665.0, "id": "17854953934072546", "user": {"username": "xyosrax", "profile_pic_url": 
"https://scontent-ams3-1.cdninstagram.com/t51.2885-19/s150x150/14591971_643784482465188_2808893991515324416_a.jpg", "id": "421989269"}}, {"text": "@s.cllkk  omg ikga stuk baksana sende", "created_at": 1476437485.0, "id": "17854954273072546", "user": {"username": "s.avvrr", "profile_pic_url": "https://scontent-ams3-1.cdninstagram.com/t51.2885-19/s150x150/14714421_1630476263911573_6235495801653559296_a.jpg", "id": "3636925067"}}, {"text": "@Kyra.louisa awww wij staan er in niet...", "created_at": 1476437614.0, "id": "17854954330072546", "user": {"username": "lailanaitabdellah", "profile_pic_url": "https://scontent-ams3-1.cdninstagram.com/t51.2885-19/s150x150/14565128_1806486246263649_9106588324138582016_a.jpg", "id": "3921803536"}}]}, "caption": "ER STAAT EEN VLOG ONLINE !! CHECK HEMM \u2764\ufe0f\u2764\ufe0flink in bioooo en vergeet niet een blauw duimpje omhoog te doen", "likes": {"count": 863, "viewer_has_liked": false, "nodes": [{"user": {"username": "_peeettt_", "profile_pic_url": "https://scontent-ams3-1.cdninstagram.com/t51.2885-19/s150x150/14498950_1889382917962867_1213856864219955200_a.jpg", "id": "2310846728"}}, {"user": {"username": "florian_xhofleer", "profile_pic_url": "https://scontent-ams3-1.cdninstagram.com/t51.2885-19/s150x150/13129436_1053484581388440_1497220330_a.jpg", "id": "1509328689"}}, {"user": {"username": "opal.jc_", "profile_pic_url": "https://scontent-ams3-1.cdninstagram.com/t51.2885-19/s150x150/14240791_332116903802213_510814450_a.jpg", "id": "459234511"}}, {"user": {"username": "kussomraaax", "profile_pic_url": "https://scontent-ams3-1.cdninstagram.com/t51.2885-19/s150x150/14709575_153296821796627_7761089346496626688_a.jpg", "id": "701431190"}}, {"user": {"username": "samswanenberg", "profile_pic_url": "https://scontent-ams3-1.cdninstagram.com/t51.2885-19/s150x150/13687380_1177347195637613_965407768_a.jpg", "id": "1531470592"}}, {"user": {"username": "_079nicky", "profile_pic_url": 
"https://scontent-ams3-1.cdninstagram.com/t51.2885-19/s150x150/13658715_682148038601127_159337745_a.jpg", "id": "2958631733"}}, {"user": {"username": "xoxo_phartje", "profile_pic_url": "https://scontent-ams3-1.cdninstagram.com/t51.2885-19/s150x150/14488249_694206510731451_5145437345356447744_a.jpg", "id": "621925932"}}, {"user": {"username": "lakeishacjs", "profile_pic_url": "https://scontent-ams3-1.cdninstagram.com/t51.2885-19/s150x150/14709711_1302156743128341_5556269800650440704_a.jpg", "id": "1213887878"}}, {"user": {"username": "babsjeh_x", "profile_pic_url": "https://scontent-ams3-1.cdninstagram.com/t51.2885-19/s150x150/13636227_1564992197143404_2071275735_a.jpg", "id": "1530416154"}}, {"user": {"username": "kussmarous", "profile_pic_url": "https://scontent-ams3-1.cdninstagram.com/t51.2885-19/s150x150/14709477_320191241677767_540779637182038016_a.jpg", "id": "615374756"}}]}, "owner": {"username": "diiggys", "requested_by_viewer": false, "followed_by_viewer": true, "blocked_by_viewer": false, "profile_pic_url": "https://scontent-ams3-1.cdninstagram.com/t51.2885-19/s150x150/13696373_125957381180534_1183915640_a.jpg", "full_name": "Naik Byusa", "has_blocked_viewer": false, "id": "34423417", "is_private": false}, "is_video": false, "id": "1360768377836321975", "display_src": "https://scontent-ams3-1.cdninstagram.com/t51.2885-15/e35/14705195_316998188673752_3401968978694766592_n.jpg?ig_cache_key=MTM2MDc2ODM3NzgzNjMyMTk3NQ%3D%3D.2", "location": null}, {"code": "BLiXPH-FI4g", "date": 1476434328, "video_views": 152, "video_url": "https://scontent-ams3-1.cdninstagram.com/t50.2886-16/14677708_1608586469435162_6167663639391109120_n.mp4", "usertags": {"nodes": []}, "comments_disabled": false, "location": null, "comments": {"count": 3, "page_info": {"has_previous_page": false, "start_cursor": null, "end_cursor": null, "has_next_page": false}, "nodes": [{"text": "@frankpost15", "created_at": 1476435031.0, "id": "17853388039128939", "user": {"username": "willemvisscher_", 
"profile_pic_url": "https://scontent-ams3-1.cdninstagram.com/t51.2885-19/s150x150/13671145_282174172153406_1761302049_a.jpg", "id": "1298139695"}}, {"text": "@sharxnnn @jl_dgn", "created_at": 1476435731.0, "id": "17853388315128939", "user": {"username": "jbkri", "profile_pic_url": "https://scontent-ams3-1.cdninstagram.com/t51.2885-19/s150x150/14368988_1710149442643660_1034759283482296320_a.jpg", "id": "2062518552"}}, {"text": "@patrickkpm", "created_at": 1476436806.0, "id": "17853388807128939", "user": {"username": "bryanassendelft", "profile_pic_url": "https://scontent-ams3-1.cdninstagram.com/t51.2885-19/s150x150/14564857_1198678756861436_988051166245945344_a.jpg", "id": "297101893"}}]}, "caption": "Frankentek - Hakkeuh \ud83d\udd0a\ud83d\udd0a #Terror \u25cb\u25cf\u25cb https://m.soundcloud.com/teknogaatdoor/10-frankentek-hakkeuh", "likes": {"count": 42, "viewer_has_liked": false, "nodes": [{"user": {"username": "terror_donny", "profile_pic_url": "https://scontent-ams3-1.cdninstagram.com/t51.2885-19/s150x150/13725767_268201096898595_726393479_a.jpg", "id": "835322770"}}, {"user": {"username": "baaaaaard", "profile_pic_url": "https://scontent-ams3-1.cdninstagram.com/t51.2885-19/s150x150/12797680_1509686022674162_1688584928_a.jpg", "id": "2333839523"}}, {"user": {"username": "yaizamonllau", "profile_pic_url": "https://scontent-ams3-1.cdninstagram.com/t51.2885-19/s150x150/14499080_351361555200467_8895706074150076416_a.jpg", "id": "1377103906"}}, {"user": {"username": "michelamishka", "profile_pic_url": "https://scontent-ams3-1.cdninstagram.com/t51.2885-19/s150x150/14276662_1805780383002601_313546244_a.jpg", "id": "1590152764"}}, {"user": {"username": "ste.angler", "profile_pic_url": "https://scontent-ams3-1.cdninstagram.com/t51.2885-19/s150x150/14033049_623968371105122_1595496671_a.jpg", "id": "1644435188"}}, {"user": {"username": "dr.mdma", "profile_pic_url": 
"https://scontent-ams3-1.cdninstagram.com/t51.2885-19/s150x150/14350713_1621777218114615_2737118765807304704_a.jpg", "id": "2009841461"}}, {"user": {"username": "_.mottoos", "profile_pic_url": "https://scontent-ams3-1.cdninstagram.com/t51.2885-19/s150x150/14449303_324014794620313_7098976999166705664_n.jpg", "id": "3901868365"}}, {"user": {"username": "kantholz_xo", "profile_pic_url": "https://scontent-ams3-1.cdninstagram.com/t51.2885-19/s150x150/13767539_714935758645904_503495195_a.jpg", "id": "910174488"}}, {"user": {"username": "_marilena_6981", "profile_pic_url": "https://scontent-ams3-1.cdninstagram.com/t51.2885-19/s150x150/14482040_784714728336686_6798367457821065216_a.jpg", "id": "429006425"}}, {"user": {"username": "gabber_4life", "profile_pic_url": "https://scontent-ams3-1.cdninstagram.com/t51.2885-19/s150x150/14474384_1171060102973537_2717945524991819776_a.jpg", "id": "1920420455"}}]}, "owner": {"username": "terror.only", "requested_by_viewer": false, "followed_by_viewer": true, "blocked_by_viewer": false, "profile_pic_url": "https://scontent-ams3-1.cdninstagram.com/t51.2885-19/s150x150/12346215_1190627637631297_834405984_a.jpg", "full_name": "\ud83d\udc80 TERROR TRACKS ONLY \ud83d\udc80", "has_blocked_viewer": false, "id": "2021979402", "is_private": false}, "is_video": true, "id": "1360752231838944800", "display_src": "https://scontent-ams3-1.cdninstagram.com/t51.2885-15/s640x640/e15/14736373_1597466177222408_3927176009354313728_n.jpg?ig_cache_key=MTM2MDc1MjIzMTgzODk0NDgwMA%3D%3D.2", "dimensions": {"width": 640, "height": 640}}, {"code": "BLiXgVqFAVc", "date": 1476434469, "video_views": 120, "video_url": "https://scontent-ams3-1.cdninstagram.com/t50.2886-16/14704390_526737934192376_9131678466519859200_n.mp4", "usertags": {"nodes": []}, "comments_disabled": false, "location": null, "comments": {"count": 0, "page_info": {"has_previous_page": false, "start_cursor": null, "end_cursor": null, "has_next_page": false}, "nodes": []}, "caption": "M1dlet - I'm a 
Killa (Forthcoming Satan Inside) \ud83d\udd0a\ud83d\ude08\ud83d\udc4c #Early #Terror \u25cb\u25cf\u25cb https://m.soundcloud.com/m1dlet/m1dlet-im-a-killa", "likes": {"count": 34, "viewer_has_liked": false, "nodes": [{"user": {"username": "ruben_v_t", "profile_pic_url": "https://scontent-ams3-1.cdninstagram.com/t51.2885-19/s150x150/14134704_1064829600271198_444689682_a.jpg", "id": "2713722767"}}, {"user": {"username": "ronaldvandelaar_", "profile_pic_url": "https://scontent-ams3-1.cdninstagram.com/t51.2885-19/s150x150/13561599_1732495487030866_1552760297_a.jpg", "id": "1723429410"}}, {"user": {"username": "hardmaxi", "profile_pic_url": "https://scontent-ams3-1.cdninstagram.com/t51.2885-19/s150x150/14499095_1172144672831249_1164011947683741696_n.jpg", "id": "1241658092"}}, {"user": {"username": "chacharka96", "profile_pic_url": "https://scontent-ams3-1.cdninstagram.com/t51.2885-19/s150x150/14276627_752298758250722_677084893_a.jpg", "id": "1792680811"}}, {"user": {"username": "rohan__013", "profile_pic_url": "https://scontent-ams3-1.cdninstagram.com/t51.2885-19/s150x150/13269534_879859075470140_550284467_a.jpg", "id": "622302799"}}, {"user": {"username": "natasroska", "profile_pic_url": "https://scontent-ams3-1.cdninstagram.com/t51.2885-19/s150x150/14474443_172635103181510_839346852858953728_a.jpg", "id": "1560288211"}}, {"user": {"username": "bih_dali_aut", "profile_pic_url": "https://scontent-ams3-1.cdninstagram.com/t51.2885-19/s150x150/13402245_1405506119748675_93650728_a.jpg", "id": "2354939415"}}, {"user": {"username": "paul_rimbaud", "profile_pic_url": "https://scontent-ams3-1.cdninstagram.com/t51.2885-19/s150x150/13703163_316224315390857_1107591926_a.jpg", "id": "329737397"}}, {"user": {"username": "michelamishka", "profile_pic_url": "https://scontent-ams3-1.cdninstagram.com/t51.2885-19/s150x150/14276662_1805780383002601_313546244_a.jpg", "id": "1590152764"}}, {"user": {"username": "x_blvckout_x", "profile_pic_url": 
"https://scontent-ams3-1.cdninstagram.com/t51.2885-19/s150x150/14359449_1752725171635393_739947707468087296_a.jpg", "id": 

What I want is to extract every "code" value so that I can use it to browse to the photo's page.

FIXED:

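Here is the solution:

    // Collect every script element found inside the body.
    com.jaunt.Elements scripts = userAgent.doc.findEach("body").findEach("script");

    // On this page, the script at index 2 is the one holding the JSON.
    String json = scripts.getElement(2).innerHTML();

    // Parse the JSON and gather every value stored under the key "code".
    userAgent.openJSON(json);
    JNode codes = userAgent.json.findEvery("code");
    System.out.println("codes: " + codes);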
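For comparison, the same "code" values can also be pulled out of the saved page source with the standard library alone. The following is a minimal sketch, not the approach used above: the class name `SharedDataCodes`, the method `extractCodes`, and the shortened sample string are all hypothetical, and the regular expression assumes the values contain no escaped quotes. Because the pattern requires a quote directly before `code`, keys such as "country_code" and "language_code" are not matched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SharedDataCodes {
    // Matches "code": "<value>" and captures the value.
    // Assumption: the value contains no escaped quotes or backslashes.
    private static final Pattern CODE =
            Pattern.compile("\"code\"\\s*:\\s*\"([^\"\\\\]+)\"");

    // Returns every "code" value in document order.
    static List<String> extractCodes(String html) {
        List<String> codes = new ArrayList<>();
        Matcher m = CODE.matcher(html);
        while (m.find()) {
            codes.add(m.group(1));
        }
        return codes;
    }

    public static void main(String[] args) {
        // Hypothetical, heavily shortened stand-in for the saved page source.
        String html = "<script type=\"text/javascript\">window._sharedData = "
                + "{\"country_code\": \"NL\", \"nodes\": ["
                + "{\"code\": \"BLia6FGhLy3\", \"date\": 1476436253}, "
                + "{\"code\": \"BLiXPH-FI4g\", \"date\": 1476434328}]};</script>";

        for (String code : extractCodes(html)) {
            System.out.println(code);
        }
    }
}
```

A regular expression is more brittle than a real JSON parser, so this is probably only worth using when adding a parsing library is not an option.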

0 answers:

No answers