Rewriting a regexp to extract JSON

Date: 2016-07-13 20:05:47

Tags: java json

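I want to download a web page's source code and extract the JSON embedded in it.

Here you can switch to the page source, press Ctrl+F and search for var data; that is the part I need. This is my code: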

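import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Parser {

    // Captures everything after "var data = " up to the end of the line
    static final Pattern DATA_PATTERN = Pattern.compile("var data = (.*)");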

    public static void main(String[] args) throws IOException {

        String webPage = new Parser().getUrlSource("http://satiksme.daugavpils.lv/tramvajs-nr-1-butlerova-iela-stacija");
        if(webPage != null){
            Matcher m = DATA_PATTERN.matcher(webPage);
            if(m.find()) {
                String extracted = m.group(1).trim();
                System.out.println(extracted);
            }
        }
    }

    public String getUrlSource(String url) throws IOException {
        URLConnection connection = new URL(url).openConnection();
        StringBuilder source = new StringBuilder();
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                connection.getInputStream(), "UTF-8"))) {
            String inputLine;
            // Concatenate the lines exactly as readLine() returns them
            while ((inputLine = in.readLine()) != null)
                source.append(inputLine);
        }
        return source.toString();
    }
}

The problem: Pattern.compile("var data = (.*)") does not do what I want. I need only the JSON, without the JavaScript and HTML that follow it.
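One possible cause, offered as a guess rather than a confirmed diagnosis: in Java, `.` does not match line terminators by default, and `BufferedReader.readLine()` returns each line without its terminator. Since getUrlSource appends the lines with nothing between them, the whole page becomes one long line and `(.*)` can run to the end of the input. The sketch below reuses the pattern from the question on a made-up three-line page (the page content and the class name `DotStarDemo` are assumptions, not the real site) to show the difference.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DotStarDemo {

    // Same pattern as in the question.
    static final Pattern DATA_PATTERN = Pattern.compile("var data = (.*)");

    // Returns group 1 of the first match, or null if nothing matches.
    static String extract(String page) {
        Matcher m = DATA_PATTERN.matcher(page);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        // Hypothetical page source, not taken from the real site.
        String[] lines = {
            "<script>",
            "var data = {\"stations\":[]};",
            "$(document).ready(function () {});</script>"
        };

        // Lines glued together with no terminators, as getUrlSource does.
        String joined = String.join("", lines);
        // The same lines with the terminators put back.
        String withNewlines = String.join("\n", lines);

        System.out.println(extract(joined));       // JSON plus everything after it
        System.out.println(extract(withNewlines)); // stops at the end of the line
    }
}
```

If this is what is happening, appending `'\n'` after each line in getUrlSource would make the pattern stop at the end of the `var data` line, assuming the JSON sits on a single line of the page. Note that the match would still include the trailing `;`.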

The actual result right now is:

the JSON, followed by:

$(document).ready(function () {        $(".sations ul").html("");        var selst = window.location.hash.replace("#", "");                $.each(data.stations, function (index, val) {            var cls = "even"; if (index % 2 == 0) cls = "odd";            $(".sations ul").append("<li class='" + cls + "' id='station-" + val.sid + "' onclick='return showStation(" + val.sid + ")'><span class='station-name'>" + val.name + "</span></li>");            if (index == 0) {                if (!selst)                    selst = val.sid;                                }        });        showStation(selst);        initmap(defaultLat, defaultLng, defaultZoom);    });</script></article></div>        </div>            </div></div></div><div id="layout-footer" class="group">    <footer id="footer">        <div id="footer-quad" class="group">                                            </div>        <div id="footer-sig" class="group">            <div class="zone zone-footer"><div class="credits"><span class="copyright">Copyright &#169; 2014 <b>SIA Daugavpils Satiksme</b>. All rightd reserved.</span><span class="poweredby">Izstrādāts <a href="http://www.latinsoft.lv" target="_blank">Latinsoft</a>. Izmantojot <a href="http://www.orchardproject.net" rel="nofollow" target="_blank">Orchard</a>.</span></div><div class="user-display">        <span class="user-actions"><a href="/Users/Account/LogOn?ReturnUrl=%2Ftramvajs-nr-1-butlerova-iela-stacija" rel="nofollow">Sign In</a></span></div></div>        </div>    </footer></div></div><script src="/Modules/Traffic/scripts/leaflet.js" type="text/javascript"></script><script src="/Modules/Traffic/scripts/dsapi.js" type="text/javascript"></script><script src="http://code.jquery.com/jquery-migrate-1.2.1.js" type="text/javascript"></script><script src="/Themes/TheThemeMachine/scripts/lispage.js" type="text/javascript"></script><script src="/Themes/TheThemeMachine/scripts/jquery.nivo.slider.js" type="text/javascript"></script></body></html>

Expected result: only the JSON.
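A way to get only the JSON without depending on line breaks at all is to skip the regexp and scan for the matching closing brace. This is a minimal sketch, not a full JSON parser: it assumes the value after `var data =` is an object literal starting with `{`, and the class and method names (`JsonObjectExtractor`, `extractObject`) are made up for illustration.

```java
class JsonObjectExtractor {

    // Returns the balanced {...} block that follows the marker,
    // or null if the marker or a balanced block is not found.
    static String extractObject(String page, String marker) {
        int at = page.indexOf(marker);
        if (at < 0) return null;
        int start = page.indexOf('{', at + marker.length());
        if (start < 0) return null;

        int depth = 0;            // current nesting level of braces
        boolean inString = false; // true while inside a quoted string
        char quote = 0;           // the quote character that opened the string
        for (int i = start; i < page.length(); i++) {
            char c = page.charAt(i);
            if (inString) {
                if (c == '\\') {
                    i++;          // skip the escaped character
                } else if (c == quote) {
                    inString = false;
                }
            } else if (c == '"' || c == '\'') {
                inString = true;
                quote = c;
            } else if (c == '{') {
                depth++;
            } else if (c == '}') {
                depth--;
                if (depth == 0) return page.substring(start, i + 1);
            }
        }
        return null; // braces never balanced
    }
}
```

In the Parser from the question it could be called as `JsonObjectExtractor.extractObject(webPage, "var data =")`. Braces inside string values are ignored, so a station name containing `}` would not cut the result short.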

P.S. This Pattern works perfectly on Android. Can anyone explain why?

Thanks!

0 answers:

No answers yet