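How do I extract multiple values from HTML into Java?

Asked: 2015-06-22 12:48:28

Tags: java html jsoup

I am trying to extract some data from HTML source into my Java project. The HTML comes from a Bing image search, and I want to get the image URL and the source page URL from every <a> tag. Here is the HTML: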

<a href="/images/search?q=nba&amp;view=detailv2&amp;&amp;&amp;
id=FE19E7BB2916CE8B6CD78148F3BC0656D151049A&amp;
selectedIndex=3&amp;
ccid=2%2f7OBkGc&amp;
simid=608035681734625885&amp;
thid=JN.tdPCsRj4HyJzbwA%2bgXsS8g" 
ihk="JN.tdPCsRj4HyJzbwA+gXsS8g" 
m="{ns:&quot;images&quot;,k:&quot;5070&quot;,dirovr:&quot;ltr&quot;,
mid:&quot;FE19E7BB2916CE8B6CD78148F3BC0656D151049A&quot;,
surl:&quot;http://www.nba.com/gallery/rookie/070727_1.html&quot;,
imgurl:&quot;http://www.nba.com/media/draft_class_3_07_070727.jpg
&quot;,
ow:&quot;300&quot;,docid:&quot;608035681734625885&quot;,oh:&quot;192&quot;,tft:&quot;58&quot;}" 
mid="FE19E7BB2916CE8B6CD78148F3BC0656D151049A" 
t1="The 2007 NBA Draft Class" 
t2="625 x 400 · 374 kB · jpeg" 
t3="www.nba.com/gallery/rookie/070727_1.html" 
h="ID=images,5070.1"><img data-bm="16" 
src="https://tse3.mm.bing.net/th?id=JN.tdPCsRj4HyJzbwA%2bgXsS8g&amp;w=217&amp;h=142&amp;c=7&amp;rs=1&amp;qlt=90&amp;o=4&amp;pid=1.1" 
style="width:217px;height:142px;" width="217" height="142">
</a>

This is how I tried to extract them, without success:

public static void main(String[] args) {
    String title = "dog";
    String url = "https://www.bing.com/images/search?q=" + title + "&FORM=HDRSC2";
    try {
        Document doc = Jsoup.connect(url).get();
        Elements img = doc.getElementsByTag("a");

        for (Element el : img) {
            String src1 = el.absUrl("imgurl");
            String src2 = el.absUrl("surl");
            System.out.println(src1 + " " + src2);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}

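Any idea whether this is possible?

1 answer:

Answer 0 (score: 1)

As far as I can see, your <a> element has an attribute named m, not attributes named imgurl or surl. el.absUrl("imgurl") looks up an attribute with that name, finds none, and returns an empty string. The m attribute holds a JSON-like object, and that object is what contains imgurl and surl. So first read the value of m: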

String m = el.attr("m");

Then parse m as JSON with whichever library you prefer, for example Gson:

class MJson {
    private String imgurl;
    private String surl;

    ...
}

MJson mJson = new Gson().fromJson(m, MJson.class);
String src1 = mJson.getImgurl();
String src2 = mJson.getSurl();
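
One detail worth checking: the keys inside m are unquoted (ns:"images", not "ns":"images"), so the value is not strict JSON and a strict parser may reject it. If you would rather not depend on a JSON library for two fields, the following is a minimal sketch that pulls them out with a regular expression from the standard library. The class name MAttributeDemo and the helper valueOf are hypothetical names made up for this illustration, and the string m is the attribute value from the question, already entity-decoded the way jsoup's el.attr("m") returns it. It assumes values never contain escaped double quotes.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MAttributeDemo {

    // Hypothetical helper: returns the quoted value that follows `key:` in the
    // raw text of the m attribute, or an empty Optional if the key is absent.
    static Optional<String> valueOf(String m, String key) {
        // The key must directly follow '{' or ',' so it is matched as a whole
        // field name; the optional quotes also accept strict JSON input.
        Pattern p = Pattern.compile(
                "[{,]\\s*\"?" + Pattern.quote(key) + "\"?\\s*:\\s*\"([^\"]*)\"");
        Matcher matcher = p.matcher(m);
        // trim() drops stray whitespace or line breaks inside the quotes.
        return matcher.find()
                ? Optional.of(matcher.group(1).trim())
                : Optional.empty();
    }

    public static void main(String[] args) {
        // Decoded value of the m attribute shown in the question.
        String m = "{ns:\"images\",k:\"5070\",dirovr:\"ltr\","
                + "mid:\"FE19E7BB2916CE8B6CD78148F3BC0656D151049A\","
                + "surl:\"http://www.nba.com/gallery/rookie/070727_1.html\","
                + "imgurl:\"http://www.nba.com/media/draft_class_3_07_070727.jpg\","
                + "ow:\"300\",docid:\"608035681734625885\",oh:\"192\",tft:\"58\"}";

        System.out.println(valueOf(m, "imgurl").orElse("(none)"));
        System.out.println(valueOf(m, "surl").orElse("(none)"));
    }
}
```

In the loop from the question you would call valueOf(el.attr("m"), "imgurl") instead of el.absUrl("imgurl"). A regular expression is a shortcut that only suits flat key/value pairs like these; for nested data, or values containing escaped quotes, a real parser such as Gson is the safer choice.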