How can I get all elements from a website?

Asked: 2014-03-08 06:24:39

Tags: java jsoup

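I'm trying to write code that gets all the elements from a list of website URLs. However, the output contains only the last element's name and attribute values. The code is shown below: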

public static void getHTMLElements(List<String> urls) throws IOException {
    getElements(urls);
    for (Map.Entry<String, HtmlElements> entry1 : urlList.entrySet()) {
        HtmlElements htmlele = entry1.getValue();
        System.out.println("url is " + entry1.getKey());
        System.out.println("Element Name is " + htmlele.getElementName());
        System.out.println("Attributes are " + htmlele.getAttributes());
    }
}

public static void getElements(List<String> urls) throws IOException {
    try {
        for (int i = 0; i < urls.size(); i++) {
            String s = urls.get(i);
            Document doc = Jsoup.connect(s).get();
            getInputElements(doc, s);
        }
    }
    catch (Exception e) {
        e.printStackTrace();
    }
}

public static void getInputElements(Document doc, String urls) {
    // List l=new ArrayList();
    HtmlElements htmlElements = new HtmlElements();
    Properties attributes = new Properties();
    Elements elements = doc.getAllElements();
    for (Element element : elements) {
        if (!element.tagName().contains("script")) {
            String elementName = element.tagName();
            Attributes attr = element.attributes();
            for (Attribute attr1 : attr) {
                if (attr1.getKey().contains("id")) {
                    attributes.put(attr1.getKey(), attr1.getValue());
                }
                if (attr1.getKey().contains("name")) {
                    attributes.put(attr1.getKey(), attr1.getValue());
                }
                if (attr1.getKey().contains("type")) {
                    attributes.put(attr1.getKey(), attr1.getValue());
                }
            }
            htmlElements.setElementName(elementName);
            htmlElements.setAttributes(attributes);
        }
        urlList.put(urls, htmlElements);
    }
}

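I only get the last element's name and the last element's attributes. The values are being overwritten, so only the final ones remain. I want to get all of the elements and their attributes.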

Please help.

1 Answer:

Answer 0 (score: 0)

I'm not particularly familiar with Jsoup, but at first glance it looks like htmlElements and attributes should be initialized inside the loop.

    public static void getInputElements(Document doc, String urls) {
        Elements elements = doc.getAllElements();
        for (Element element : elements) {
            // Create fresh objects for every element so earlier values are not overwritten.
            HtmlElements htmlElements = new HtmlElements();
            Properties attributes = new Properties();
            // ...
        }
    }
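
Creating new objects per element may not be enough on its own. Assuming urlList is a Map<String, HtmlElements> (the entrySet() loop in the question suggests this), every urlList.put(urls, htmlElements) call uses the same URL key, so each call replaces the previous value. One possible way around that is to keep a list of results per URL. The following is only a sketch in plain Java: ElementInfo and ElementCollector are hypothetical names, and each String[] stands in for a jsoup Element (tag name followed by attribute key/value pairs) so the example runs without the jsoup dependency.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the question's HtmlElements class. */
final class ElementInfo {
    final String elementName;
    final Map<String, String> attributes;

    ElementInfo(String elementName, Map<String, String> attributes) {
        this.elementName = elementName;
        this.attributes = attributes;
    }
}

final class ElementCollector {
    /**
     * Appends one ElementInfo per non-script tag to the list stored under url.
     * Each tag is {tagName, attrKey1, attrValue1, ...}; this array format is an
     * assumption made for illustration, not part of jsoup.
     */
    static void collect(Map<String, List<ElementInfo>> results,
                        String url, List<String[]> tags) {
        // Reuse the list already stored for this URL, or create it on first use.
        List<ElementInfo> found = results.computeIfAbsent(url, k -> new ArrayList<>());
        for (String[] tag : tags) {
            // Exact match; the question uses contains("script"),
            // which would also skip "noscript".
            if (tag[0].equals("script")) {
                continue;
            }
            // A fresh attribute map per element, so elements never share state.
            Map<String, String> attributes = new LinkedHashMap<>();
            for (int i = 1; i + 1 < tag.length; i += 2) {
                String key = tag[i];
                if (key.equals("id") || key.equals("name") || key.equals("type")) {
                    attributes.put(key, tag[i + 1]);
                }
            }
            // add() appends; it does not replace earlier elements the way put() does.
            found.add(new ElementInfo(tag[0], attributes));
        }
    }
}
```

With jsoup, the same idea would mean iterating doc.getAllElements(), building one new HtmlElements per element, and adding it to a List<HtmlElements> stored under the URL. The printing loop in getHTMLElements would then iterate over that list rather than reading a single value.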