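I am trying to write code that collects all the elements from a list of website URLs. However, the output only contains the last element's name and attributes. The code is shown below: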
public static void getHTMLElements(List<String> urls) throws IOException {
    getElements(urls);
    for (Map.Entry<String, HtmlElements> entry1 : urlList.entrySet()) {
        HtmlElements htmlele = entry1.getValue();
        System.out.println("url is " + entry1.getKey());
        System.out.println("Element Name is " + htmlele.getElementName());
        System.out.println("Attributes are " + htmlele.getAttributes());
    }
}
public static void getElements(List<String> urls) throws IOException {
    try {
        for (int i = 0; i < urls.size(); i++) {
            String s = urls.get(i);
            Document doc = Jsoup.connect(s).get();
            getInputElements(doc, s);
        }
    }
    catch (Exception e) {
        e.printStackTrace();
    }
}
public static void getInputElements(Document doc, String urls) {
    // List l=new ArrayList();
    HtmlElements htmlElements = new HtmlElements();
    Properties attributes = new Properties();
    Elements elements = doc.getAllElements();
    for (Element element : elements) {
        if (!element.tagName().contains("script")) {
            String elementName = element.tagName();
            Attributes attr = element.attributes();
            for (Attribute attr1 : attr) {
                if (attr1.getKey().contains("id")) {
                    attributes.put(attr1.getKey(), attr1.getValue());
                }
                if (attr1.getKey().contains("name")) {
                    attributes.put(attr1.getKey(), attr1.getValue());
                }
                if (attr1.getKey().contains("type")) {
                    attributes.put(attr1.getKey(), attr1.getValue());
                }
            }
            htmlElements.setElementName(elementName);
            htmlElements.setAttributes(attributes);
        }
        urlList.put(urls, htmlElements);
    }
}
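I only get the last element and its attributes. The values are overwritten, so only the last ones remain. I want to get all the elements and their attributes.
Please help.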
Answer 0: (score: 0)
I'm not particularly familiar with Jsoup, but at first glance it looks like htmlElements and attributes should be initialized inside the loop.
public static void getInputElements(Document doc, String urls) {
    Elements elements = doc.getAllElements();
    for (Element element : elements) {
        HtmlElements htmlElements = new HtmlElements();
        Properties attributes = new Properties();
        // ...
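One further point may be worth checking. Since urlList is iterated through entrySet() it appears to be a Map, and Map.put with the same key replaces the previous value. So even with fresh objects created per element, urlList.put(urls, htmlElements) would still leave only one element per URL. A possible way around this is to store a list of elements per URL. The following is a minimal sketch under stated assumptions, not the asker's actual classes: ElementInfo is a hypothetical stand-in for HtmlElements, addElement is a hypothetical helper, the URL is a placeholder, and Jsoup is left out so the example stays self-contained.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ElementCollector {

    // Hypothetical stand-in for the question's HtmlElements class.
    static class ElementInfo {
        final String elementName;
        final Map<String, String> attributes;

        ElementInfo(String elementName, Map<String, String> attributes) {
            this.elementName = elementName;
            this.attributes = attributes;
        }
    }

    // One list of elements per URL, instead of a single element per URL.
    static final Map<String, List<ElementInfo>> urlList = new LinkedHashMap<>();

    // Hypothetical helper: appends one element to the list stored for the URL,
    // creating that list the first time the URL is seen.
    static void addElement(String url, String tagName, Map<String, String> attrs) {
        // Copy the attributes so every element keeps its own independent map.
        ElementInfo info = new ElementInfo(tagName, new LinkedHashMap<>(attrs));
        urlList.computeIfAbsent(url, key -> new ArrayList<>()).add(info);
    }

    public static void main(String[] args) {
        String url = "http://example.com"; // placeholder URL

        Map<String, String> first = new LinkedHashMap<>();
        first.put("id", "user");
        first.put("type", "text");
        addElement(url, "input", first);

        Map<String, String> second = new LinkedHashMap<>();
        second.put("id", "pass");
        second.put("type", "password");
        addElement(url, "input", second);

        // Print every element collected for every URL.
        for (Map.Entry<String, List<ElementInfo>> entry : urlList.entrySet()) {
            System.out.println("url is " + entry.getKey());
            for (ElementInfo e : entry.getValue()) {
                System.out.println("Element Name is " + e.elementName);
                System.out.println("Attributes are " + e.attributes);
            }
        }
    }
}
```

In the original code, the equivalent change would be to declare urlList as a map from URL to a list of HtmlElements and to add to that list inside the element loop, rather than calling put with the same key each time.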