Question

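Summary: My code navigates to a craigslist ad URL and extracts the hidden phone number from the ad body. It works for many URLs, but not for the one included in my code. (By the way, you can copy and run my code as-is, without writing any additional code.)

Problem: getAttribute("href") returns null only for this URL. Why, and how do I fix it?

Code: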

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;

import java.util.ArrayList;
import java.util.List;

public class Temp {
    private static final WebDriver browser = new ChromeDriver();
    private static WebDriver temp_browser = new ChromeDriver();

    /*The code fails only for this url.*/
    private static String url = "https://sfbay.craigslist.org/pen/apa/5764613878.html";

    public static String phone_btns_xpath = "//section[@id='postingbody']//*[contains(.,'show contact info')]";
    public static By phone_btns_loc = By.xpath(phone_btns_xpath);

    public static void main(String[] args) {
        browser.get(url);
        List<String> phones = reveal_hidden_phone_numbers(temp_browser);
        temp_browser.close();
        System.out.println(phones);
    }

    public static List<String> reveal_hidden_phone_numbers(WebDriver temp_browser) {
        List<WebElement> phone_btns = browser.findElements(phone_btns_loc);
        List<String> phones = null;
        String text = null;

        if (phone_btns.size() > 0) {
            WebElement phone_btn_0 = phone_btns.get(0);
            System.out.println(phone_btn_0.getAttribute("innerHTML"));

            String url = phone_btn_0.getAttribute("href");
            temp_browser.get(url);
            text = temp_browser.findElement(By.tagName("body")).getText();

            for (WebElement phone_btn : phone_btns) {
                phone_btn.click();
            }

            phones = extract_phone_numbers(text);
        }
        return phones;
    }

    public static List<String> extract_phone_numbers(String text) {
        List<String> output = new ArrayList<String>();
        output.add("PHONE ;)");
        return output;
    }

}

Stack trace:

 <a href="/fb/sfo/apa/5764613878" class="showcontact" title="click to show contact info" rel="nofollow">show contact info</a>

Exception in thread "main" java.lang.NullPointerException: null value in entry: url=null
    at com.google.common.collect.CollectPreconditions.checkEntryNotNull(CollectPreconditions.java:33)
    at com.google.common.collect.SingletonImmutableBiMap.<init>(SingletonImmutableBiMap.java:39)
    at com.google.common.collect.ImmutableBiMap.of(ImmutableBiMap.java:49)
    at com.google.common.collect.ImmutableMap.of(ImmutableMap.java:70)
    at org.openqa.selenium.remote.RemoteWebDriver.get(RemoteWebDriver.java:316)
    at com.craigslist.Temp.reveal_hidden_phone_numbers(Temp.java:38)
    at com.craigslist.Temp.main(Temp.java:23)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)

Answer 1

As I can see in the output you posted with the stack trace, this line in your code, System.out.println(phone_btn_0.getAttribute("innerHTML"));, prints the inner HTML of the phone_btn_0 element as:

<a href="/fb/sfo/apa/5764613878" class="showcontact" title="click to show contact info" rel="nofollow">show contact info</a>

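This means you are trying to read the href attribute from the wrong element. phone_btn_0 is the parent element, which has no href attribute, rather than the actual link element, and that is why you get null.

Assuming you want the href attribute value of the link element printed above, you should read href from the child a element of phone_btn_0, as below: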

WebElement phone_btn_0 = phone_btns.get(0);
System.out.println(phone_btn_0.getAttribute("innerHTML"));

String url = phone_btn_0.findElement(By.tagName("a")).getAttribute("href");

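Edit: Alternatively, you can fix this in the XPath itself, so that it matches only a elements instead of all elements (*), and keep the rest of your code unchanged: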

public static String phone_btns_xpath = "//section[@id='postingbody']//a[contains(.,'show contact info')]";
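
To see why the * locator can pick up the wrapper rather than the link, here is a minimal sketch that runs without a browser, using the JDK's built-in javax.xml.xpath. The markup is a hypothetical, simplified stand-in for the ad body (I am assuming the link is wrapped in a p element; the real page may use a different wrapper), and the class and method names are mine. contains(., '...') tests an element's full text, including the text of its descendants, so every ancestor of the link inside the section matches as well, and the first match in document order is the outermost one.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathFirstMatchDemo {
    // Hypothetical markup: the wrapping <p> is an assumption, not taken from the real page.
    static final String MARKUP =
        "<section id='postingbody'>"
      + "<p><a href='/fb/sfo/apa/5764613878' class='showcontact'>show contact info</a></p>"
      + "</section>";

    // The locator from the question: matches any element whose text contains the phrase.
    static final String ANY_ELEMENT =
        "//section[@id='postingbody']//*[contains(.,'show contact info')]";

    // The corrected locator: matches only <a> elements.
    static final String LINK_ONLY =
        "//section[@id='postingbody']//a[contains(.,'show contact info')]";

    // Returns the first element (in document order) matched by the expression.
    static Element firstMatch(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                expression,
                new InputSource(new StringReader(MARKUP)),
                XPathConstants.NODESET);
            return (Element) nodes.item(0);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        Element viaStar = firstMatch(ANY_ELEMENT);
        Element viaLink = firstMatch(LINK_ONLY);

        // The * locator hits the wrapper first, which carries no href.
        System.out.println(viaStar.getTagName() + " hasHref=" + viaStar.hasAttribute("href"));
        // The a locator hits the link itself.
        System.out.println(viaLink.getTagName() + " hasHref=" + viaLink.hasAttribute("href"));
    }
}
```

This would also explain why the original locator works for other ads: when the link is a direct child of the section, the first match is the link itself. Note that this sketch uses the plain DOM API, not Selenium, so it only illustrates which element is selected; in Selenium, getAttribute("href") on the wrapper is what returns null.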

Answer 2

You can use the .toString() method as shown below; it worked for me:

String url = phone_btn_0.findElement(By.tagName("a")).getAttribute("href").toString();
