Extracting JavaScript-generated links with Selenium WebDriver

Asked: 2016-06-28 17:07:29

Tags: javascript selenium-webdriver ads

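I am trying to extract ads from a web page, using the iplocation page as my starting page. iplocation serves its ads through DoubleClick (AdSense). For now I am focusing only on ads outside of iframes, which have the form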

<a href="...", ...> <img src="...", ...> </a>

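The first step is to get all the anchor tags; I will then check the href attribute of each one. However, my Selenium code does not return these JavaScript-generated anchor tags. I have tried both implicit and explicit waits, without success. I have also looked at several solutions on Stack Overflow for waiting until the page has loaded, but none of them seems to work. I want the DoubleClick links to show up in the list of anchor tags. Here is the code: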

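 import java.util.List;
 import java.util.concurrent.TimeUnit;
 import org.jsoup.Jsoup;
 import org.jsoup.nodes.Document;
 import org.jsoup.nodes.Element;
 import org.openqa.selenium.By;
 import org.openqa.selenium.WebDriver;
 import org.openqa.selenium.WebElement;
 import org.openqa.selenium.firefox.FirefoxDriver;

 WebDriver driver = new FirefoxDriver();
 driver.manage().timeouts().pageLoadTimeout(200, TimeUnit.SECONDS);
 String baseUrl = "http://iplocation.net/";
 driver.get(baseUrl);

 List<WebElement> allAnchorTags = driver.findElements(By.tagName("a"));
 System.out.println("Number of anchor tags = " + allAnchorTags.size());
 // How doc was created was not shown; assumed here to be jsoup parsing the rendered page source
 Document doc = Jsoup.parse(driver.getPageSource(), baseUrl);
 for (Element a : doc.getElementsByTag("a")) {
     String url = a.attr("abs:href");
     System.out.println(url);
 }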
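
For reference, the second step mentioned above (checking each href once the anchors have been collected) might look roughly like the sketch below. The class name AdLinkFilter, its two methods, and the list of ad hosts are assumptions made up for illustration; they are not part of my actual code, and the host list is only a guess based on the domains that appear in the ads I am seeing. It works on plain href strings, so it does not depend on how the anchors were obtained.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class AdLinkFilter {
    // Hypothetical list of ad-serving domains; an assumption, extend as needed.
    private static final List<String> AD_HOSTS =
            Arrays.asList("doubleclick.net", "googleadservices.com");

    // True if the href's host is one of AD_HOSTS or a subdomain of one.
    // Null, empty, relative, or unparseable hrefs are treated as non-ads.
    static boolean isAdLink(String href) {
        if (href == null || href.isEmpty()) {
            return false;
        }
        try {
            String host = new URI(href).getHost();
            if (host == null) {
                return false;
            }
            host = host.toLowerCase();
            for (String adHost : AD_HOSTS) {
                if (host.equals(adHost) || host.endsWith("." + adHost)) {
                    return true;
                }
            }
            return false;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    // Keeps only the hrefs that point at an ad host, preserving order.
    static List<String> filterAdLinks(List<String> hrefs) {
        List<String> result = new ArrayList<>();
        for (String href : hrefs) {
            if (isAdLink(href)) {
                result.add(href);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> hrefs = Arrays.asList(
                "http://iplocation.net/",
                "https://www.googleadservices.com/pagead/aclk?sa=L&num=1");
        for (String adLink : filterAdLinks(hrefs)) {
            System.out.println(adLink);
        }
    }
}
```

In the Selenium code, the strings passed in would be the values of getAttribute("href") for each element of allAnchorTags. The filter itself is not the problem, though: the ad anchors never make it into that list in the first place.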

Edit 1: An example of a JavaScript-generated anchor tag. This is the markup of one of the ads on iplocation.net:

<a data-original-click-url="https://www.googleadservices.com/pagead/aclk?sa=L&amp;ai=C6VDnU-5yV-_8EZO_Wqq9rdAPvYOC_kTVrt2I_wLFkpOWiwUQASDjheAdYMvctAWgAYvv1b4DyAECqAMByAPBBKoEe0_Qlt9if5TWbxdiiMYNqHQfoxw4kBBibtZIqaosjVqUCjcPPca0lHl4k_GaLNt1_cXekFO1tH7ythmHWyV2u34pE9VHAIHEEl-IuzRea2pSXLE_2vVe_bg982jp-jkFG8K0q-WHIETCWTTUt0LbOS4A-GYzCuqdyf5S7ogGAaAGAoAH3ZCqQagHpr4b2AcB&amp;num=1&amp;cid=CAASEuRofHi4jyWW3saxgIxOgVIsBQ&amp;sig=AOD64_14Fv-B2-gzVrNHqJNoh_Q4tfkL9A&amp;client=ca-pub-1026064395378929&amp;adurl=https://semanticscholar.org" id="aw0" target="_top" href="https://www.googleadservices.com/pagead/aclk?sa=L&amp;ai=C6VDnU-5yV-_8EZO_Wqq9rdAPvYOC_kTVrt2I_wLFkpOWiwUQASDjheAdYMvctAWgAYvv1b4DyAECqAMByAPBBKoEe0_Qlt9if5TWbxdiiMYNqHQfoxw4kBBibtZIqaosjVqUCjcPPca0lHl4k_GaLNt1_cXekFO1tH7ythmHWyV2u34pE9VHAIHEEl-IuzRea2pSXLE_2vVe_bg982jp-jkFG8K0q-WHIETCWTTUt0LbOS4A-GYzCuqdyf5S7ogGAaAGAoAH3ZCqQagHpr4b2AcB&amp;num=1&amp;cid=CAASEuRofHi4jyWW3saxgIxOgVIsBQ&amp;sig=AOD64_14Fv-B2-gzVrNHqJNoh_Q4tfkL9A&amp;client=ca-pub-1026064395378929&amp;nm=1&amp;mb=2&amp;bg=!RkWlRV1EI6hvpP6naIMCAAAAR1IAAAA6mQE0TMakESu8sZGPa3bBy_OqpXrYwIAq6s1kayeXAKDeUXdm9RPIzlFXaGuBM1rGtEmhztZlKhdfiJQpbiXILcVlbzrbvG-DijStEsCGTTlUX-Nb9c_qCHE5b9SQx2a_6-AxumGJUHj3Mlf2nv8dG-Z93YyX4yblF6L5XInRxaqwsTpVhNebQdUmoer7uDSSik4fI48VOgk1_PungBhrjkLSbWJKGV6gFH7PqyrAC5pZ-WDYS7g_y7EmFz_DIBtKotP0gjz2CucJWn93h01HQQDBKSNNgj-pERxb1jWIP14t9lBrlRjuez8n0xtlbQyvDRHt7VKtQ8d4pfBNAb2b80mJu6D2KuwpZndHqaZ8GO-a48qXXily5FELFK-uhNc6krqgzxcUgeGAjZAC_-EUVIm8WN1pQcc&amp;adurl=https://semanticscholar.org"><img src="https://tpc.googlesyndication.com/simgad/8383265107738210087" alt="" class="img_ad" onload="" width="336" border="0"></a>

0 answers:

No answers