Check the hypertext reference of every element in a link and compare it with ad server IPs

Time: 2017-01-05 18:53:52

Tags: python selenium

I am writing an ad-blocking program. Here is my code:

from selenium import webdriver

url = input('Enter URL to detect ads from: ')

browser = webdriver.Chrome()
browser.get('http://'+url)

all_iframes = browser.find_elements_by_tag_name("iframe")
if len(all_iframes) > 0:
    print("" + "Ads Found\n")
    browser.execute_script("""
    var elems = document.getElementsByTagName("iframe"); 
    for(var i = 0, max = elems.length; i < max; i++)
         {
             elems[i].hidden=true;
         }
                      """)
    print('Total Ads: ' + str(len(all_iframes)))
else:
    print('No Ads found')

My question is: is there a way to check the hypertext references of the iframes and compare them with the ad server IPs listed on this page?

2 answers:

Answer 0 (score: 1)

You can try the solution below, but I'm not sure it covers every case (I can't check it right now):

import requests
import socket
from selenium import webdriver

url = input('Enter URL to detect ads from: ')

browser = webdriver.Chrome()
browser.get('http://'+url)

all_iframes = browser.find_elements_by_tag_name("iframe")

# Get IP list of ad servers with GET HTTP request (you might need to use "pip install requests")
list_of_ad_servers = requests.get('http://pgl.yoyo.org/adservers/iplist.php?ipformat=&showintro=1&mimetype=plaintext').text.split()
if len(all_iframes) > 0:
    for i in all_iframes:
        try:
            source = i.get_attribute('src')
            if source and source.startswith('http'):  # to get only 3rd-party links
                # Resolve the host of the source link and check whether its IP is in the ad server list
                if socket.gethostbyname(source.split('/')[2]) in list_of_ad_servers:
                    print('This is an advertisement iframe!')
                    browser.execute_script('arguments[0].hidden=true;', i)
        except Exception:
            # e.g. the host could not be resolved or the element went stale; skip this iframe
            pass
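
As a rough sketch of the lookup step above, the host can be taken from the iframe `src` with `urllib.parse` rather than by splitting on `/`, which also copes with URLs that carry a port. The helper name `is_ad_source` and its `resolve` parameter are made up for illustration and are not part of Selenium or of the answer's code; `resolve` defaults to a real DNS lookup and only exists so the check can be tried without network access.

```python
import socket
from urllib.parse import urlparse


def is_ad_source(src, ad_server_ips, resolve=socket.gethostbyname):
    """Return True if the host in `src` resolves to an IP in `ad_server_ips`.

    `is_ad_source` and `resolve` are hypothetical names for this sketch.
    """
    if not src:
        return False  # iframe without a src attribute
    parsed = urlparse(src)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False  # relative URLs, about:blank, javascript:, etc.
    try:
        ip = resolve(parsed.hostname)
    except OSError:
        return False  # host could not be resolved
    return ip in ad_server_ips
```

In the loop above, the condition would then read something like `if is_ad_source(i.get_attribute('src'), set(list_of_ad_servers)):`. Converting the list to a set once keeps each membership test cheap.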

Answer 1 (score: 1)

Sorry, I'm not familiar with Python syntax, but I can answer from a Java perspective, and you can adapt it to your test.

Go to the ad server list site and get the page source.

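driver.get("http://pgl.yoyo.org/as/serverlist.php?hostformat=adblockplus");
String pageSrc = driver.getPageSource(); // Get page source
// Split on the "||" prefix and "^" suffix that wrap each entry (needs java.util.Arrays and java.util.List);
// the resulting pieces may still contain whitespace and should be trimmed before comparing
List<String> ipList = Arrays.asList(pageSrc.split("\\|\\||\\^"));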
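
Since the question is in Python, here is a minimal sketch of the same parsing step. It assumes the list uses Adblock Plus style rules of the form `||host^`, one per line, as the `hostformat=adblockplus` parameter suggests; entries in that format appear to be host names rather than IP addresses. The function name `parse_adblock_hosts` is made up for this example.

```python
import re

# Matches Adblock Plus style host rules such as "||ads.example.com^"
_HOST_RULE = re.compile(r"^\|\|([^\^\s/]+)\^", re.MULTILINE)


def parse_adblock_hosts(text):
    """Return the set of lower-cased host names found in "||host^" rules.

    Header lines, comments and anything else that is not a host rule
    are ignored.
    """
    return {m.group(1).lower() for m in _HOST_RULE.finditer(text)}
```

The text could come from `requests.get(...).text`, as in the other answer, instead of from the browser's page source.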

On the site under test, get the iframe WebElements and compare them with the ad server list.

List<WebElement> all_iframes = driver.findElements(By.tagName("iframe")); // Creates list of iframe WebElements
for (WebElement iframe : all_iframes) {
    // Check whether the ad server list contains the frame name
    if (ipList.contains(iframe.getAttribute("name"))) {
        System.out.println("Found");
    }
}
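
A Python take on this comparison might look like the sketch below. Note that it swaps the `name` attribute for the host of the iframe's `src`, since that is what the question asks about, and it also treats subdomains of a listed host as matches; both are choices made for this sketch, not something the answer above specifies. `host_is_listed` is a hypothetical helper name.

```python
from urllib.parse import urlparse


def host_is_listed(src, ad_hosts):
    """Return True if the host of `src`, or any parent domain, is in `ad_hosts`."""
    host = (urlparse(src or "").hostname or "").lower()
    if not host:
        return False  # no src, or a URL without a host such as about:blank
    parts = host.split(".")
    # Check "a.b.example.com", then "b.example.com", then "example.com", ...
    return any(".".join(parts[i:]) in ad_hosts for i in range(len(parts)))
```

With Selenium, each iframe's `get_attribute('src')` value would be passed as `src`, and `ad_hosts` would be a set of host names taken from the ad server list.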