I am building an ad blocker. Here is my code:
from selenium import webdriver
from selenium.webdriver.common.by import By

url = input('Enter URL to detect ads from: ')
browser = webdriver.Chrome()
browser.get('http://' + url)
# find_elements_by_tag_name was deprecated and later removed in Selenium 4
all_iframes = browser.find_elements(By.TAG_NAME, "iframe")
if len(all_iframes) > 0:
    print("Ads Found\n")
    # Hide every iframe on the page
    browser.execute_script("""
        var elems = document.getElementsByTagName("iframe");
        for (var i = 0, max = elems.length; i < max; i++)
        {
            elems[i].hidden = true;
        }
    """)
    print('Total Ads: ' + str(len(all_iframes)))
else:
    print('No Ads found')
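My question: is there a way to inspect each iframe's hyperlink reference (its src) and compare it against the ad server IPs listed on this page?
Answer 0 (score: 1)
You can try the solution below, though I am not sure it covers all cases (I can't test it right now):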
import socket

import requests
from selenium import webdriver
from selenium.webdriver.common.by import By

url = input('Enter URL to detect ads from: ')
browser = webdriver.Chrome()
browser.get('http://' + url)
all_iframes = browser.find_elements(By.TAG_NAME, "iframe")
# Get IP list of ad servers with a GET HTTP request (you might need to use "pip install requests")
list_of_ad_servers = requests.get('http://pgl.yoyo.org/adservers/iplist.php?ipformat=&showintro=1&mimetype=plaintext').text.split()
if len(all_iframes) > 0:
    for i in all_iframes:
        try:
            source = i.get_attribute('src')
            if source and source.startswith('http'):  # to get only 3rd-party links
                # Resolve the IP of the source host and check whether it is in the ad server list
                if socket.gethostbyname(source.split('/')[2]) in list_of_ad_servers:
                    print('This is advertisement iframe!')
                    browser.execute_script('arguments[0].hidden=true;', i)
        except Exception:
            # Skip iframes that cannot be inspected or whose host cannot be resolved
            pass
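One fragile spot in the loop above is `source.split('/')[2]`, which keeps any port number in the host and raises on unusual URLs. As a minimal sketch (the helper names `iframe_host` and `is_ad_iframe` are hypothetical, not part of the original answer), the hostname can be extracted with the standard library's `urllib.parse` and the lookup wrapped so that unresolvable hosts are simply treated as non-ads:

```python
import socket
from urllib.parse import urlparse


def iframe_host(src):
    """Return the lowercase hostname of an http(s) URL, or None otherwise."""
    if not src:
        return None
    parsed = urlparse(src)
    if parsed.scheme not in ("http", "https"):
        return None
    # .hostname drops any port or credentials and lowercases the host
    return parsed.hostname


def is_ad_iframe(src, ad_server_ips, resolve=socket.gethostbyname):
    """True if the iframe's host resolves to an IP in ad_server_ips.

    `resolve` is injectable (an assumption of this sketch) so the check
    can be exercised without real DNS lookups.
    """
    host = iframe_host(src)
    if host is None:
        return False
    try:
        return resolve(host) in ad_server_ips
    except OSError:  # socket.gaierror is a subclass of OSError
        return False
```

Inside the loop this would be called as `is_ad_iframe(i.get_attribute('src'), set(list_of_ad_servers))`. Converting the list to a set is optional but makes each membership test cheaper.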
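Answer 1 (score: 1)
Sorry, I am not familiar with Python syntax, but I can answer from a Java perspective, and you can adapt it to your test.
Go to the ad server list site and get the page source.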
driver.get("http://pgl.yoyo.org/as/serverlist.php?hostformat=adblockplus");
String pageSrc = driver.getPageSource(); // Get page source
// Split on the "||" prefix and "^" suffix that wrap each entry (needs java.util.Arrays and java.util.List)
List<String> ipList = Arrays.asList(pageSrc.split("\\|\\||\\^"));
On the site under test, get the iframe WebElements and compare them against the IP address list.
List<WebElement> allIframes = driver.findElements(By.tagName("iframe")); // Creates list of iframe WebElements
for (WebElement iframe : allIframes) {
    // Check whether the IP address list contains the frame name
    if (ipList.contains(iframe.getAttribute("name"))) {
        System.out.println("Found");
    }
}
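For readers working in Python, as in the question, the list-parsing step can be sketched as follows. This assumes the `hostformat=adblockplus` page contains entries of the form `||hostname^`, which means it yields hostnames rather than IP addresses. The function names `parse_abp_hosts` and `host_is_listed` are hypothetical, and the subdomain matching is an added assumption rather than something the answer specifies.

```python
import re

# Matches Adblock Plus style entries such as "||ads.example.com^"
_ABP_ENTRY = re.compile(r"\|\|([^\^\s]+)\^")


def parse_abp_hosts(page_text):
    """Collect the hostnames from every ||hostname^ entry in the text."""
    return {host.lower() for host in _ABP_ENTRY.findall(page_text)}


def host_is_listed(host, ad_hosts):
    """True if host, or any parent domain of it, appears in ad_hosts."""
    parts = host.lower().split(".")
    return any(".".join(parts[i:]) in ad_hosts for i in range(len(parts)))
```

With Selenium, `page_text` would be `browser.page_source` after loading the list URL, and `host` would be the hostname taken from each iframe's `src` attribute.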