Using Python

Time: 2018-08-15 13:49:40

Tags: python-2.7 selenium selenium-webdriver webdriver selenium-chromedriver

Seeing different results when retrieving network data with window.performance.getEntries()

Here is the code:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC

options = Options()
options.add_argument('--disable-gpu')
driver = webdriver.Chrome(chrome_options=options)

urls = ['https://stackoverflow.com/','https://www.google.com/']

for url in urls:
    driver.get(url)
    image_name = url.split(".")[1] + ".png"
    driver.save_screenshot(image_name)
    performance_data = driver.execute_script('return window.performance.getEntries();')
    for single_data in performance_data:
        file = open('Hero.txt', 'a')
        files = open('Heroes.txt', 'a')
        files.write(str(single_data["name"]))
        if "stack" in single_data["name"]:
            file.write(url + "stack_code 1")
            break

        if "stack" not in single_data["name"]:
            file.write(url + "stack_code 0")
            break

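If I remove the last if statement, I get all the network call names in Heroes.txt, so the code populates the file correctly with just the first if. The problem appears when I add the second if: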

if "stack" not in single_data["name"]:
    file.write(url + "stack_code 0")
    break

Without the second if, I get this in Heroes.txt:

https://stackoverflow.com/https://www.google.com/https://www.google.com/images/branding/googlelogo/2x/googlelogo_color_120x44dp.pnghttps://www.google.com/images/branding/googlelogo/2x/googlelogo_color_272x92dp.pnghttps://consent.google.com/status?continue=https://www.google.com&pc=s&timestamp=1534340797&gl=GBhttps://ssl.gstatic.com/gb/images/i2_2ec824b0.pngfirst-paintfirst-contentful-painthttps://www.google.com/gen_204?s=webhp&t=aft&atyp=csi&ei=vS50W_aGHKSalwSCgrLQCQ&rt=wsrt.107,aft.119,prt.119https://www.google.com/images/nav_logo242.pnghttps://www.google.com/xjs/_/js/k=xjs.s.en_GB.LsN8oH7x4FY.O/m=sx,sb,cdos,elog,hsm,jsa,r,d,csi/am=YBZhP_4BJP-_YEBRsBWMsMAMCoZN/rt=j/d=1/dg=0/rs=ACT90oHhp5AGyfczYrjBNR_VenselWZSnAhttps://www.google.com/images/branding/product/ico/googleg_lodp.icohttps://www.google.com/xjs/_/js/k=xjs.s.en_GB.LsN8oH7x4FY.O/am=YBZhP_4BJP-_YEBRsBWMsMAMCoZN/rt=j/d=1/exm=sx,sb,cdos,elog,hsm,jsa,r,d,csi/ed=1/dg=0/rs=ACT90oHhp5AGyfczYrjBNR_VenselWZSnA/m=aa,abd,async,bgd,dvl,foot,ipv6,lu,m,mu,sf,sonic,spch,cbin,tnqaT,cbhb,xz7cCd,fEVMic,WgDvvc?xjs=s1https://www.google.com/gen_204?atyp=csi&ei=vS50W_aGHKSalwSCgrLQCQ&s=webhp&t=all&imc=3&imn=3&imp=0&adh=&conn=onchange&ima=1&ime=0&imeb=0&imeo=0&mem=ujhs.10,tjhs.14,jhsl.2330,dm.8&net=dl.10000,ect.4g,rtt.0&sys=hc.4&rt=aft.118,dcl.121,iml.118,ol.137,prt.118,xjs.297,xjsee.297,xjses.222,xjsls.138,wsrt.107,cst.15,dnst.0,rqst.81,rspt.9,sslt.13,rqstt.17,unt.1,cstt.2,dit.228&zx=1534340797851https://www.google.com/textinputassistant/tia.pnghttps://www.google.com/async/bgasy?ei=vS50W_aGHKSalwSCgrLQCQ&yv=3&async=_fmt:jspbhttps://www.google.com/xjs/_/js/k=xjs.s.en_GB.LsN8oH7x4FY.O/am=YBZhP_4BJP-_YEBRsBWMsMAMCoZN/rt=j/d=1/exm=sx,sb,cdos,elog,hsm,jsa,r,d,csi,aa,abd,async,bgd,dvl,foot,ipv6,lu,m,mu,sf,sonic,spch,cbin,tnqaT,cbhb,xz7cCd,fEVMic,WgDvvc/ed=1/dg=0/rs=ACT90oHhp5AGyfczYrjBNR_VenselWZSnA/m=RMhBfe?xjs=s2https://www.gstatic.com/og/_/js/k=og.og2.en_US.gQBLNoMk7Q0.O/rt=j/m=def/exm=in,fot/d=1/ed=1/rs=AA2YrTuPdnXARx6L0IfRJ8krP-HTr
x9fswhttps://www.google.com/gen_204?atyp=i&ei=vS50W_aGHKSalwSCgrLQCQ&vet=10ahUKEwi22cnxmO_cAhUkzYUKHQKBDJoQsmQIDQ..s&zx=1534340797919https://adservice.google.com/adsid/google/uihttps://www.google.com/gen_204?atyp=i&ct=&cad=udla=3&ei=vS50W_aGHKSalwSCgrLQCQ&e=12&zx=1534340797933https://www.google.co.uk/domainless/read?igu=1https://www.google.com/js/bg/5KdFGiZjrMqKMsWhJOuJJel3qQCRBLUAy7GSORuI-sg.jshttps://apis.google.com/_/scs/abc-static/_/js/k=gapi.gapi.en.yK0z3MKtgaU.O/m=gapi_iframes,googleapis_client,plusone/rt=j/sv=1/d=1/ed=1/rs=AHpOoo-SafOYj4n3budMysbWxppU-lxJeg/cb=gapi.loaded_0https://www.google.com/domainless/write?igu=1&data=&xsrf=ALAmJdGvY5TXvkklyYKZuaWBzGhopICz3A:1534340797490https://www.google.com/gen_204?atyp=i&ct=&cad=udla=3&ei=vS50W_aGHKSalwSCgrLQCQ&pd=105&e=2&zx=1534340798039https://www.google.com/gen_204?atyp=i&ct=&cad=udla=1&ei=vS50W_aGHKSalwSCgrLQCQ&act=p&ps=2&zx=1534340798039

As soon as I add the second if, I get only this in Heroes.txt:

https://stackoverflow.com/https://www.google.com/

Any ideas? Am I doing something silly?

1 answer:

Answer 0 (score: 0)

What is happening

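The code writes different output once you insert the second if condition because that condition is satisfied whenever the first one is not, so you break out of the for loop that iterates over performance_data. If you remove the second if condition, there is no break when 'stack' not in single_data["name"], so you move on to the next iteration of the for loop and keep writing to Heroes.txt.

To make sense of it:

When you add the second if statement, only 2 values are written to Heroes.txt, because a break out of the for loop is guaranteed on the first entry, and you are iterating over 2 URLs.

When you remove the second if statement, a break out of the for loop is no longer guaranteed, so you (usually) get more writes to Heroes.txt.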
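The effect can be reproduced without a browser. The sketch below is only an illustration: `entries` is made-up stand-in data for what window.performance.getEntries() returns, and `names_written` is a hypothetical helper that collects the names in a list instead of writing them to Heroes.txt.

```python
def names_written(entries, with_second_if):
    """Return the names the question's inner loop would write to Heroes.txt."""
    written = []
    for single_data in entries:
        written.append(single_data["name"])  # stands in for files.write(...)
        if "stack" in single_data["name"]:
            break
        if with_second_if and "stack" not in single_data["name"]:
            break
    return written


# Made-up entries; real ones come from window.performance.getEntries().
entries = [
    {"name": "https://www.google.com/"},
    {"name": "https://www.google.com/images/logo.png"},
    {"name": "first-paint"},
]

print(names_written(entries, with_second_if=True))   # stops after the first name
print(names_written(entries, with_second_if=False))  # collects all three names
```

For https://stackoverflow.com/ the first entry name already contains "stack", so the first if breaks immediately in both variants. That is consistent with the single https://stackoverflow.com/ at the start of both outputs in the question.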

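Suggestion:

This is a logic problem, not a Selenium or Python problem. If you can tell us what output you expect, or what you want written to which file, we can help you structure the code to achieve it.


My guess is that you want something like this (I removed some code that is not used in this post):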


from selenium import webdriver

driver = webdriver.Chrome()

urls = ['https://stackoverflow.com/','https://www.google.com/']

for url in urls:
    driver.get(url)
    performance_data = driver.execute_script('return window.performance.getEntries();')

    pass_flag = False

    for single_data in performance_data:
        # Rather than breaking out of the loop, write to Hero.txt only once
        # per URL (for the first entry) and keep iterating, so that every
        # entry still reaches the Heroes.txt write below.
        if not pass_flag:
            # The with block closes the file instead of leaking the handle.
            with open('Hero.txt', 'a') as file:
                if 'stack' in single_data['name']:
                    file.write(url + 'stack_code 1')
                else:
                    file.write(url + 'stack_code 0')
            pass_flag = True

        # Unlike the above, we *always* write to Heroes.txt
        with open('Heroes.txt', 'a') as files:
            files.write(str(single_data['name']))

Result in Hero.txt:

https://stackoverflow.com/stack_code 1https://www.google.com/stack_code 0

The solution above follows the same structure you used in your program. Here is how I would approach it:
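The answerer's own version is not reproduced here. As a rough sketch only, and not the original code, one way to restructure the logic is to separate the decision from the file writing. The helper names `stack_code` and `record` are hypothetical, and two things are assumptions about the intent: the flag is 1 if any entry name contains "stack" (the code above looks only at the first entry), and each record goes on its own line with a space before stack_code for readability.

```python
def stack_code(entries):
    """Return 1 if any entry name contains 'stack', else 0 (assumed intent)."""
    return int(any("stack" in entry["name"] for entry in entries))


def record(url, entries, hero_path="Hero.txt", heroes_path="Heroes.txt"):
    """Append one flag line per URL to hero_path and every entry name to heroes_path."""
    # Each file is opened once per URL and closed by the with block.
    with open(hero_path, "a") as hero, open(heroes_path, "a") as heroes:
        hero.write("{} stack_code {}\n".format(url, stack_code(entries)))
        for entry in entries:
            heroes.write(str(entry["name"]) + "\n")


# Assumed usage with Selenium, mirroring the loop in the code above:
#     for url in urls:
#         driver.get(url)
#         entries = driver.execute_script('return window.performance.getEntries();')
#         record(url, entries)
```

Keeping `stack_code` free of file and browser access means the decision can be checked with plain lists of dictionaries before running it against a real page.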