Weird regular expression behavior in Python :)

Time: 2014-11-25 20:19:51

Tags: javascript python regex phantomjs

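I'm trying to build a browsing bot with PhantomJS, but in some cases it isn't robust enough for what I need: when certain requests fail, there is no option to retry them. In those cases I echo the requests that failed (or might fail), together with the cookies the browser holds at that moment. A Python script then picks up that output and re-issues the requests itself: I use regular expressions to extract the information from the string and then pycurl to send the requests. The Python function that processes the string is attached below. The function works fine when I run it on its own in a test.py script, but when I add it to the main Python script it doesn't work, even though it runs with the same interpreter, on the same machine and in the same folder. Why would that happen?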

Function:

import re

def getReqs(interface_text):
    # Only the output after the last "<van LAST_LOAD>" marker is relevant
    if("<van LAST_LOAD>" in interface_text):
        interface_text=str(interface_text[interface_text.rfind("<van LAST_LOAD>"):])
        # Each cookie block is followed by the link that was taken with those cookies
        cookie_req=re.findall(r"<van[^>]*?type='cookies'[^>]*?>([\s\S]*?)</van>[^<]*?<van[^>]*?type='link_taken'[^>]*?href='([^']*?)'>",interface_text)
        topclicks=re.findall(r"<van[^>]*?type='top_request'[^>]*?href='([^']*?)'>",interface_text)
        # Collected but not used below
        imgclicks=re.findall(r"<van[^>]*?type='image_request'[^>]*?href='([^']*?)'>",interface_text)
        ind=list()
        for d in cookie_req:
            # Cookie lines: domain, name, value, expiry, separated by two tabs
            cooks=re.findall(r"([\S]*?)\t\t([\S]*?)\t\t([\S]*?)\t\t(\d+)",d[0])
            rr=dict()
            rr['cookies']=cooks
            rr['request']=d[1].strip()
            type_='image'
            # A separate loop variable avoids shadowing the outer d
            for top in topclicks:
                if(rr['request']==top.strip()): type_='toplink'
            rr['type']=type_
            ind.append(rr)
        return ind
    else:
        # No "<van LAST_LOAD>" marker in the output
        return False

STRING:

New URL: http://domain.com/
Request (http://domain.com/css/style.css): 
Request (http://domain.com/tp/filter.php?pro=936): 
Request (http://domain.com/tp/a_ft.php?rand=5): 
<van LAST_LOAD>
Processing images and getting hidden ones
Request (http://domain.com/tp/img.php): 
Images with width set to over 85 67
Done processing images.
Checking Resourse Status
Resourse retrieval status: Started/Full F http://domain.com/
Resourse retrieval status: Started/Full F http://domain.com/css/style.css
Resourse retrieval status: Started/Full F http://domain.com/tp/filter.php?pro=936
Resourse retrieval status: Started/Full F http://domain.com/tp/a_ft.php?rand=5
Resourse retrieval status: Started/Full F http://domain.com/tp/img.php    
Phantom will exit in 33775




    Reclicking




    Clicking Image
    Random Click: 5
    <van type='image_request' href='http://www.domain.com/st/thumbs/238/YOWF8GaqIz.jpg'>
    Dims: 204,514,240,180
    Global mouse position 0 0
    Moving to mouse to 635 295
    mouse moved
    Trying to navigate to: http://domain.com/gallery/www.html?id=437&x=8715eb135db63642cda1ec1c19e8d529&url=aHR0cDovL3d3dy5kcnR1YmVyLmNvbS92aWRlby8xOTE1MDEyL2FtYXRldXItcnVzc2lhbi1zZXgtdGFwZQ==&s=1
    Caused by: LinkClicked
    Will actually navigate: false
    Sent from the page's main frame: false
    Expected links: 5
    <van type='cookies'>
    domain.com      proimg      93ffe5      1417031956
    domain.com      pro_cc3     394ef8df2b      1417031956
    domain.com      pro_cc2     3377058     1417031956
    domain.com      fav     1416945556      1448481556
    domain.com      tp      MXwwfDE0MTY5NDU1NTZ8MTQxNjk0NTU1NnwwO3Rlc3QyMS5jb20=        1417031956
    </van>
    <van type='link_taken' href='http://domain.com/gallery/www.html?id=437&x=8715eb135db63642cda1ec1c19e8d529&url=aHR0cDovL3d3dy5kcnR1YmVyLmNvbS92aWRlby8xOTE1MDEyL2FtYXRldXItcnVzc2lhbi1zZXgtdGFwZQ==&s=1'>




    Reclicking




    Clicking Image
    Random Click: 3
    <van type='image_request' href='http://www.domain.com/st/thumbs/730/PGy0TRimJJ.jpg'>
    Dims: 204,22,240,180
    Global mouse position 635 295
    Moving to mouse to 143 295
    mouse moved
    Trying to navigate to: http://domain.com/gallery/sss.html?id=424&x=e3ad16bcdc583a324acbc3a83f654a7a&url=aHR0cDovL3d3dy5kcnR1YmVyLmNvbS92aWRlby8xOTE2Mjk5L3RvdWNoaW5nLWJlYXV0eXMtanVpY3ktc3BvdA==&s=1
    Caused by: LinkClicked
    Will actually navigate: false
    Sent from the page's main frame: false
    Expected links: 4
    <van type='cookies'>
    domain.com      proimg      93ffe5      1417031956
    domain.com      pro_cc3     394ef8df2b      1417031956
    domain.com      pro_cc2     3377058     1417031956
    domain.com      fav     1416945556      1448481556
    domain.com      tp      MXwwfDE0MTY5NDU1NTZ8MTQxNjk0NTU1NnwwO3Rlc3QyMS5jb20=        1417031956
    </van>
    <van type='link_taken' href='http://domain.com/gallery/sss.html?id=424&x=e3ad16bcdc583a324acbc3a83f654a7a&url=aHR0cDovL3d3dy5kcnR1YmVyLmNvbS92aWRlby8xOTE2Mjk5L3RvdWNoaW5nLWJlYXV0eXMtanVpY3ktc3BvdA==&s=1'>




    Reclicking




    Clicking Image
    Random Click: 7
    <van type='image_request' href='http://www.domain.com/st/thumbs/867/uLzPrb0K45.jpg'>
    Dims: 424,22,240,180
    Global mouse position 143 295
    Moving to mouse to 143 515
    mouse moved
    Trying to navigate to: http://domain.com/gallery/aaa.html?id=466&x=8dcbd277bf725b468c7933cc81692be0&url=aHR0cDovL3d3dy5kcnR1YmVyLmNvbS92aWRlby8xOTExMzQ0L3doaXRlLWFuZC1ibGFjay10ZWVuLWJhYmVzLW1hc3R1cmJhdGluZw==&s=1
    Caused by: LinkClicked
    Will actually navigate: false
    Sent from the page's main frame: false
    Expected links: 3
    <van type='cookies'>
    domain.com      proimg      93ffe5      1417031956
    domain.com      pro_cc3     394ef8df2b      1417031956
    domain.com      pro_cc2     3377058     1417031956
    domain.com      fav     1416945556      1448481556
    domain.com      tp      MXwwfDE0MTY5NDU1NTZ8MTQxNjk0NTU1NnwwO3Rlc3QyMS5jb20=        1417031956
    </van>
    <van type='link_taken' href='http://domain.com/gallery/aaa.html?id=466&x=8dcbd277bf725b468c7933cc81692be0&url=aHR0cDovL3d3dy5kcnR1YmVyLmNvbS92aWRlby8xOTExMzQ0L3doaXRlLWFuZC1ibGFjay10ZWVuLWJhYmVzLW1hc3R1cmJhdGluZw==&s=1'>
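The cookie pattern in the function above expects the fields of each cookie line to be separated by exactly two tab characters (\t\t). In the string as pasted here those separators appear as runs of spaces, which may just be a copy/paste artifact. A minimal standalone check of that one pattern, assuming tab-separated input built from the cookie values in the log (the sample lines are illustrative, not real output):

```python
import re

# Same cookie-line pattern as in getReqs: domain, name, value, expiry,
# with each field separated by exactly two tab characters.
COOKIE_LINE = re.compile(r"([\S]*?)\t\t([\S]*?)\t\t([\S]*?)\t\t(\d+)")

# Assumed input: tab-separated cookie lines (values taken from the log).
tab_block = (
    "domain.com\t\tproimg\t\t93ffe5\t\t1417031956\n"
    "domain.com\t\tfav\t\t1416945556\t\t1448481556\n"
)
# The same lines with spaces instead of tabs, as they look when pasted.
space_block = tab_block.replace("\t\t", "      ")

tab_cookies = COOKIE_LINE.findall(tab_block)
space_cookies = COOKIE_LINE.findall(space_block)

# [('domain.com', 'proimg', '93ffe5', '1417031956'), ('domain.com', 'fav', '1416945556', '1448481556')]
print(tab_cookies)
print(space_cookies)  # []
```

If the real PhantomJS output uses spaces, the cookie_req blocks are still found, but each 'cookies' list comes back empty.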

This script, on the other hand, returns an empty list:

#!/usr/bin/python
#mysql* MySQL*
__author__ = 'root'
import MySQLdb
import sys
import random
import subprocess 
import re
import time
import pycurl
import cStringIO
import tldextract




def mergeCookies(cookieList,cookieFile):
    data = open(cookieFile,'r').read()
    precooks=re.findall(ur"([\S]*?)\t([\S]*?)\t([\S]*?)\t([\S]*?)\t([\S]*?)\t([\S]*?)\t([\S]+)",data)
    total="""# Netscape HTTP Cookie File
# http://curl.haxx.se/rfc/cookie_spec.html
# This file was generated by libcurl! Edit at your own risk.

"""
    keeper= list()
    for old in precooks:
        refresh=False
        for new in cookieList:
            print str(old[0]).strip()
            new_parse=tldextract.extract(new[0])
            old_parse=tldextract.extract(old[0])
            if (new_parse[1].strip()==old_parse[1].strip() and str(new[1]).strip()==str(old[5]).strip() and not(str(old[0]).strip()+str(old[5]).strip() in keeper or str(new[0]).strip()+str(new[1]).strip() in keeper)):
                total+=str(old[0]).strip()+"\t"+"TRUE"+"\t"+"/\tFALSE\t1579998218\t"+str(new[1]).strip()+"\t"+str(new[2]).strip()+"\n"
                keeper.append(str(old[0]).strip()+str(old[5]).strip())
                keeper.append(str(new[0]).strip()+str(new[1]).strip())
                refresh=True
        if(not refresh):
            total+=str(old[0]).strip()+"\t"+"TRUE"+"\t"+"/\tFALSE\t1579998218\t"+str(old[5]).strip()+"\t"+str(old[6]).strip()+"\n"
    for new in cookieList:          
        if(not(str(new[0]).strip()+str(new[1]).strip() in keeper)):             
            total+=str(new[0]).strip()+"\t"+"TRUE"+"\t"+"/\tFALSE\t1579998218\t"+str(new[1]).strip()+"\t"+str(new[2]).strip()+"\n"
            keeper.append(str(new[0]).strip()+str(new[1]).strip())
    open(cookieFile,'w').write(total)
def hitFormGetProxy(url,cookieFile,cookieList,proxy,lang,agent,referer,type_,theCol): 
    times=0
    mergeCookies(cookieList,cookieFile)
    while True:
        times+=1
        c = pycurl.Curl()
        buff = cStringIO.StringIO()
        c.setopt(c.URL, url)
        c.setopt(c.WRITEFUNCTION, buff.write)
        c.setopt(c.COOKIEFILE, cookieFile)
        c.setopt(c.COOKIEJAR, cookieFile)
        c.setopt(c.AUTOREFERER, True)
        #c.setopt(c.COOKIESESSION, True)
        #c.setopt(c.COOKIE, cookieString)
        c.setopt(c.FAILONERROR, False)
        c.setopt(c.FOLLOWLOCATION, True)
        c.setopt(c.VERBOSE, True)
        c.setopt(c.PROXY, proxy)
        c.setopt(c.CONNECTTIMEOUT, 10)
        c.setopt(c.TIMEOUT, 25)
        c.setopt(c.MAXREDIRS, 10)
        c.setopt(c.ENCODING, 'gzip,deflate,sdch')
        c.setopt(c.SSL_VERIFYHOST, False)
        c.setopt(c.SSL_VERIFYPEER, False)
        c.setopt(c.FRESH_CONNECT, True)
        c.setopt(c.HEADER, False)
        c.setopt(c.HTTPHEADER, ['Accept-Language: '+str(lang)+'','Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8','Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3'])
        #c.setopt(c.RETURNTRANSFER, True)
        c.setopt(c.USERAGENT, agent)
        c.setopt(c.REFERER, referer)
        #c.setopt(c.HTTPHEADER, ['Accept: text/html', 'Accept-Charset: UTF-8'])
        c.perform()
        if(not (c.getinfo(pycurl.HTTP_CODE) == 200 or c.getinfo(pycurl.HTTP_CODE)==302 or c.getinfo(pycurl.HTTP_CODE)==301) and times>7):
            if (type_ != 'payed'):
                print "setting proxy offline"
                # cur.execute("UPDATE `proxies` SET `status`='inactive',`last_checked`='"+str(int(time.time()))+"' WHERE `proxy`='"+str(proxy)+"'")
                # cur.execute("UPDATE  `proxies` SET  `"+str(theCol)+"` =  '"+str(int(time.time()))+"',`connections`= `connections`-1 WHERE  `proxies`.`proxy` =  '"+str(proxy)+"';")
            quit()
        elif(len(buff.getvalue())>500):
            unallowed=False
            global unallowed_urls
            dmain=tldextract.extract(c.getinfo(pycurl.EFFECTIVE_URL))
            for url in unallowed_urls:
                dmainurl=tldextract.extract(url)
                if(dmain[1].strip()==dmainurl[1].strip()):
                    unallowed=True
            if(not unallowed):
                ret=buff.getvalue()
                buff.close()
                return ret
            else:
                print "visiting unallowed url"
                break
        elif(times>12):break




def getReqs(interface_text):
    if("<van LAST_LOAD>" in interface_text):
        interface_text=str(interface_text[interface_text.rfind("<van LAST_LOAD>"):])
        cookie_req=re.findall(r"<van[^>]*?type='cookies'[^>]*?>([\s\S]*?)</van>[^<]*?<van[^>]*?type='link_taken'[^>]*?href='([^']*?)'>",interface_text)
        topclicks=re.findall(r"<van[^>]*?type='top_request'[^>]*?href='([^']*?)'>",interface_text)
        imgclicks=re.findall(r"<van[^>]*?type='image_request'[^>]*?href='([^']*?)'>",interface_text)
        ind=list()
        for d in cookie_req:
            cooks=re.findall(r"([\S]*?)\t\t([\S]*?)\t\t([\S]*?)\t\t(\d+)",d[0])
            rr=dict()
            rr['cookies']=cooks
            rr['request']=d[1].strip()  
            type_='image'       
            for top in topclicks:
                if(rr['request']==top.strip()): type_='toplink'
            rr['type']=type_
            ind.append(rr)
        return ind
    else:
        return False
def escapeshellarg(arg):
    """
    :param arg:
    :return: escaped string for usage as a console argument
    """
    return "\\'".join("'" + p + "'" for p in arg.split("'"))
#output = (Popen(["/usr/bin/java", "-jar", os.path.dirname(os.path.realpath(__file__))+"/headFinder.jar", self.escapeshellarg(str(tree))], stdout=PIPE).communicate()[0]).strip('')

def getSite(a):
    file_ = open('bot'+str(a)+'.ini','r').read()
    p = re.compile(ur'REFERER:([^;]*?);')
    m = re.search(p, file_)
    toReturn = m.group(1)
    return str(toReturn).strip()

def proxy_status(text):
    # Parameter renamed from "str" so it no longer shadows the built-in str()
    p = re.compile(ur'<van[^>]*?name=\'proxy_status\'[^>]*?value=\'([^\']*?)\'[^>]*?>')
    m = re.search(p, text)
    toReturn = m.group(1)
    return toReturn

def random_tier(a):
    data = open(a,'r').read()
    data = data.split("}")
    probs = data[1].strip().split('|')
    num=random.randint(0,100)
    totes=0
    toReturn = ''
    for x in range(0,len(probs)-1):
        if(num>totes and num<= totes + int(probs[x].strip())): toReturn = data[x+2] 
        totes+=int(probs[x].strip())        
    return toReturn.strip()

def Random_Lang():
    data = open('language.txt','r').read()
    data = data.split("}")
    probs = data[1].strip().split('|')
    num=random.randint(0,100)
    totes=0
    toReturn = ''
    for x in range(0,len(probs)-1):
        if(num>totes and num<= totes + int(probs[x].strip())): toReturn = data[x+2] 
        totes+=int(probs[x].strip())        
    return toReturn.strip()
def Random_Agent():
    num=random.randint(0,100)
    if(num<=16) : return random_tier("IE.txt")
    elif(num>16 and num<=48) : return random_tier("firefox.txt")
    elif(num>48 and num<=93) : return random_tier("CHROME.txt")
    elif(num>93 and num<=97) : return random_tier("safari.txt")
    elif(num>97 and num<=100) : return random_tier("opera.txt")
def Get_Trade(cur,colnum,threadnum):
    print "SELECT * FROM trades_"+str(threadnum)+" WHERE position = '"+str(colnum)+"'"
    cur.execute("SELECT * FROM trades_"+str(threadnum)+" WHERE position = '"+str(colnum)+"'")
    try :
        if (cur.rowcount > 0):
            fetch = cur.fetchall()
            return fetch[0][1],fetch[0][2]
        else:
            print "Found No Trade In That Position !"
            time.sleep(8)
            quit()
    except MySQLdb.Error, e:
        try:
            print "MySQL Error [%d]: %s" % (e.args[0], e.args[1])
        except IndexError:
            print "MySQL Error: %s" % str(e)
        time.sleep(8)
        quit()
def GetPayedProxy(cur,theCol):
    print "SELECT * FROM `proxies` WHERE `"+str(theCol)+"`<'"+str(int(time.time()) - 86400)+"' and `status`='active' and `response`='200' and `PAYMENT`='sharedproxies' and `connections`<3"
    cur.execute("SELECT * FROM `proxies` WHERE `"+str(theCol)+"`<'"+str(int(time.time()) - 86400)+"' and `status`='active' and `response`='200' and `PAYMENT`='sharedproxies' and `connections`<3")
    try :
        if (cur.rowcount > 0):
            fetch = cur.fetchall()
            return fetch[0][0],'payed'
        else:
            print "Found No Shared Proxies available at this time !"
            time.sleep(2)
            return False,False
    except MySQLdb.Error, e:
        try:
            print "MySQL Error [%d]: %s" % (e.args[0], e.args[1])
        except IndexError:
            print "MySQL Error: %s" % str(e)
        time.sleep(2)
        return False,False
def GetScannedProxy(cur,theCol):
    print "SELECT * FROM `proxies` WHERE `"+str(theCol)+"`<'"+str(int(time.time()) - 86400)+"' and `status`='active' and `response`='200' and `PAYMENT`='scanner' and `connections`<3"
    cur.execute("SELECT * FROM `proxies` WHERE `"+str(theCol)+"`<'"+str(int(time.time()) - 86400)+"' and `status`='active' and `response`='200' and `PAYMENT`='scanner' and `connections`<3")
    try :
        if (cur.rowcount > 0):
            fetch = cur.fetchall()
            return fetch[0][0],'scanned'
        else:
            print "Found No Scanned Proxies available at this time !"
            time.sleep(2)
            return False,False
    except MySQLdb.Error, e:
        try:
            print "MySQL Error [%d]: %s" % (e.args[0], e.args[1])
        except IndexError:
            print "MySQL Error: %s" % str(e)
        time.sleep(2)
        return False,False
def GetTTProxy(cur,theCol):
    print "SELECT * FROM `proxies` WHERE `"+str(theCol)+"`<'"+str(int(time.time()) - 86400)+"' and `status`='active' and `response`='200' and (`tier`='1' or `tier`='2') and `response_time`<10 and `PAYMENT`!='sharedproxies' and `PAYMENT`!='scanner' and `connections`<3"
    cur.execute("SELECT * FROM `proxies` WHERE `"+str(theCol)+"`<'"+str(int(time.time()) - 86400)+"' and `status`='active' and `response`='200' and (`tier`='1' or `tier`='2') and `response_time`<10 and `PAYMENT`!='sharedproxies' and `PAYMENT`!='scanner' and `connections`<3")
    try :
        if (cur.rowcount > 0):
            fetch = cur.fetchall()
            return fetch[0][0],'tt'
        else:
            print "Found No T1 T2 Proxies available at this time !"
            time.sleep(2)
            return False,False
    except MySQLdb.Error, e:
        try:
            print "MySQL Error [%d]: %s" % (e.args[0], e.args[1])
        except IndexError:
            print "MySQL Error: %s" % str(e)
        time.sleep(2)
        return False,False
def GetT3Proxy(cur,theCol):
    print "SELECT * FROM `proxies` WHERE `"+str(theCol)+"`<'"+str(int(time.time()) - 86400)+"' and `status`='active' and `response`='200' and `tier`='3' and `response_time`<10 and `PAYMENT`!='sharedproxies' and `PAYMENT`!='scanner' and `connections`<3"
    cur.execute("SELECT * FROM `proxies` WHERE `"+str(theCol)+"`<'"+str(int(time.time()) - 86400)+"' and `status`='active' and `response`='200' and `tier`='3' and `response_time`<10 and `PAYMENT`!='sharedproxies' and `PAYMENT`!='scanner' and `connections`<3")
    try :
        if (cur.rowcount > 0):
            fetch = cur.fetchall()
            return fetch[0][0],'t3'
        else:
            print "Found No T3 Proxies available at this time !"
            time.sleep(2)
            return False,False
    except MySQLdb.Error, e:
        try:
            print "MySQL Error [%d]: %s" % (e.args[0], e.args[1])
        except IndexError:
            print "MySQL Error: %s" % str(e)
        time.sleep(2)
        return False,False
def Get_Proxy(cur,theCol):
    print "Trying to get Shared Proxy"
    proxy,type=GetPayedProxy(cur,theCol)
    if(proxy==False or type == False):
        print "Trying to get Scanned Proxy"
        proxy,type=GetScannedProxy(cur,theCol)
        if(proxy==False or type == False):
            print "Trying to get T1 T2 Proxy"
            proxy,type=GetTTProxy(cur,theCol)
            if(proxy==False or type == False):
                print "Trying to get T3 Proxy"
                proxy,type=GetT3Proxy(cur,theCol)
                if(proxy==False or type == False):
                    print "No proxies available at this time!!!"
                else:
                    return proxy,type
            else:
                return proxy,type
        else:
            return proxy,type
    else:
        return proxy,type

def getReqs(interface_text):
    toReturn = dict()

    return toReturn
if __name__=='__main__':
    data="""New URL: http://domain.com/
Request (http://domain.com/css/style.css): 
Request (http://domain.com/tp/filter.php?pro=936): 
Request (http://domain.com/tp/a_ft.php?rand=5): 
<van LAST_LOAD>
Processing images and getting hidden ones
Request (http://domain.com/tp/img.php): 
Images with width set to over 85 67
Done processing images.
Checking Resourse Status
Resourse retrieval status: Started/Full F http://domain.com/
Resourse retrieval status: Started/Full F http://domain.com/css/style.css
Resourse retrieval status: Started/Full F http://domain.com/tp/filter.php?pro=936
Resourse retrieval status: Started/Full F http://domain.com/tp/a_ft.php?rand=5
Resourse retrieval status: Started/Full F http://domain.com/tp/img.php    
Phantom will exit in 33775




    Reclicking




    Clicking Image
    Random Click: 5
    <van type='image_request' href='http://www.domain.com/st/thumbs/238/YOWF8GaqIz.jpg'>
    Dims: 204,514,240,180
    Global mouse position 0 0
    Moving to mouse to 635 295
    mouse moved
    Trying to navigate to: http://domain.com/gallery/www.html?id=437&x=8715eb135db63642cda1ec1c19e8d529&url=aHR0cDovL3d3dy5kcnR1YmVyLmNvbS92aWRlby8xOTE1MDEyL2FtYXRldXItcnVzc2lhbi1zZXgtdGFwZQ==&s=1
    Caused by: LinkClicked
    Will actually navigate: false
    Sent from the page's main frame: false
    Expected links: 5
    <van type='cookies'>
    domain.com      proimg      93ffe5      1417031956
    domain.com      pro_cc3     394ef8df2b      1417031956
    domain.com      pro_cc2     3377058     1417031956
    domain.com      fav     1416945556      1448481556
    domain.com      tp      MXwwfDE0MTY5NDU1NTZ8MTQxNjk0NTU1NnwwO3Rlc3QyMS5jb20=        1417031956
    </van>
    <van type='link_taken' href='http://domain.com/gallery/www.html?id=437&x=8715eb135db63642cda1ec1c19e8d529&url=aHR0cDovL3d3dy5kcnR1YmVyLmNvbS92aWRlby8xOTE1MDEyL2FtYXRldXItcnVzc2lhbi1zZXgtdGFwZQ==&s=1'>




    Reclicking




    Clicking Image
    Random Click: 3
    <van type='image_request' href='http://www.domain.com/st/thumbs/730/PGy0TRimJJ.jpg'>
    Dims: 204,22,240,180
    Global mouse position 635 295
    Moving to mouse to 143 295
    mouse moved
    Trying to navigate to: http://domain.com/gallery/sss.html?id=424&x=e3ad16bcdc583a324acbc3a83f654a7a&url=aHR0cDovL3d3dy5kcnR1YmVyLmNvbS92aWRlby8xOTE2Mjk5L3RvdWNoaW5nLWJlYXV0eXMtanVpY3ktc3BvdA==&s=1
    Caused by: LinkClicked
    Will actually navigate: false
    Sent from the page's main frame: false
    Expected links: 4
    <van type='cookies'>
    domain.com      proimg      93ffe5      1417031956
    domain.com      pro_cc3     394ef8df2b      1417031956
    domain.com      pro_cc2     3377058     1417031956
    domain.com      fav     1416945556      1448481556
    domain.com      tp      MXwwfDE0MTY5NDU1NTZ8MTQxNjk0NTU1NnwwO3Rlc3QyMS5jb20=        1417031956
    </van>
    <van type='link_taken' href='http://domain.com/gallery/sss.html?id=424&x=e3ad16bcdc583a324acbc3a83f654a7a&url=aHR0cDovL3d3dy5kcnR1YmVyLmNvbS92aWRlby8xOTE2Mjk5L3RvdWNoaW5nLWJlYXV0eXMtanVpY3ktc3BvdA==&s=1'>




    Reclicking




    Clicking Image
    Random Click: 7
    <van type='image_request' href='http://www.domain.com/st/thumbs/867/uLzPrb0K45.jpg'>
    Dims: 424,22,240,180
    Global mouse position 143 295
    Moving to mouse to 143 515
    mouse moved
    Trying to navigate to: http://domain.com/gallery/aaa.html?id=466&x=8dcbd277bf725b468c7933cc81692be0&url=aHR0cDovL3d3dy5kcnR1YmVyLmNvbS92aWRlby8xOTExMzQ0L3doaXRlLWFuZC1ibGFjay10ZWVuLWJhYmVzLW1hc3R1cmJhdGluZw==&s=1
    Caused by: LinkClicked
    Will actually navigate: false
    Sent from the page's main frame: false
    Expected links: 3
    <van type='cookies'>
    domain.com      proimg      93ffe5      1417031956
    domain.com      pro_cc3     394ef8df2b      1417031956
    domain.com      pro_cc2     3377058     1417031956
    domain.com      fav     1416945556      1448481556
    domain.com      tp      MXwwfDE0MTY5NDU1NTZ8MTQxNjk0NTU1NnwwO3Rlc3QyMS5jb20=        1417031956
    </van>
    <van type='link_taken' href='http://domain.com/gallery/aaa.html?id=466&x=8dcbd277bf725b468c7933cc81692be0&url=aHR0cDovL3d3dy5kcnR1YmVyLmNvbS92aWRlby8xOTExMzQ0L3doaXRlLWFuZC1ibGFjay10ZWVuLWJhYmVzLW1hc3R1cmJhdGluZw==&s=1'>"""
    print getReqs(data)
    quit()

1 answer:

Answer 0 (score: 1)

You define the getReqs function at line 103 (right after hitFormGetProxy).

Then, at line 287 (just before the if __name__=='__main__': block), you replace that definition with this one:

def getReqs(interface_text):
    toReturn = dict()

    return toReturn

So when you call it at line 395:

print getReqs(data)

...you are calling the second definition, so it's no surprise that it prints an empty dictionary.
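In Python a def statement simply binds a name when it executes, so a later def with the same name silently replaces the earlier function; no error or warning is raised at runtime. The sketch below shows both the effect and one way to catch it. The helper find_duplicate_defs is hypothetical (not part of the original script); it uses the standard ast module to list top-level function names that are defined more than once, and SOURCE is a cut-down stand-in for the real script.

```python
import ast
from collections import Counter

# Cut-down stand-in for the script: getReqs is defined twice.
SOURCE = '''
def getReqs(interface_text):
    return [{"request": "parsed"}]

def getReqs(interface_text):
    toReturn = dict()
    return toReturn
'''


def find_duplicate_defs(source):
    """Hypothetical helper: names of top-level functions defined more than once."""
    tree = ast.parse(source)
    names = [node.name for node in tree.body if isinstance(node, ast.FunctionDef)]
    return sorted(name for name, count in Counter(names).items() if count > 1)


# Executing the source shows that the last definition is the one bound to the name.
namespace = {}
exec(SOURCE, namespace)
result = namespace["getReqs"]("any text")

print(find_duplicate_defs(SOURCE))  # ['getReqs']
print(result)                       # {}
```

Static checkers such as pyflakes typically report this situation as a redefinition of an unused name, which is a cheap way to spot it in a long script.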