The JavaScript code is here:
var goat=6111+7380;
var hen=5548+7476^goat;
var seal=2094+4451^hen;
var rat=1687+7000^seal;
var pig=3997+8240^rat;
I want to get the goat, hen, seal, and other variables in Python.
My Python code is here:
animals = 'var goat=6111+7380;var hen=5548+7476^goat;var seal=2094+4451^hen;var rat=1687+7000^seal;var pig=3997+8240^rat;'
[eval(item.replace('var','').strip()) for item in animals.split(';')]  # this is where it goes wrong
eval('goat = 6111 + 7380') raises an error, so how can I make goat equal to 6111+7380?
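For context, eval() only accepts expressions, and an assignment such as goat = 6111 + 7380 is a statement, which is why it fails with a SyntaxError. One possible workaround, sketched below, is to split each statement at the first "=" and evaluate only the right-hand side, keeping the results in a dict. The helper name parse_assignments is hypothetical, the sketch assumes Python 3, and emptying __builtins__ is a precaution rather than a real sandbox.

```python
animals = ('var goat=6111+7380;var hen=5548+7476^goat;var seal=2094+4451^hen;'
           'var rat=1687+7000^seal;var pig=3997+8240^rat;')

# eval() accepts expressions only, so an assignment statement is rejected.
try:
    eval('goat = 6111 + 7380')
    assignment_error = None
except SyntaxError as exc:
    assignment_error = exc


def parse_assignments(js_source):
    """Hypothetical helper: evaluate each `var name=expr;` in order."""
    values = {}
    for statement in js_source.split(';'):
        statement = statement.strip()
        if not statement.startswith('var '):
            continue  # skips the empty string left after the final ';'
        name, expression = statement[len('var '):].split('=', 1)
        # Earlier variables are visible through `values`; builtins are removed.
        values[name.strip()] = eval(expression, {'__builtins__': {}}, values)
    return values


values = parse_assignments(animals)
print(values['goat'])  # 13491
```

This relies on + binding more tightly than ^ in both JavaScript and Python, so the small positive integers used here evaluate the same way in either language.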
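PS:
Thanks, everyone. I am actually scraping a website, http://pachong.org/, to get proxy addresses and ports. However, each port is generated by <script>document.write((4513^pig)+15);</script>, and the pig variable is defined by <script type="text/javascript">var goat=6111+7380;var hen=5548+7476^goat;var seal=2094+4451^hen;var rat=1687+7000^seal;var pig=3997+8240^rat;</script>. This JavaScript changes every time I fetch the index page, so I don't know how to get the port value.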
import re
import requests
from bs4 import BeautifulSoup

# resultstring looks something like '(1646^hen)+19'
def getport(resultstring):
    port = eval(resultstring)
    return port

proxyurl = 'http://www.pachong.org/'
try:
    r = requests.get(proxyurl, timeout=60*4)
except:
    print 'I can not get the data of pachong.org'
if r.status_code != 200:
    print 'the status is not good. status_code is %s' % r.status_code
    return
ht = BeautifulSoup(r.content)
animals = str(ht.head.find_all('script')[-1].text)
[eval(item.replace('var','').strip()) for item in animals.split(';')]  # it is wrong here
table = ht.find_all('table', attrs={'class':'tb'})
if not table:
    return
table = table[0]
trs = table.find_all('tr', attrs={'data-type':'high'})
tr = trs[0]
idlestring = tr.find_all('td')[5].text
idlestring = idlestring.replace('\n','').replace(' ','')
if idlestring == u'空闲':
    # proxy_id += 1
    ip = tr.find_all('td')[1].text
    portstring = tr.find_all('td')[2].text
    patt = re.compile(u'document.write\((.*?)\);')
    if re.findall(patt, portstring):
        resultstring = re.findall(patt, portstring)[0]
    else:
        continue
    port = getport(resultstring)
    ip_port = '%s:%s' % (ip, port)
    print 'ip_port is %s' % ip_port
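Since the script changes on every request, the variables and the document.write(...) expression have to be read from the same response. Because that text comes from a remote page, a more cautious alternative to eval() is to walk the parsed expression and allow only integers, names, + and ^. The following is a rough sketch under those assumptions: it targets Python 3.8 or later, the names safe_eval, read_variables and read_ports are made up for illustration, and the sample HTML is simply the two script snippets quoted in the question.

```python
import ast
import operator
import re

# Only the operators that appear in the page's scripts are allowed.
_OPERATORS = {ast.Add: operator.add, ast.BitXor: operator.xor}


def safe_eval(expression, names):
    """Evaluate an integer expression made of literals, names, + and ^."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return node.value
        if isinstance(node, ast.Name):
            return names[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in _OPERATORS:
            return _OPERATORS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError('unsupported expression: %r' % expression)
    return walk(ast.parse(expression.strip(), mode='eval'))


def read_variables(html):
    """Collect every `var name=expr;` in document order."""
    names = {}
    for name, expression in re.findall(r'var\s+(\w+)\s*=\s*([^;]+);', html):
        names[name] = safe_eval(expression, names)
    return names


def read_ports(html, names):
    """Evaluate every `document.write(expr);` against the known variables."""
    pattern = r'document\.write\((.*?)\);'
    return [safe_eval(expr, names) for expr in re.findall(pattern, html)]


# Sample input assembled from the snippets quoted in the question.
html = ('<script type="text/javascript">var goat=6111+7380;'
        'var hen=5548+7476^goat;var seal=2094+4451^hen;'
        'var rat=1687+7000^seal;var pig=3997+8240^rat;</script>'
        '<script>document.write((4513^pig)+15);</script>')

names = read_variables(html)
ports = read_ports(html, names)
print(ports)
```

Anything outside that small grammar, such as a function call, raises ValueError instead of being executed. If the site starts using other operators, the _OPERATORS table would need extending.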
Answer 0 (score: 2)
I don't know why you would want to do this, but this works:
for item in animals.split(';'):
    exec(item.replace('var','').strip())
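One caveat worth noting: in Python 3, exec() inside a function cannot reliably create local variables, so the loop above is only dependable at module level. A variation that should behave the same in either place is to pass an explicit dict as the namespace and then evaluate the port expression against it. This is only a sketch; the name namespace is arbitrary, and exec() on scraped text will run whatever the page sends, so it is only appropriate for input you trust.

```python
animals = ('var goat=6111+7380;var hen=5548+7476^goat;var seal=2094+4451^hen;'
           'var rat=1687+7000^seal;var pig=3997+8240^rat;')

namespace = {}
for item in animals.split(';'):
    # Drop only the leading 'var' so identifiers containing 'var' survive.
    statement = item.replace('var', '', 1).strip()
    if statement:
        exec(statement, namespace)  # assignments land in `namespace`

# The port expression from the page can then use the same namespace.
port = eval('(4513^pig)+15', namespace)
print(namespace['goat'], port)
```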