Using re.findall

Posted: 2016-12-30 17:44:00

Tags: javascript python html regex python-2.7

So I have a couple of problems I need to solve.

First, I'm trying to parse this JavaScript, which I got from the HTML:

  

$(document).ready(function() { $('#commodity-show-thumbnails').bxSlider({ mode: 'vertical', auto: false, controls: true, pager: false, minSlides: 4, maxSlides: 4, moveSlides: 1, slideWidth: 250 }); itemSelector('commodity-show-form', 'commodity-show-addcart-submit', [['color', 'Choose color'], ['size', 'Choose size']], { "39805": { "params": ["Smokey Blue / Mica Blue", "36"]}, "39806": { "params": ["Smokey Blue / Mica Blue", "36,5"]}, "39807": { "params": ["Smokey Blue / Mica Blue", "37,5"]}, "39808": { "params": ["Smokey Blue / Mica Blue", "38"]}, "39809": { "params": ["Smokey Blue / Mica Blue", "38,5"]}, "39810": { "params": ["Smokey Blue / Mica Blue", "39"]}, "39811": { "params": ["Smokey Blue / Mica Blue", "40"]}, "39812": { "params": ["Smokey Blue / Mica Blue", "40,5"]}, "39814": { "params": ["Smokey Blue / Mica Blue", "42"]} }, [39805,39806,39807,39808,39809,39810,39811,39812,39814], 'main-cart', 'commodity-show-image'); });

import re
import ast

res = re.findall(r'{ "params": (.+?)}', text)  # text is where the JavaScript text is stored

final = [ast.literal_eval(i) for i in res]  # safer than eval(): only parses Python literals

print(final)

This gives me the following output:

[['Smokey Blue / Mica Blue', '36'], ['Smokey Blue / Mica Blue', '36,5'], ['Smokey Blue / Mica Blue', '37,5'], ['Smokey Blue / Mica Blue', '38'], ['Smokey Blue / Mica Blue', '38,5'], ['Smokey Blue / Mica Blue', '39'], ['Smokey Blue / Mica Blue', '40'], ['Smokey Blue / Mica Blue', '40,5'], ['Smokey Blue / Mica Blue', '42']]

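But now I don't know how to go on from here. I want to get the value 39805 out of

{ "39805": { "params": ["Smokey Blue / Mica Blue", "36"]}}

How can I parse this so that, if I'm looking for the value associated with size 36, it gives me 39805?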

Sorry, I'm really bad at parsing and quite new to this.

2 Answers:

Answer 0 (score: 1)

You can get the ID for size 36 like this:

import re
import ast

a="""$(document).ready(function() { $('#commodity-show-thumbnails').bxSlider({ mode: 'vertical', auto: false, controls: true, pager: false, minSlides: 4, maxSlides: 4, moveSlides: 1, slideWidth: 250 }); itemSelector('commodity-show-form', 'commodity-show-addcart-submit', [['color', 'Choose color'], ['size', 'Choose size']], { "39805": { "params": ["Smokey Blue/Mica Blue", "36"]}, "39806": { "params": ["Smokey Blue/Mica Blue", "36,5"]}, "39807": { "params": ["Smokey Blue/Mica Blue", "37,5"]}, "39808": { "params": ["Smokey Blue/Mica Blue", "38"]}, "39809": { "params": ["Smokey Blue/Mica Blue", "38,5"]}, "39810": { "params": ["Smokey Blue/Mica Blue", "39"]}, "39811": { "params": ["Smokey Blue/Mica Blue", "40"]}, "39812": { "params": ["Smokey Blue/Mica Blue", "40,5"]}, "39814": { "params": ["Smokey Blue/Mica Blue", "42"]} }, [39805,39806,39807,39808,39809,39810,39811,39812,39814], 'main-cart', 'commodity-show-image'); });"""
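# Capture the object literal from the first '{ "' up to its closing '} }'
b = re.findall(r'.*?({ ".*?} }).*}', a)[0]

# The object only contains string/list/dict literals, so literal_eval can parse it
d1 = ast.literal_eval(b)
print d1, '\n'

# Print every key whose size (second element of "params") is '36'
for key, value in d1.iteritems():
    if value['params'][1] == '36':
        print key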

Output:

{'39809': {'params': ['Smokey Blue/Mica Blue', '38,5']}, '39808': {'params': ['Smokey Blue/Mica Blue', '38']}, '39805': {'params': ['Smokey Blue/Mica Blue', '36']}, '39807': {'params': ['Smokey Blue/Mica Blue', '37,5']}, '39806': {'params': ['Smokey Blue/Mica Blue', '36,5']}, '39812': {'params': ['Smokey Blue/Mica Blue', '40,5']}, '39814': {'params': ['Smokey Blue/Mica Blue', '42']}, '39810': {'params': ['Smokey Blue/Mica Blue', '39']}, '39811': {'params': ['Smokey Blue/Mica Blue', '40']}} 

39805
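Since the extracted object uses double-quoted keys and strings, it is also valid JSON. A minimal Python 3 sketch, assuming the blob keeps the `{ "ID": { "params": [...]} }` layout of the sample, could parse it with `json.loads` and invert it once into a size-to-ID dictionary instead of looping for every lookup. The shortened `text` and the helper name `extract_variants` are illustrative only.

```python
import json
import re

# Hypothetical, shortened copy of the scraped JavaScript.
text = """itemSelector('commodity-show-form', 'commodity-show-addcart-submit', [['color', 'Choose color'], ['size', 'Choose size']], { "39805": { "params": ["Smokey Blue/Mica Blue", "36"]}, "39806": { "params": ["Smokey Blue/Mica Blue", "36,5"]}, "39807": { "params": ["Smokey Blue/Mica Blue", "37,5"]} }, [39805,39806,39807], 'main-cart', 'commodity-show-image');"""


def extract_variants(js):
    """Return the {id: {"params": [color, size]}} object embedded in the script."""
    # From the first '{ "' up to the first '} }', which closes the nested object.
    match = re.search(r'\{ ".*?\} \}', js)
    if match is None:
        raise ValueError("variant object not found")
    return json.loads(match.group(0))


variants = extract_variants(text)

# Invert the mapping once: size string -> variant ID.
size_to_id = {info["params"][1]: vid for vid, info in variants.items()}

print(size_to_id["36"])    # 39805
print(size_to_id["36,5"])  # 39806
```

If the same size appears in more than one color, later entries overwrite earlier ones in `size_to_id`; keying on `(color, size)` tuples instead would avoid that.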

Answer 1 (score: 1)

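Edit: I just realized that in some cases the size has two numbers, like "36,5". I assume this means 36 and a half. Either way, my original script didn't account for that, which is why it gave the wrong answer (I only noticed by accident). Here is a revised script that seems to work: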

import re
text='''$(document).ready(function() { $('#commodity-show-thumbnails').bxSlider({ mode: 'vertical', auto: false, controls: true, pager: false, minSlides: 4, maxSlides: 4, moveSlides: 1, slideWidth: 250 }); itemSelector('commodity-show-form', 'commodity-show-addcart-submit', [['color', 'Choose color'], ['size', 'Choose size']], { "39805": { "params": ["Smokey Blue/Mica Blue", "36"]}, "39806": { "params": ["Smokey Blue/Mica Blue", "36,5"]}, "39807": { "params": ["Smokey Blue/Mica Blue", "37,5"]}, "39808": { "params": ["Smokey Blue/Mica Blue", "38"]}, "39809": { "params": ["Smokey Blue/Mica Blue", "38,5"]}, "39810": { "params": ["Smokey Blue/Mica Blue", "39"]}, "39811": { "params": ["Smokey Blue/Mica Blue", "40"]}, "39812": { "params": ["Smokey Blue/Mica Blue", "40,5"]}, "39814": { "params": ["Smokey Blue/Mica Blue", "42"]} }, [39805,39806,39807,39808,39809,39810,39811,39812,39814], 'main-cart', 'commodity-show-image'); });'''
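# Group 1: the numeric ID after ' "'; group 2: the first number after "params"
# (the size), optionally followed by ",5"; group 3: the optional ",5" itself
pattern = re.compile(r' "([0-9]+).*?params.*?([0-9]+(,5)?)')

# Map size -> ID, discarding the third group
s={b:a for a,b,_ in pattern.findall(text)}

print(s['36'], s['36,5'])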

This now prints 39805 39806, which looks right to me.
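The `(,5)?` group only covers half sizes written exactly as `,5`. A sketch of a stricter alternative, assuming every entry has the form `"ID": { "params": ["color", "size"]}` as in the sample, captures the whole quoted size string (and the color) with named groups, so any size format is kept as-is. The shortened `text` is illustrative.

```python
import re

# Hypothetical, shortened sample in the same shape as the scraped script.
text = '{ "39805": { "params": ["Smokey Blue/Mica Blue", "36"]}, "39806": { "params": ["Smokey Blue/Mica Blue", "36,5"]}, "39814": { "params": ["Smokey Blue/Mica Blue", "42"]} }'

# "ID": { "params": ["color", "size"]}  -> named groups id, color, size
pattern = re.compile(
    r'"(?P<id>\d+)":\s*\{\s*"params":\s*\[\s*"(?P<color>[^"]*)",\s*"(?P<size>[^"]*)"\s*\]'
)

# Key on (color, size) so the same size in two colors does not collide.
lookup = {(m.group("color"), m.group("size")): m.group("id")
          for m in pattern.finditer(text)}

print(lookup[("Smokey Blue/Mica Blue", "36,5")])  # 39806
```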

Here is the complete mapping produced by the script:

for a in sorted(s):
    print(a, s[a])

Output:

36 39805
36,5 39806
37,5 39807
38 39808
38,5 39809
39 39810
40 39811
40,5 39812
42 39814
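`sorted(s)` compares the sizes as strings. That gives the right order here only because every size has two digits before the comma; a size such as "100" would sort before "36". Assuming, as the answer guesses, that "36,5" means 36.5, a sketch that converts the decimal comma and then sorts and looks up numerically (`size_value` is a hypothetical helper):

```python
def size_value(size):
    """Convert a size string with a decimal comma ('36,5') to a float (36.5)."""
    return float(size.replace(",", "."))


# Size -> ID table, as printed by the answer above.
s = {"36": "39805", "36,5": "39806", "37,5": "39807", "38": "39808",
     "38,5": "39809", "39": "39810", "40": "39811", "40,5": "39812",
     "42": "39814"}

# Sort by the numeric value of the size instead of its string form.
for size in sorted(s, key=size_value):
    print(size, s[size])

# Numeric lookup, e.g. for half size 36.5.
by_number = {size_value(k): v for k, v in s.items()}
print(by_number[36.5])  # 39806
```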