因此,我正在尝试从具有深层嵌套<script>
标签的网站中获取特定数据。
使用import json
希望使事情变得更轻松,这会导致著名的Expecting value: line 1 column 1 (char 0)
错误。因此,我尝试了以下方法1,但成功率为零。
从本质上讲,连接到站点,捕获特定<script>
标签的相对简单的步骤没有问题。从我需要的数据中获取数据似乎有问题。
假设以下元素:
script_tag = '''
<script id="startup" type="text/javascript">
$(document).ready(function () {createJsonChart({
"series":[{"name":"BNames","color":"#0043de","legendIndex":0,
"stack":null,
"data":[{"name":"BNames","color":"#0043de","y":0.0,
"legendIndex":0,
"events":{"click":function(){return false;}},
"subtotal":0.0,"displayValue":"0","tooltip":""},
{"name":"BNames","color":"#0043de","y":114.6,
"legendIndex":0,
"events":{"click":function(){return false;}},
"subtotal":0.0,"displayValue":"0",
"tooltip":"BNames: 114,60 % <br/> Month: oktober 2018"},
{"name":"BNames","color":"#0043de","y":108.5,
"legendIndex":0,
"events":{"click":function(){return false;}},
"subtotal":0.0,"displayValue":"0",
"tooltip":"BNames: 108,50 % <br/> Month: september 2019"},
{"name":"BNames","color":"#0043de","y":0.0,
"legendIndex":0,
"events":{"click":function(){return false;}},
"subtotal":0.0,"displayValue":"0","tooltip":""}]},
{"type":"line","marker":{"enabled":false,
"linecolor":null,"lineWidth":0,
"fillColor":null,"symbol":null,"radius":4},
"dashStyle":"Solid","lineWidth":2,
"step":"center","zIndex":"2","name":"Mandatory","color":"#f20808",
"legendIndex":0,"stack":1,
"data":[{"name":"Mandatory","color":"#f20808","y":104.1,
"legendIndex":0,
"events":{"click":function(){return false;}},"subtotal":0.0,"displayValue":"0",
"tooltip":"Mandatory: 104,10 %: 104,10 %"},
{"name":"Mandatory","color":"#f20808","y":104.1,
"legendIndex":0,
"events":{"click":function(){return false;}},
"subtotal":0.0,"displayValue":"0",
"tooltip":"Mandatory: 104,10 %"},
{"name":"Mandatory","color":"#f20808","y":104.1,
"legendIndex":0,
"events":{"click":function(){return false;}},
"subtotal":0.0,"displayValue":"0",
"tooltip":"Mandatory: 104,10 %"}]},
{"type":"line","marker":{"enabled":false,
"linecolor":null,"lineWidth":0,"fillColor":null,
"symbol":null,"radius":4},"dashStyle":"Solid","lineWidth":2,
"step":"center", "zIndex":"2","name":"Preferred","color":"#38d615",
"legendIndex":0,"stack":2,
"data":[{"name":"Preferred","color":"#38d615","y":121.0,
"legendIndex":0,
"events":{"click":function(){return false;}},"subtotal":0.0,"displayValue":"0",
"tooltip":"Preferred: 121,00 %: 121,00 %"},
{"name":"Preferred","color":"#38d615","y":121.0,
"legendIndex":0,
"events":{"click":function(){return false;}},"subtotal":0.0,"displayValue":"0",
"tooltip":"Preferred: 121,00 %"},
{"name":"Preferred","color":"#38d615","y":121.0,
"legendIndex":0,
"events":{"click":function(){return false;}},"subtotal":0.0,"displayValue":"0",
"tooltip":"Preferred: 121,00 %"}]}],
"resizeElement":null,"credits":{"enabled":false}});$('#__Page').lumnaInit('');});
</script>
'''
实际上,这个<script>
标签更大。它包含3部分数据,分别在这里命名为BNames
,Mandatory
和Preferred
。我需要来自BNames
的数据,特别是最后一个条目。因此,预期结果将来自部分"tooltip":"BNames: 108,50 % <br/> Month: september 2019"}
,其中一个变量为BNames: 108,50 %
,另一个变量为Month: september 2019
。
url_part=soup.find("script", attrs={'id':'startup'}).text
info=re.findall(r'\s\w*\s\d*', url_part)[-1]
result=re.findall(r'(BNames: (\d+[,]\d+\s[%]))', url_part)[-1][1]
首先定义要使用的HTML标记。其次,找到所有大小为字母(\w*
),后跟空格(\s
)和任何大小数字(\d*
)的所有实例。这将与2019年9月或2019年8月之类的任何内容匹配。最后,查找与BNames:
匹配的实例,该实例中带有以下数字的设置:数字,逗号,数字,空格和百分号。因此(\d+[,]\d+\s[%]
的确匹配从80.6%到120.05%的所有内容
答案 0 :(得分:2)