I'm new to Python. I'm trying to use Beautiful Soup to find the script tag that contains dataLayer on a page, then retrieve the value of postNo and print it.
<head>
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.4.0/js/bootstrap.min.js"></script>
<!-- Data Layer - Begin -->
<script>
dataLayer = [
{
'country': 'UnitedKingdom',
'site': 'Blog',
'postNo': '34',
'pageType': 'Home',
'pageType2': 'Blog',
'pageType3': 'Top Tips'
}
];
</script>
<!-- Data Layer - End -->
</head>
Any help or pointers would be greatly appreciated. Thanks.
Answer 0 (score: 1)
import bs4
import json
html = '''
<head>
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.4.0/js/bootstrap.min.js"></script>
<!-- Data Layer - Begin -->
<script>
dataLayer = [
{
'country': 'UnitedKingdom',
'site': 'Blog',
'postNo': '34',
'pageType': 'Home',
'pageType2': 'Blog',
'pageType3': 'Top Tips'
}
];
</script>
<!-- Data Layer - End -->
</head>'''
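soup = bs4.BeautifulSoup(html, 'html.parser')
scripts = soup.find_all('script')
for script in scripts:
    # Only the inline script that assigns dataLayer is of interest.
    if 'dataLayer = ' in script.text:
        jsonStr = script.text.strip()
        # Keep only the text between the square brackets (the single object).
        jsonStr = jsonStr.split('[')[1].strip()
        jsonStr = jsonStr.split(']')[0].strip()
        # JSON requires double quotes, so swap the single quotes.
        jsonStr = jsonStr.replace("'", '"')
        jsonObj = json.loads(jsonStr)
        print(jsonObj['postNo'])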
Output:
34
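The split-and-replace parsing in this answer works for the sample page, but it would misparse a value that contains an apostrophe or a square bracket. Below is a minimal sketch of an alternative parsing step using only the standard library. It assumes the array assigned to dataLayer is also a valid Python literal (true for the sample, not for arbitrary JavaScript) and that the assignment ends with a semicolon. The helper name parse_data_layer and the "Editor's Picks" value are made up for illustration.

```python
import ast
import re

# Hypothetical helper, not part of Beautiful Soup: it pulls the array literal
# out of the inline script's text and evaluates it without executing code.
DATA_LAYER_RE = re.compile(r"dataLayer\s*=\s*(\[.*?\])\s*;", re.DOTALL)


def parse_data_layer(script_text):
    """Return the dataLayer list found in script_text, or None if absent."""
    match = DATA_LAYER_RE.search(script_text)
    if match is None:
        return None
    # literal_eval accepts only Python literals. It raises ValueError or
    # SyntaxError if the JavaScript is not also valid Python
    # (for example unquoted keys, or true/false/null).
    return ast.literal_eval(match.group(1))


# Sample script text; the pageType3 value is invented to show an apostrophe.
sample = """
dataLayer = [
{
'country': 'UnitedKingdom',
'postNo': '34',
'pageType3': "Editor's Picks"
}
];
"""

data = parse_data_layer(sample)
print(data[0]['postNo'])  # prints 34
```

In the loop above, script.text could be passed to this helper in place of the split and replace calls.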
Answer 1 (score: 0)
Simply extract the list from the HTML and parse it. See the code below.
from bs4 import BeautifulSoup
import ast
html = '''
<head>
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.4.0/js/bootstrap.min.js"></script>
<!-- Data Layer - Begin -->
<script>
dataLayer = [
{
'country': 'UnitedKingdom',
'site': 'Blog',
'postNo': '34',
'pageType': 'Home',
'pageType2': 'Blog',
'pageType3': 'Top Tips'
}
];
</script>
<!-- Data Layer - End -->
</head>'''
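soup = BeautifulSoup(html, 'html.parser')
# The third <script> tag (index 2) holds the dataLayer assignment.
# Strip the assignment and the trailing semicolon so only the list literal remains.
content = soup.find_all('script')[2].text.replace(';','').replace('dataLayer = ','').strip()
# The list uses single-quoted strings, which ast.literal_eval can parse directly.
data = ast.literal_eval(content)
print([x['postNo'] for x in data])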
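Indexing the third script tag with [2] depends on the page keeping the same layout. Here is a rough sketch that picks the script by its content instead. It assumes bs4 is installed, that only one inline script mentions dataLayer, and that the array is a valid Python literal. The HTML is an abbreviated copy of the sample, and the variable names are illustrative.

```python
import ast

from bs4 import BeautifulSoup

# Abbreviated copy of the sample page.
html = '''
<head>
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<script>
dataLayer = [
{
'site': 'Blog',
'postNo': '34'
}
];
</script>
</head>'''

soup = BeautifulSoup(html, 'html.parser')

# Take the text of the first <script> that mentions dataLayer, wherever it
# sits in the page; None if no such script exists.
script_text = next(
    (s.get_text() for s in soup.find_all('script') if 'dataLayer' in s.get_text()),
    None,
)

if script_text is None:
    post_numbers = []
else:
    # Slice from the first '[' to the last ']' to isolate the list literal.
    start = script_text.index('[')
    end = script_text.rindex(']')
    data = ast.literal_eval(script_text[start:end + 1])
    post_numbers = [entry['postNo'] for entry in data]

print(post_numbers)  # prints ['34']
```

If the page gains or loses a script tag, this version keeps working as long as the dataLayer script itself is still present.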