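I have an HTML page, and I want to extract the title that sits inside a script tag, in the _BFD.BFD_INFO object. I can already get at all the data in that tag, but it also contains a lot of other data, such as links, and I don't know how to pull out just the title. Please help.
The code I wrote is: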


    import bs4 as bs
    import requests

    sauce = requests.get('https://www.meishij.net/zuofa/huaguluobodunpaigutang.html')
    print(sauce.status_code)
    soup = bs.BeautifulSoup(sauce.content, 'html.parser')
    # find_all returns a list of tags, so it can be indexed; find returns a single tag
    print(soup.find_all("script", type="text/javascript")[9])



This is the HTML:

 

    <script type="text/javascript">
    _czc.push(['_trackEvent', 'PC', 'pc_news']);
    _czc.push(['_trackEvent', 'pc', 'pc_news_class_6']);
    window["_BFD"] = window["_BFD"] || {};
    _BFD.BFD_INFO = {
    "title": "花菇萝卜炖排骨汤",
    </script>

Answer 0 (score: 0)
I'm not good at regular expressions, which could be used to find the 'title' in one line. I think the code below should work.

    import json
    import requests
    from bs4 import BeautifulSoup

    url = 'https://www.meishij.net/zuofa/huaguluobodunpaigutang.html'
    headers = requests.utils.default_headers()
    headers.update({
        'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0',
    })
    Link = requests.get(url, headers=headers)
    soup = BeautifulSoup(Link.content, "lxml")
    scripts = soup.find_all("script")
    for script in scripts:
        if "_BFD.BFD_INFO" in script.text:
            text = script.text
            # On this page, m_text[2] is the text after the second '=',
            # i.e. the start of the BFD_INFO object.
            m_text = text.split('=')
            # The piece after the first ':' begins with the title value.
            m_text = m_text[2].split(":")
            m_text = m_text[1].split(',')
            encoded = m_text[0].encode('utf-8')
            print(encoded.decode('utf-8'))

Update, to get the pic:

    for script in scripts:
        text = script.text
        m_text = text.split(',')
        for n in m_text:
            if 'pic' in n:
                print(n)

Output:
C:\Users\siva\Desktop>python test.py
"pic" :"http://s1.st.meishij.net/r/216/197/6174466/a6174466_152117574296827.jpg"
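
A regular expression is a less fragile way to do both the title and the pic lookup than chained split calls. The following is only a sketch: the helper name extract_field and the inline sample string are made up for illustration, and the pattern assumes the value is a plain double-quoted string with no escaped quotes inside it.

```python
import re


def extract_field(script_text, key):
    """Return the quoted string value that follows "key" : in script_text, or None."""
    # Matches  "key" : "value"  with optional whitespace around the colon.
    pattern = r'"' + re.escape(key) + r'"\s*:\s*"([^"]*)"'
    match = re.search(pattern, script_text)
    return match.group(1) if match else None


# Hypothetical sample modelled on the script shown in the question.
sample = (
    'window["_BFD"] = window["_BFD"] || {};\r\n'
    '_BFD.BFD_INFO = {\r\n'
    '"title" :"花菇萝卜炖排骨汤",\r\n'
    '"pic" :"http://s1.st.meishij.net/r/216/197/6174466/a6174466_152117574296827.jpg",\r\n'
    '"id" :"1883528"\r\n'
    '};'
)

print(extract_field(sample, "title"))
print(extract_field(sample, "pic"))
```

With the real page, script.text from the loop above would be passed in place of sample.
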
Update 2:

    for script in scripts:
        text = script.text
        m_text = text.split('_BFD.BFD_INFO')
        for t in m_text:
            if "title" in t:
                print(t.split(","))

Output:
C:\Users\SSubra02\Desktop>python test.py
[' = {\r\n"title" :"????????"', '\r\n"pic" :"http://s1.st.meishij.net/r/216/197/
6174466/a6174466_152117574296827.jpg"', '\r\n"id" :"1883528"', '\r\n"url" :"http
s://www.meishij.net/zuofa/huaguluobodunpaigutang.html"', '\r\n"category" :[["??"
', '"https://www.meishij.net/chufang/diy/recaipu/"]', '["??"', '"https://www.mei
shij.net/chufang/diy/tangbaocaipu/"]', '["???"', '"https://www.meishij.net/chufa
ng/diy/jiangchangcaipu/"]', '["??"', '"https://www.meishij.net/chufang/diy/wucan
/"]', '["??"', '"https://www.meishij.net/chufang/diy/wancan/"]]', '\r\n"tag" :["
??"', '"??"', '"??"', '"????"', '"????"', '"????"]', '\r\n"author":"????"', '\r\
n"pinglun":"3"', '\r\n"renqi":"4868"', '\r\n"step":"7?"', '\r\n"gongyi":"?"', '\
r\n"nandu":"????"', '\r\n"renshu":"4??"', '\r\n"kouwei":"???"', '\r\n"zbshijian"
:"10??"', '\r\n"prshijian":"<90??"', '\r\n"page_type" :"detail"\r\n};window["_BF
D"] = window["_BFD"] || {};_BFD.client_id = "Cmeishijie";_BFD.script = document.
createElement("script");_BFD.script.type = "text/javascript";_BFD.script.async =
true;_BFD.script.charset = "utf-8";_BFD.script.src =((\'https:\' == document.lo
cation.protocol?\'https://ssl-static1\':\'http://static1\')+\'.baifendian.com/se
rvice/meishijie/meishijie.js\');']
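
Since the object printed above looks like valid JSON (double-quoted keys and values), another option is to locate the object literal and hand it to the json module, which would give every field as a Python dict at once. This is a sketch under that assumption: parse_bfd_info and the sample string are hypothetical, and the call would raise json.JSONDecodeError if the real page uses JavaScript syntax that JSON does not allow.

```python
import json


def parse_bfd_info(script_text):
    """Parse the object assigned to _BFD.BFD_INFO, or return None if it is absent."""
    marker = "_BFD.BFD_INFO"
    start = script_text.find(marker)
    if start == -1:
        return None
    brace = script_text.find("{", start)
    if brace == -1:
        return None
    # raw_decode parses one JSON value starting at index `brace`
    # and ignores whatever JavaScript follows the closing brace.
    obj, _end = json.JSONDecoder().raw_decode(script_text, brace)
    return obj


# Hypothetical sample modelled on the output shown above.
sample = (
    'window["_BFD"] = window["_BFD"] || {};\r\n'
    '_BFD.BFD_INFO = {\r\n'
    '"title" :"花菇萝卜炖排骨汤",\r\n'
    '"id" :"1883528",\r\n'
    '"renqi":"4868",\r\n'
    '"page_type" :"detail"\r\n'
    '};_BFD.client_id = "Cmeishijie";'
)

info = parse_bfd_info(sample)
print(info["title"])
print(info["page_type"])
```

The ???? runs in the output above stand in for Chinese characters, so printing info["title"] in a console that supports UTF-8 should show the title itself.
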
Let me know if you run into any problems.