I am using BeautifulSoup to search for a specific number in some HTML, but I am stuck.
The raw data is:
<div class='box_content' hn_bookmark='true' ng_init=" bookmarked=false; bookmark_id=''; bookmarks_path='/en-US/bookmarks'; bookmarkable_id='000000'; bookmarkable_type=''; ">
I want to extract the value of "bookmarkable_id".
bsobj = BeautifulSoup(text,"html.parser")
questionID_line = bsobj.find("div",{"class":"box_content"})['ng_init']
This returns a string of key=value pairs separated by semicolons:
bookmarked=false; bookmark_id=''; bookmarks_path='/en-US/bookmarks'; bookmarkable_id='793447'; bookmarkable_type='Question'
But I don't know how to extract the value from here. Please help!
Answer 0 (score: 4)
Try this:
data = "bookmarked=false; bookmark_id=''; bookmarks_path='/en-US/bookmarks'; bookmarkable_id='793447'; bookmarkable_type='Question'"
fields = {}
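for f in data.split('; '):
    # split on the first '=' only, so a value containing '=' stays intact
    k, v = f.split('=', 1)
    # drop the single quotes around quoted values
    fields[k] = v.strip("'")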
print(fields)
This gives:
{'bookmarked': 'false', 'bookmark_id': '', 'bookmarks_path': '/en-US/bookmarks', 'bookmarkable_type': 'Question', 'bookmarkable_id': '793447'}
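Note that the loop above expects a string with no trailing "; ", whereas the raw ng_init attribute in the question has leading and trailing whitespace and a final semicolon. A minimal sketch that tolerates both, assuming no value itself contains a semicolon (parse_ng_init is a hypothetical helper name, not part of any library):

```python
def parse_ng_init(raw):
    """Parse a 'key=value; key=value; ' string into a dict (hypothetical helper)."""
    fields = {}
    for part in raw.split(';'):
        part = part.strip()
        if not part:
            # skip the empty piece left behind by a trailing ';'
            continue
        # partition splits on the first '=' only
        key, _, value = part.partition('=')
        fields[key.strip()] = value.strip().strip("'")
    return fields


raw = " bookmarked=false; bookmark_id=''; bookmarks_path='/en-US/bookmarks'; bookmarkable_id='000000'; bookmarkable_type=''; "
fields = parse_ng_init(raw)
print(fields['bookmarkable_id'])
```

With the raw attribute from the question this should print 000000.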
Answer 1 (score: 1)
You can use re to search questionID_line:
import re
re.findall("bookmarkable_id='(.*?)'", questionID_line)
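re.findall returns a list of every match. If only a single ID is expected, re.search with a fallback may be more convenient. A small sketch, assuming questionID_line holds the string shown in the question (question_id is an illustrative variable name):

```python
import re

questionID_line = "bookmarked=false; bookmark_id=''; bookmarks_path='/en-US/bookmarks'; bookmarkable_id='793447'; bookmarkable_type='Question'"

# capture everything between the quotes that follow bookmarkable_id=
match = re.search(r"bookmarkable_id='([^']*)'", questionID_line)
# fall back to None when the attribute has no bookmarkable_id
question_id = match.group(1) if match else None
print(question_id)
```

For the sample string this should print 793447.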
Answer 2 (score: 1)
Use split():
data="bookmarked=false; bookmark_id=''; bookmarks_path='/en-US/bookmarks'; bookmarkable_id='793447'; bookmarkable_type='Question'"
output = {i.split("=")[0].strip():i.split("=")[1].strip() for i in data.split(";")}
Output:
{'bookmarks_path': "'/en-US/bookmarks'", 'bookmark_id': "''", 'bookmarked': 'false', 'bookmarkable_id': "'793447'", 'bookmarkable_type': "'Question'"}
Adjust the strip() calls to match the output you need.
Answer 3 (score: 0)
Try this:
s = """
<div class='box_content' hn_bookmark='true' ng_init=" bookmarked=false; bookmark_id=''; bookmarks_path='/en-US/bookmarks'; bookmarkable_id='000000'; bookmarkable_type=''; ">
"""
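import re
# match each bookmark* key followed by a single-quoted value (closing quote excluded)
data = re.findall("bookmark[^=]*='[^']*", s)
dict1 = {}
for j in data:
    one, two = j.split("=")
    # remove the leading quote that the regex kept
    dict1[one] = two.strip("'")
print(dict1)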
Answer 4 (score: 0)
If you don't strip the trailing "; ", unpacking will raise an error, because the split leaves a stray empty string at the end:
from bs4 import BeautifulSoup
html = """<div class='box_content' hn_bookmark='true' ng_init=" bookmarked=false; bookmark_id=''; bookmarks_path='/en-US/bookmarks'; bookmarkable_id='000000'; bookmarkable_type=''; ">"""
soup = BeautifulSoup(html, "html.parser")
s = soup.select_one("div.box_content")['ng_init']
d = dict(sub.split("=", 1) for sub in s.strip("; ").split("; "))
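The values in d still carry their single quotes, so one more strip is needed to get the bare ID. A short sketch that starts from the attribute string rather than the HTML, so it runs without bs4 (s is assumed to be the value of ng_init as extracted above):

```python
# assumed: the ng_init attribute value, as returned by soup.select_one(...)['ng_init']
s = " bookmarked=false; bookmark_id=''; bookmarks_path='/en-US/bookmarks'; bookmarkable_id='000000'; bookmarkable_type=''; "

# strip leading/trailing ';' and spaces, then split into key=value pairs
d = dict(sub.split("=", 1) for sub in s.strip("; ").split("; "))

# values keep their quotes, e.g. "'000000'", so strip them off
question_id = d["bookmarkable_id"].strip("'")
print(question_id)
```

This should print 000000 for the sample attribute.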