How do I build a dictionary from a semicolon-separated string?

Time: 2016-07-29 05:31:16

Tags: python beautifulsoup

I am using BeautifulSoup to search an HTML page for a specific number, but I am stuck.

The raw data is:

<div class='box_content' hn_bookmark='true' ng_init=" bookmarked=false; bookmark_id=''; bookmarks_path='/en-US/bookmarks'; bookmarkable_id='000000'; bookmarkable_type=''; ">

I want to extract the value of "bookmarkable_id".

bsobj = BeautifulSoup(text,"html.parser")
questionID_line = bsobj.find("div",{"class":"box_content"})['ng_init']

This returns a string of key=value pairs separated by semicolons:

bookmarked=false; bookmark_id=''; bookmarks_path='/en-US/bookmarks'; bookmarkable_id='793447'; bookmarkable_type='Question'

But I don't know how to extract the value from here. Please help!

5 Answers:

Answer 0: (score: 4)

Try this:

data = "bookmarked=false; bookmark_id=''; bookmarks_path='/en-US/bookmarks'; bookmarkable_id='793447'; bookmarkable_type='Question'"

fields = {}
for f in data.split('; '):
    k, v = f.split('=')
    fields[k] = v.strip("'")

print(fields)

Which gives:

{'bookmarked': 'false', 'bookmark_id': '', 'bookmarks_path': '/en-US/bookmarks', 'bookmarkable_type': 'Question', 'bookmarkable_id': '793447'}
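One caveat worth sketching: the loop above unpacks `f.split('=')` into exactly two names, so it would raise a `ValueError` if a value ever contained its own `=`. The input below is hypothetical (the query string in the path is made up for illustration); passing a maxsplit of 1 is one way to guard against that case.

```python
# Hypothetical input: the path value itself contains an "=" sign.
data = "bookmarks_path='/en-US/bookmarks?tab=all'; bookmarkable_id='793447'"

fields = {}
for f in data.split('; '):
    # maxsplit=1 splits only at the first "=", so the rest stays in the value
    k, v = f.split('=', 1)
    fields[k] = v.strip("'")

print(fields['bookmarkable_id'])
```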

Answer 1: (score: 1)

You can use re to search questionID_line:

import re
re.findall("bookmarkable_id='(.*?)'", questionID_line)
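Note that `re.findall` returns a list of matches. If only the single id is wanted, `re.search` may be more convenient. A minimal sketch, assuming `questionID_line` holds the string shown in the question (the name `question_id` is introduced here for illustration):

```python
import re

# Assumed to be the ng_init string from the question
questionID_line = "bookmarked=false; bookmark_id=''; bookmarks_path='/en-US/bookmarks'; bookmarkable_id='793447'; bookmarkable_type='Question'"

# re.search returns a match object, or None when the key is absent
match = re.search(r"bookmarkable_id='(.*?)'", questionID_line)
question_id = match.group(1) if match else None

print(question_id)
```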

Answer 2: (score: 1)

Use split():

data="bookmarked=false; bookmark_id=''; bookmarks_path='/en-US/bookmarks'; bookmarkable_id='793447'; bookmarkable_type='Question'"
output = {i.split("=")[0].strip():i.split("=")[1].strip() for i in data.split(";")}

Output:

{'bookmarks_path': "'/en-US/bookmarks'", 'bookmark_id': "''", 'bookmarked': 'false', 'bookmarkable_id': "'793447'", 'bookmarkable_type': "'Question'"}

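Adjust the strip() calls to suit the output you need; as written, the values keep their surrounding single quotes.

Answer 3: (score: 0)

Try this: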
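As one possible adjustment, the sketch below also removes the single quotes and skips the empty piece that a trailing semicolon leaves behind. It uses the raw `ng_init` value from the question as input; `str.partition` is swapped in for the double `split("=")` so each piece is split only once.

```python
# Raw ng_init value from the question, including the leading space and trailing "; "
data = " bookmarked=false; bookmark_id=''; bookmarks_path='/en-US/bookmarks'; bookmarkable_id='000000'; bookmarkable_type=''; "

output = {
    key.strip(): value.strip().strip("'")  # drop whitespace, then the quotes
    for key, _, value in (part.partition("=") for part in data.split(";"))
    if key.strip()  # skip the blank piece after the last semicolon
}

print(output)
```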

s = """
<div class='box_content' hn_bookmark='true' ng_init=" bookmarked=false; bookmark_id=''; bookmarks_path='/en-US/bookmarks'; bookmarkable_id='000000'; bookmarkable_type=''; ">
"""

import re
data = re.findall("bookmark[^=]*='[^']*",s)

dict1 = {}
for j in data:
    # each match looks like bookmark_id='value (the closing quote is not captured)
    one, two = j.split("=")
    dict1[one] = two.strip("'")

print(dict1)

Answer 4: (score: 0)

If you don't strip the trailing "; ", trying to unpack raises an error, because the split leaves a stray empty string at the end:

from bs4 import BeautifulSoup
html = """<div class='box_content' hn_bookmark='true' ng_init=" bookmarked=false; bookmark_id=''; bookmarks_path='/en-US/bookmarks'; bookmarkable_id='000000'; bookmarkable_type=''; ">"""
soup = BeautifulSoup(html, "html.parser")  # name the parser explicitly to avoid the bs4 warning

s = soup.select_one("div.box_content")['ng_init']

d = dict(sub.split("=", 1) for sub in s.strip("; ").split("; "))
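To illustrate the failure described in this answer without needing BeautifulSoup, here is a small sketch on a shortened, hypothetical version of the attribute value. The `failed` flag is only there to record what happened.

```python
# Shortened, hypothetical ng_init value that still ends in "; "
s = "bookmarked=false; bookmarkable_id='000000'; "

# Without stripping, the last piece is "", which splits into a single item,
# and dict() needs two items (key and value) per element
try:
    dict(sub.split("=", 1) for sub in s.split("; "))
    failed = False
except ValueError:
    failed = True

# Stripping "; " from both ends first removes the empty trailing piece
d = dict(sub.split("=", 1) for sub in s.strip("; ").split("; "))

print(failed, d)
```

The values in `d` keep their single quotes, so a further `.strip("'")` would be needed to get the bare id.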