Here is the HTML code:
<html>
<head></head>
<body>
<pre style="word-wrap: break-word; white-space: pre-wrap;">
"{"Title":"One Indian Girl","Year":"2016","Author":"Chetan Bhagat"}"
</pre>
</body>
</html>
I need to scrape only the data I need, for example just the author's name.
Answer 0 (score: 0)
Strip the tags and convert the JSON string into a Python dict:
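import json
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "html.parser")  # html holds the page source shown above
# get_text() drops the tags; strip() trims whitespace, strip('"') the outer quotes
text = soup.get_text().strip().strip('"')
d = json.loads(text)
print(d['Author'])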
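If bs4 is not available, the same "strip the tags, then parse the JSON" idea can be sketched with the standard library's html.parser.HTMLParser. This swaps BeautifulSoup for a different parser and is not the answer's code; TextCollector and extract_json are hypothetical helper names.

```python
import json
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect every text node and ignore the tags, roughly like soup.get_text()."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def extract_json(html):
    parser = TextCollector()
    parser.feed(html)
    parser.close()  # flush any buffered text
    # Trim surrounding whitespace, then the stray double quotes around the object.
    text = "".join(parser.parts).strip().strip('"')
    return json.loads(text)


html = """<html>
<head></head>
<body>
<pre style="word-wrap: break-word; white-space: pre-wrap;">
"{"Title":"One Indian Girl","Year":"2016","Author":"Chetan Bhagat"}"
</pre>
</body>
</html>"""

d = extract_json(html)
print(d["Author"])
```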
Answer 1 (score: 0)
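@vijay, print(json.loads(soup.find("pre").string[2:-2])["Author"]) will do the job. The .string of the <pre> tag carries a newline and a double quote at each end, so [2:-2] leaves just the JSON object.
Take a look at the following session in the Python interactive shell.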
>>> import json
>>> from bs4 import BeautifulSoup
>>>
>>> html_text = """<html>
... <head></head>
... <body>
... <pre style="word-wrap: break-word; white-space: pre-wrap;">
... "{"Title":"One Indian Girl","Year":"2016","Author":"Chetan Bhagat"}"
... </pre>
... </body>
... </html>"""
>>>
>>> soup = BeautifulSoup(html_text, "html.parser")
>>> print(soup.prettify())
<html>
<head>
</head>
<body>
<pre style="word-wrap: break-word; white-space: pre-wrap;">
"{"Title":"One Indian Girl","Year":"2016","Author":"Chetan Bhagat"}"
</pre>
</body>
</html>
>>>
>>> print(soup.find("pre"))
<pre style="word-wrap: break-word; white-space: pre-wrap;">
"{"Title":"One Indian Girl","Year":"2016","Author":"Chetan Bhagat"}"
</pre>
>>>
>>> print(soup.find("pre").string)
"{"Title":"One Indian Girl","Year":"2016","Author":"Chetan Bhagat"}"
>>> print(soup.find("pre").string[2:-2])
{"Title":"One Indian Girl","Year":"2016","Author":"Chetan Bhagat"}
>>>
>>> d = json.loads(soup.find("pre").string[2:-2])
>>> type(d)
<type 'dict'>
>>>
>>> d
{u'Author': u'Chetan Bhagat', u'Year': u'2016', u'Title': u'One Indian Girl'}
>>>
>>> d["Author"]
u'Chetan Bhagat'
>>>
>>> d["Year"]
u'2016'
>>>
>>> d["Title"]
u'One Indian Girl'
>>>
>>> # Place all in the list
...
>>> l = [d["Title"], d["Year"], d["Author"]]
>>> l
[u'One Indian Girl', u'2016', u'Chetan Bhagat']
>>>
» Getting the data into a list without referring to the dictionary keys above:
>>> final_data = [str(a.strip().split(":")[1]) for a in soup.find("pre").string[2:-3].replace('\"', '').split(",")]
>>>
>>> final_data
['One Indian Girl', '2016', 'Chetan Bhagat']
>>>
Let's break the one-liner above down and get the data step by step (update).
>>> data = soup.find("pre").string[2:-3]
>>> data
u'{"Title":"One Indian Girl","Year":"2016","Author":"Chetan Bhagat"'
>>>
>>> data = data.replace('\"', '')
>>> data
u'{Title:One Indian Girl,Year:2016,Author:Chetan Bhagat'
>>>
>>> arr = data.split(",")
>>> arr
[u'{Title:One Indian Girl', u'Year:2016', u'Author:Chetan Bhagat']
>>>
>>> final_data = [str(a.strip().split(":")[1]) for a in arr]
>>> final_data
['One Indian Girl', '2016', 'Chetan Bhagat']
>>>
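The split-based one-liner works for this record, but it treats every comma and colon as a separator, so a value containing either one gets cut apart. A minimal sketch contrasting it with json.loads, on a hypothetical record (not from the question) whose title contains both characters:

```python
import json

# Hypothetical record whose title contains a comma and a colon.
raw = '{"Title":"Book, Part 1: Draft","Year":"2016","Author":"Chetan Bhagat"}'

# Same idea as the one-liner: drop the braces and quotes, split on ",",
# then keep the text after the first ":".
naive = [a.strip().split(":")[1] for a in raw[1:-1].replace('"', '').split(",")]

# Parsing with json keeps each value intact, whatever characters it contains.
parsed = list(json.loads(raw).values())

print(naive)   # ['Book', ' Draft', '2016', 'Chetan Bhagat']
print(parsed)  # ['Book, Part 1: Draft', '2016', 'Chetan Bhagat']
```

The hard-coded slice offsets ([2:-2], [2:-3]) have a similar weakness: they assume exactly one newline and one quote around the object, whereas .strip().strip('"') tolerates extra whitespace.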
Answer 2 (score: 0)
This is what I wanted:
import json
from bs4 import BeautifulSoup as soup

exampleSoup = soup(page_html, 'html.parser')  # page_html: the HTML shown in the question
text = exampleSoup.get_text().strip().strip('"')
elems = json.loads(text)
Details = list(elems.values())
for i in Details:
    print(i)
elems gives us a dictionary.
I took the values from its key-value pairs as Details.
The for loop prints each element separately.
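Because elems is an ordinary dict, elems.items() yields each key together with its value, which makes the printed output self-describing, and indexing by key picks out just the fields you need. A sketch with the question's JSON text hard-coded (variable names are illustrative); relying on values() following the JSON order assumes Python 3.7 or later, where dicts preserve insertion order.

```python
import json

# JSON text from the question, after the tags and the outer quotes are removed.
text = '{"Title":"One Indian Girl","Year":"2016","Author":"Chetan Bhagat"}'
elems = json.loads(text)

# Python 3.7+ dicts keep insertion order, so the values follow the JSON order.
details = list(elems.values())

# Print each field with its key, e.g. "Author: Chetan Bhagat".
for key, value in elems.items():
    print(f"{key}: {value}")

# Or pick only the fields you need, in a fixed order.
wanted = [elems[k] for k in ("Author", "Title")]
```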