I am writing a Python script to extract details from a website. My code is below:
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
my_url = 'my_company_website'
# open the connection and grab the page
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()
# parse the HTML
page_soup = soup(page_html, "html.parser")
# grab each navigator-content div
containers = page_soup.find_all("div", {"class": "navigator-content"})
print(containers)
The output I get looks like this:
<div class="navigator-content" data-issue-table-model-state="" data-selected-issue="" data-session-search-state="">
</div>
I want to get what is inside this div as the output. Please help.
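To see what "the content inside" a tag gives you in BeautifulSoup, here is a minimal sketch that parses a hard-coded copy of the markup printed above instead of fetching the page (the `page_html` string and the `inner_content` helper are hypothetical stand-ins, not part of the original code). `decode_contents()` returns the tag's inner HTML and `get_text()` its text. For this div both are empty apart from whitespace, because the downloaded HTML has no child elements. If the live page fills the div in with JavaScript after loading (an assumption; the question doesn't say), `urlopen` only ever sees the empty placeholder.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for page_html: the div exactly as printed in the
# question, hard-coded so no network request is needed
page_html = '''<div class="navigator-content" data-issue-table-model-state=""
 data-selected-issue="" data-session-search-state="">
</div>'''


def inner_content(html):
    """Return the inner HTML, text and child tags of each navigator-content div."""
    page_soup = BeautifulSoup(html, "html.parser")
    results = []
    for div in page_soup.find_all("div", {"class": "navigator-content"}):
        results.append({
            # everything between <div ...> and </div>, as an HTML string
            "inner_html": div.decode_contents(),
            # visible text, with surrounding whitespace stripped
            "text": div.get_text(strip=True),
            # direct child elements only (text nodes are skipped)
            "child_tags": div.find_all(True, recursive=False),
        })
    return results


for item in inner_content(page_html):
    # Empty (apart from a newline in inner_html) because the downloaded markup
    # has no children; content a script might add later (assumed, not confirmed
    # by the question) is invisible to urllib
    print(repr(item["inner_html"]), repr(item["text"]), item["child_tags"])
```

Running the same helper on a div that does have children, e.g. `'<div class="navigator-content"><p>Hi</p></div>'`, returns `"Hi"` as the text, which is a quick way to tell "the parser missed it" apart from "the HTML really is empty".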
Answer (score: 1)
You can try this:
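import re
# containers is a ResultSet (a list of Tags), so serialize the first match to a string
div_html = str(containers[0])
# split on the opening "<div" and on the closing "></div" (with any whitespace between)
inside = re.split(r'>?\s*</?div', div_html)[1].split()
print(inside)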
which gives this list:
['class="navigator-content"',
'data-issue-table-model-state=""',
'data-selected-issue=""',
'data-session-search-state=""']
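Splitting serialized HTML with a regular expression is fragile: it breaks if an attribute value contains `<div` or `>`, or if the markup around the tag changes. BeautifulSoup already parses each tag's attributes into a dict, available as `tag.attrs`. A minimal sketch, again using a hard-coded copy of the div from the question in place of the live page (`page_html` and `as_text` are hypothetical names for illustration):

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page: the div from the question
page_html = '''<div class="navigator-content" data-issue-table-model-state=""
 data-selected-issue="" data-session-search-state="">
</div>'''

page_soup = BeautifulSoup(page_html, "html.parser")
div = page_soup.find("div", {"class": "navigator-content"})


def as_text(value):
    # "class" is a multi-valued attribute, so BeautifulSoup returns it as a list
    return " ".join(value) if isinstance(value, list) else value


# Rebuild the name="value" strings that the regex approach produced
pairs = [f'{name}="{as_text(value)}"' for name, value in div.attrs.items()]
print(pairs)

# Single attributes can also be read directly, like dictionary keys
print(repr(div["data-selected-issue"]))
```

The dictionary form is usually more useful than the rebuilt strings: `div.get("data-issue-table-model-state")` returns the value (or `None` if the attribute is missing) without any string parsing.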