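I can't extract the date from the HTML below. I tried a regex, but it doesn't work. How can I get the following output, with or without a regex?
Required output: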
Saturday, November 25, 2017
HTML code:
<div class="main-content">
<div class="col_7 post-info">
<ul class="no-bullet">
<li><strong>Date:</strong> Saturday, November 25, 2017</li>
<li><strong>Category:</strong> bicycles</li>
<li><strong>Region:</strong> Je (
<new_region>
street
</new_region>
)</li>
<li><strong>Posting ID:</strong> 37021705</li>
<li><button class="btn big primary posting-phone"><span class="icon-phone"></span> <a href="tel:0503748197">0503748197</a></button></li>
</ul>
</div>
</div>
Python code:
soup=BeautifulSoup(pages,'lxml').find('div','main-content')
#soup=BeautifulSoup(pages,'lxml').find('div','col_7 post-info')
ulobj=soup.find('ul','no-bullet')
date=ulobj.findAll(re.compile('\d+\s[a-z]+,\s\d{4}'))
print(date)
Incorrect output:
[]
[]
[]
Answer 0 (score: 0)
Here is the code I came up with:
from bs4 import BeautifulSoup
soup=BeautifulSoup(pages,'html.parser').find('div','main-content')
ulobj = soup.find('ul','no-bullet')
date = ulobj.find("li").text
print(date)
This gives the output:
Date: Saturday, November 25, 2017
To get the required output, slice off the "Date: " prefix:
>>> print(date[6:])
Saturday, November 25, 2017
However, this only works because the date is the first li in the HTML.
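
As for why the original attempt printed []: the first positional argument of findAll filters on tag names, so the compiled pattern was compared against names such as li and strong, never against the text. The pattern itself also would not match "November 25, 2017", since it expects digits before a lowercase word. A sketch that does not depend on the position of the li could look like the following. It assumes a Beautiful Soup version that accepts the string= argument (older releases call it text=), and it uses a trimmed copy of the question's HTML, with the list items reordered on purpose, as a stand-in for pages.

```python
import re
from bs4 import BeautifulSoup

# Stand-in for `pages`: a trimmed copy of the question's HTML, with the
# Date item deliberately moved away from the first position.
pages = """
<div class="main-content">
<div class="col_7 post-info">
<ul class="no-bullet">
<li><strong>Category:</strong> bicycles</li>
<li><strong>Date:</strong> Saturday, November 25, 2017</li>
<li><strong>Posting ID:</strong> 37021705</li>
</ul>
</div>
</div>
"""

soup = BeautifulSoup(pages, "html.parser")
ulobj = soup.find("div", "main-content").find("ul", "no-bullet")

# Without a regex: locate the <strong>Date:</strong> label, then read the
# text node that immediately follows it.
label = ulobj.find("strong", string="Date:")
date_by_label = label.next_sibling.strip()

# With a regex: search the visible text of the list for
# "Weekday, Month D, YYYY".
match = re.search(r"[A-Z][a-z]+, [A-Z][a-z]+ \d{1,2}, \d{4}", ulobj.get_text())
date_by_regex = match.group(0)

print(date_by_label)
print(date_by_regex)
```

Both variables should hold the date string regardless of where the Date item sits in the list. The label lookup is probably the sturdier of the two if the date format can vary; the regex is handy when the label text is not reliable.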