I'm trying to extract the text from this HTML using bs4. I'm new to this and can't seem to get it working, so any help is much appreciated.
<div class="results">
<span class="toggle" ng-click="display.toggleConfig()">{{display.configText}}</span>
<p ng-hide="insecure">It would take <span ng-show="config.calculationsOriginal">a desktop PC</span> about <span class="main">{{time}}</span> to crack your password</p>
<a class="tweet-me" ng-hide="insecure" href="http://twitter.com/home/?status=It would take a desktop PC about {{time}} to crack my password!%0d%0dhttp://hsim.pw">[Tweet Result]</a>
<p ng-show="insecure">Your password would be cracked almost <span class="main">Instantly</span></p>
<a class="tweet-me" ng-show="insecure" href="http://twitter.com/home/?status=My password would be cracked almost instantly!%0d%0dhttp://hsim.pw">[Tweet Result]</a>
<span class="toggle" ng-click="display.toggleDetails()">{{display.detailsText}}</span>
</div>
<ul ng-show="display.details">
<li><strong>Length:</strong> {{length}} characters</li>
<li><strong>Character Combinations:</strong> {{characters}}</li>
<li><strong>Calculations Per Second:</strong> {{calcsPerSecond}}</li>
<li><strong>Possible Combinations:</strong> {{possibleCombinations}}</li>
</ul>
<ul ng-show="checks">
<li ng-repeat="check in checks" class="{{check.type}}">
<h2 ng-bind-html-unsafe="check.title"></h2>
<p ng-bind-html-unsafe="check.wording"></p>
</li>
</ul>
What I tried:
# Example: extract crack time with CSS selector
soup = BeautifulSoup(browser.page_source)
crack_time = soup.select('results')
print crack_time[0].text
Answer 0 (score: 0)
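It's not entirely clear where the actual time sits in the HTML, but it looks like it is inside <span class="main">. There are two such elements, and both can be extracted easily: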
for x in soup.find_all("span", {"class": "main"}):
    print(x.text)
This gives:
{{time}}
Instantly
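The same elements can also be reached with a CSS selector, which is what the question attempted. One thing to watch: a class selector needs a leading dot, so `.results` matches class="results", while a bare `results` looks for <results> tags. A minimal sketch, assuming a trimmed-down copy of the HTML above as an inline string (a stand-in for browser.page_source) and the built-in html.parser:

```python
from bs4 import BeautifulSoup

# Trimmed-down copy of the markup from the question (an assumption for
# illustration; the real script would pass browser.page_source instead).
html = """
<div class="results">
<p ng-hide="insecure">It would take about <span class="main">{{time}}</span> to crack your password</p>
<p ng-show="insecure">Your password would be cracked almost <span class="main">Instantly</span></p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# 'results' without a dot is a tag selector, so it matches nothing here.
no_match = soup.select("results")

# '.results span.main' matches every <span class="main"> inside class="results".
main_texts = [span.get_text() for span in soup.select(".results span.main")]
print(main_texts)
```

If the output still contains the literal {{time}}, that is the unrendered AngularJS placeholder, so the page source was probably captured before the template was filled in.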
If you want all of the text in an object, try:
soup.get_text()
This recursively extracts all of the text from the object and its children.
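As a small illustration of that recursion, the sketch below calls get_text() on a whole document and on a single tag. The one-line HTML string is an assumption taken from the question's markup, and the separator and strip arguments are optional extras that were not part of the answer above.

```python
from bs4 import BeautifulSoup

# One paragraph from the question's markup, used here as a stand-in document.
html = '<p>Your password would be cracked almost <span class="main">Instantly</span></p>'
soup = BeautifulSoup(html, "html.parser")

# Text of the whole document, including text inside nested tags.
all_text = soup.get_text()

# get_text() works the same way on any single tag.
span_text = soup.find("span", class_="main").get_text()

# Optional: strip each text fragment and join the fragments with a separator.
pieces = soup.get_text(separator="|", strip=True)

print(all_text)
print(span_text)
print(pieces)
```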