I'm new to Python. I want to count certain words or expressions in HTML files. For example, I have a piece of HTML with the following source:
<div style="line-height:120%;text-align:justify;text-indent:24px;font-size:10.5pt;">
<font style="font-family:inherit;font-size:10.5pt;font-style:italic;font-weight:bold;">2013 vs. 2012  </font>
<font style="font-family:inherit;font-size:10.5pt;">During 2013, the Company recognized a decommissioning charge of $117 million and a restoration liability of $50 million, partially offset by the 2013 reversal of the $56 million tax indemnification liability associated with the 2006 sale of the Company’s Canadian subsidiary.</font></div>
I want to count how many times "liability" appears in the snippet. Here is my code, which doesn't work:
import os
from bs4 import BeautifulSoup
lst=os.listdir("C:/html/")
for x in lst:
    print(x)
    html = open("C:/html/" + x, 'rb')
    bsobj = BeautifulSoup(html, "html.parser")
    metricslist = bsobj.findAll(div.string ='liability')
    print(len(metricslist))
I know bsobj.findAll(div.string ='liability') is very wrong, but I don't know what the code should be. Any help would be greatly appreciated!
Answer 0 (score: 0)
When using find() or find_all(), you can apply a partial string match to an element's text:
soup.find(text=lambda text: text and "liability" in text)
Alternatively, you can use a regular expression pattern instead of a function:
soup.find(text=re.compile(r"\bliability\b"))
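Note that find() returns only the first matching text node, and a single node can contain the word more than once (in the question's snippet, both occurrences of "liability" sit inside the same font element). To count occurrences rather than matching nodes, one option is to extract the visible text and count regex matches. The following is a minimal sketch under that assumption; the helper name count_word and the shortened sample string are hypothetical and used only for illustration:

```python
import re
from bs4 import BeautifulSoup


def count_word(html, word):
    """Count whole-word, case-insensitive occurrences of `word` in the text of `html`.

    `count_word` is a hypothetical helper name, not part of Beautiful Soup.
    """
    soup = BeautifulSoup(html, "html.parser")
    # Join all text nodes with a space so words in adjacent tags do not run together.
    text = soup.get_text(" ")
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return len(pattern.findall(text))


# Shortened version of the HTML from the question (sample data for illustration).
sample = """
<div style="line-height:120%;text-align:justify;">
<font style="font-style:italic;font-weight:bold;">2013 vs. 2012  </font>
<font style="font-size:10.5pt;">During 2013, the Company recognized a
decommissioning charge of $117 million and a restoration liability of
$50 million, partially offset by the 2013 reversal of the $56 million tax
indemnification liability associated with the 2006 sale.</font></div>
"""

print(count_word(sample, "liability"))  # prints 2
```

In the question's loop, the same helper could be called on the contents of each file, for example count_word(open("C:/html/" + x, "rb").read(), "liability"). Recent Beautiful Soup versions also accept string= as the preferred spelling of the text= argument used above.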