I would like to read an .html file into pandas. The HTML source is shown below.
<html>
<head>
<title>Output File</title>
</head>
<body>
<pre>
<span style='color:black'>-----------------------------------------------------------------------------------------------------------------------------------</span>
<span style='color:black'>| Study Case: Case A_Lines | Annex: / 1 |</span>
<span style='color:black'>-----------------------------------------------------------------------------------------------------------------------------------</span>
<span style='color:black'>| System Summary |</span>
<span style='color:black'>-----------------------------------------------------------------------------------------------------------------------------------</span>
<span style='color:black'>| System Average Interruption Frequency Index : SAIFI = 0.373016 1/Ca |</span>
<span style='color:black'>| Customer Average Interruption Frequency Index : CAIFI = 0.373016 1/Ca |</span>
<span style='color:black'>-----------------------------------------------------------------------------------------------------------------------------------</span>
<span style='color:black'></span>
</pre>
</body>
</html>
The information I most want to extract is the table of indices and their values, for example:
SAIFI 0.373016 1/Ca
I have tried several options for reading it directly, but they all failed:
df = pd.read_html(path, match='=')
Please help!
Answer 0 (score: 0)
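I tried using pandas, but it returned an error, most likely because read_html only looks for <table> elements and this file has none. Could you try BeautifulSoup instead?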
In [20]: from bs4 import BeautifulSoup
In [21]: f = BeautifulSoup(open("file.html"), "html.parser")
In [22]: f.find_all("span")[5].text.split()[-3]
Out[22]: '0.373016'
Of course, you can improve the way I identify the value.
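One possible improvement, as a rough sketch rather than a definitive solution: instead of relying on the span position and the token offset, match every line shaped like `: NAME = value unit` with a regular expression. The pattern `ROW` and the helper names `parse_indices` and `to_frame` are made up for this illustration, and the pattern assumes that every index line in the report looks like the two in the sample.

```python
import re

# Assumed line shape, taken from the sample report:
#   "... : SAIFI = 0.373016 1/Ca |"
# Captures the index name, the numeric value, and the unit.
ROW = re.compile(
    r":\s*(\w+)\s*=\s*([-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)\s+([^\s|<]+)"
)


def parse_indices(html_text):
    """Return a list of (name, value, unit) tuples found in the report text."""
    return [
        (name, float(value), unit)
        for name, value, unit in ROW.findall(html_text)
    ]


def to_frame(rows):
    """Turn the parsed rows into a DataFrame (requires pandas)."""
    import pandas as pd  # imported here so parsing alone works without pandas

    return pd.DataFrame(rows, columns=["index", "value", "unit"])


# Two lines copied from the sample file, used as a quick check.
sample = (
    "<span style='color:black'>| System Average Interruption Frequency "
    "Index : SAIFI = 0.373016 1/Ca |</span>\n"
    "<span style='color:black'>| Customer Average Interruption Frequency "
    "Index : CAIFI = 0.373016 1/Ca |</span>\n"
)
rows = parse_indices(sample)
```

For the real file, something like `to_frame(parse_indices(open("file.html").read()))` should give one row per index. Header lines such as "Study Case: Case A_Lines" are skipped because they contain no `=` after the name.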
Thanks.