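I am trying to parse an AWR report to get information about long-running SQL. The report has more than 40 tables, all with the same class but different summary attributes. BS4 in Python finds most of the tables, but the summary of the one table that holds all the SQL text contains a line break and spaces, as shown below:
HTML markup in the AWR file: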
<table border="0" class="tdiff" summary="This table displays the text of the SQL statements which have been
referred to in the report">
<tbody><tr><th class="awrbg" scope="col">SQL Id
I tried to locate this table with BS4 find(), but it fails every time. Any help would be greatly appreciated.
from bs4 import BeautifulSoup as BS4
awrFile='/XXXXXXXXXXXXXXXXXXX/test/XXXXXXXXXXDB69-1.html'
f_awr = open(awrFile, 'r')
soup = BS4(f_awr, 'html.parser')
sqlTextInfoTable = soup.find('table', {'summary':'This table displays the text of the SQL statements which have been referred to in the report'})
print(sqlTextInfoTable)
This prints None.
Answer 0 (score: 0)
Could you just use pandas and .read_html(), since the data sits in a <table> tag?
html = '''<table border="0" class="tdiff" summary="This table displays the text of the SQL statements which have been
referred to in the report">
<tbody><tr><th class="awrbg" scope="col">SQL Id'''
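import io
import pandas as pd
# Wrap the literal HTML in StringIO; recent pandas versions deprecate passing a raw string.
table = pd.read_html(io.StringIO(html))
sqlTextInfoTable = table[0]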
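Then do the same with the file:
import pandas as pd
awrFile='/XXXXXXXXXXXXXXXXXXX/test/XXXXXXXXXXDB69-1.html'
with open(awrFile, 'r') as f_awr:
    # read_html returns a list with one DataFrame per <table> it finds
    table = pd.read_html(f_awr)
# Index 0 is the first table in the file; adjust the index for other tables
sqlTextInfoTable = table[0]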
Output:
print(sqlTextInfoTable)
        0
0  SQL Id
Answer 1 (score: 0)
You can find_all() the tables and iterate over them like this:
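from io import StringIO
import pandas as pd
from bs4 import BeautifulSoup as BS4
awrFile='/XXXXXXXXXXXXXXXXXXX/test/XXXXXXXXXXDB69-1.html'
with open(awrFile, 'r') as f_awr:
    soup = BS4(f_awr, 'html.parser')
for table in soup.find_all('table'):
    # read_html returns a list of DataFrames, one per parsed <table>
    df = pd.read_html(StringIO(str(table)))
    print(df)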
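Looping over the tables is also a handy way to see why the exact-match find() in the question returns None. The following is a minimal sketch, not a drop-in solution: the inline `html` string is a stand-in for the AWR file (the tag from the question, closed by hand so it parses cleanly), and `summaries`, `wanted`, and `exact_match` are illustrative names.

```python
from bs4 import BeautifulSoup

# Stand-in for the AWR report: the same tag as in the question, closed by hand.
html = '''<table border="0" class="tdiff" summary="This table displays the text of the SQL statements which have been
referred to in the report">
<tbody><tr><th class="awrbg" scope="col">SQL Id</th></tr></tbody></table>'''

soup = BeautifulSoup(html, 'html.parser')

# repr() makes hidden whitespace such as a line break visible.
summaries = [table.get('summary') for table in soup.find_all('table')]
for summary in summaries:
    print(repr(summary))

# The string used in the question has a space where the markup has a line break,
# so a whole-value comparison cannot succeed.
wanted = ('This table displays the text of the SQL statements which have been '
          'referred to in the report')
exact_match = soup.find('table', {'summary': wanted})
print(exact_match)
```

Printing the attribute with repr() on the real file should show exactly which whitespace characters sit between "been" and "referred", which helps when choosing one of the partial-match approaches.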
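Answer 2 (score: 0)
You may be able to use a CSS attribute selector to match a substring of the attribute value. Here I use ^= (the starts-with operator). You could also use *= (the contains operator).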
matches = soup.select("table[summary^='This table displays the text of the SQL statements which have been']")
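To make the two operators concrete, here is a small self-contained sketch. It assumes a bs4 version that ships select() backed by Soup Sieve; the inline `html` string is a stand-in for the AWR file, and `starts_with` and `contains` are illustrative names. The selector text copies the capitalization of the markup, since the comparison is expected to be case-sensitive unless stated otherwise.

```python
from bs4 import BeautifulSoup

# Stand-in for the AWR report: the tag from the question, closed by hand.
html = '''<table border="0" class="tdiff" summary="This table displays the text of the SQL statements which have been
referred to in the report">
<tbody><tr><th class="awrbg" scope="col">SQL Id</th></tr></tbody></table>'''

soup = BeautifulSoup(html, 'html.parser')

# ^= matches attribute values that start with the given text.
starts_with = soup.select(
    "table[summary^='This table displays the text of the SQL statements']")

# *= matches attribute values that contain the given text anywhere.
contains = soup.select("table[summary*='text of the SQL statements']")

print(len(starts_with), len(contains))
```

Both selectors stop before the line break, so the embedded whitespace never takes part in the comparison.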
Answer 3 (score: 0)
Use re to search for specific text in the summary attribute.
from bs4 import BeautifulSoup
import re
data='''<table border="0" class="tdiff" summary="This table displays the text of the SQL statements which have been
referred to in the report">
<tbody><tr><th class="awrbg" scope="col">SQL Id'''
soup=BeautifulSoup(data,'html.parser')
sqlTextInfoTable =soup.find('table', summary=re.compile('This table displays the text of the SQL statements'))
print(sqlTextInfoTable)
OR
from bs4 import BeautifulSoup
import re
data='''<table border="0" class="tdiff" summary="This table displays the text of the SQL statements which have been
referred to in the report">
<tbody><tr><th class="awrbg" scope="col">SQL Id'''
soup=BeautifulSoup(data,'html.parser')
sqlTextInfoTable =soup.find('table', summary=re.compile('referred to in the report'))
print(sqlTextInfoTable)
Output:
<table border="0" class="tdiff" summary="This table displays the text of the SQL statements which have been
referred to in the report">
<tbody><tr><th class="awrbg" scope="col">SQL Id</th></tr></tbody></table>
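If matching the whole summary is preferred over a partial regex, another option is to pass find() a function that collapses whitespace before comparing. This is a minimal sketch under stated assumptions: `WANTED` and `summary_matches` are hypothetical names, and the inline `html` string (with a second, summary-less table added for illustration) is a stand-in for the AWR file.

```python
from bs4 import BeautifulSoup

# Stand-in for the AWR report; the second table is made up for illustration
# and has no summary attribute.
html = '''<table class="tdiff"><tbody><tr><td>other</td></tr></tbody></table>
<table border="0" class="tdiff" summary="This table displays the text of the SQL statements which have been
referred to in the report">
<tbody><tr><th class="awrbg" scope="col">SQL Id</th></tr></tbody></table>'''

WANTED = ('This table displays the text of the SQL statements which have been '
          'referred to in the report')


def summary_matches(value):
    """Compare a summary value to WANTED after collapsing runs of whitespace."""
    # The value may be None when a tag has no summary attribute.
    if value is None:
        return False
    return ' '.join(value.split()) == WANTED


soup = BeautifulSoup(html, 'html.parser')
sql_text_table = soup.find('table', summary=summary_matches)
print(sql_text_table is not None)
```

Because str.split() with no arguments splits on any run of whitespace, this treats a line break, tabs, or several spaces the same way as a single space.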