我正在解析一个HTML,它包含一堆我要选择的行。这是这些行的示例
<tr class="constantstring-randomvalue1-row" onmouseover="this.className='constantstring-light-row-cp-h'" onmouseout="this.className='constantstring-randomvalue1-row'" onclick="if(ignoreOnClick==false)window.location='find.ashx?cv3dsw'" valign="top">
<tr class="constantstring-randomvalue1-row" onmouseover="this.className='constantstring-light-row-cp-h'" onmouseout="this.className='constantstring-randomvalue1-row'" onclick="if(ignoreOnClick==false)window.location='find.ashx?cv3dsw'" valign="top">
<tr class="constantstring-randomvalue2-row-2" onmouseover="this.className='constantstring-light-row-cp-h'" onmouseout="this.className='constantstring-randomvalue2-row-2'" onclick="if(ignoreOnClick==false)window.location='find.ashx?cv3dsw'" valign="top">
<tr class="constantstring-randomvalue2-row-2" onmouseover="this.className='constantstring-light-row-cp-h'" onmouseout="this.className='constantstring-randomvalue2-row-2'" onclick="if(ignoreOnClick==false)window.location='find.ashx?cv3dsw'" valign="top">
我想做的是使用BeautifulSoup4
和find_all
,使用正则表达式find_all(re.compile(regext))
但是,问题是我无法提出一个好的正则表达式来选择我感兴趣的所有行。
我要以constantstring-
开头的所有行。我不在乎它是什么。正确的方法是什么,我应该使用re.compile
,如果这样,正确的regex
是什么?
答案 0 :(得分:1)
如果您想使用RE完成此操作,请执行以下操作,我添加了一个额外的行来演示它而不占用最后一行。
http://rextester.com/OSSFB8621
from bs4 import BeautifulSoup
import re
html ="""
<tr class="constantstring-randomvalue1-row" onmouseover="this.className='constantstring-light-row-cp-h'" onmouseout="this.className='constantstring-randomvalue1-row'" onclick="if(ignoreOnClick==false)window.location='find.ashx?cv3dsw'" valign="top">
<tr class="constantstring-randomvalue1-row" onmouseover="this.className='constantstring-light-row-cp-h'" onmouseout="this.className='constantstring-randomvalue1-row'" onclick="if(ignoreOnClick==false)window.location='find.ashx?cv3dsw'" valign="top">
<tr class="constantstring-randomvalue2-row-2" onmouseover="this.className='constantstring-light-row-cp-h'" onmouseout="this.className='constantstring-randomvalue2-row-2'" onclick="if(ignoreOnClick==false)window.location='find.ashx?cv3dsw'" valign="top">
<tr class="constantstring-randomvalue2-row-2" onmouseover="this.className='constantstring-light-row-cp-h'" onmouseout="this.className='constantstring-randomvalue2-row-2'" onclick="if(ignoreOnClick==false)window.location='find.ashx?cv3dsw'" valign="top">
<tr class="axcconstantstring-randomvalue2-row-2" onmouseover="this.className='constantstring-light-row-cp-h'" onmouseout="this.className='constantstring-randomvalue2-row-2'" onclick="if(ignoreOnClick==false)window.location='find.ashx?cv3dsw'" valign="top">
"""
bs = BeautifulSoup(html,'lxml')
for tr in bs.find_all("tr", {"class" : re.compile('^(constantstring)')}):
print(tr)
答案 1 :(得分:0)
您可以使用内置字符串方法代替正则表达式来执行同一任务。喜欢,
rows = soup.find_all('tr)'
selected_rows = [i for i in rows if str(i).startswith('tr class="constantstring-randomvalue')]
如果您错过str()
,则if
条件将失败。
希望这会有所帮助!干杯!