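I am scraping a web page, and everything works except for one part: a search using re.compile() returns an empty list [] even though the text I pass to it is present in the page. Here is my scraping code: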
dob = soup.find(text = re.compile('Date of Birth')).findNext('td').text
print(dob)
father_name = soup.find(text = re.compile("Father's Name")).findNext('td').text
print(father_name)
mob_no_parent = soup.find(text = re.compile("Mobile Number")).findNext('td').text
print(mob_no_parent)
mob_no_student = soup.findAll(text = re.compile("Mobile Number(Student)"))
print(mob_no_student)
email = soup.find(text = re.compile("E - Mail Address")).findNext('td').text
print(email)
p_address = soup.find(text = re.compile("Permanent Address")).findNext('td').text
print(p_address)
The code above works for every field except this one:
mob_no_student = soup.findAll(text = re.compile("Mobile Number(Student)"))
print(mob_no_student)
That call returns [].
Here is my HTML:
<td align="left" width="50%" class="inner_padding_even"> Registration No </td>
<td align="left" width="50%" class="inner_padding_even">CPT0000</td>
</tr>
<tr>
<td align="left" width="50%" class="inner_padding_odd"> Name of Candidate</td>
<td align="left" width="50%" class="inner_padding_odd"><font face=arial size=2>KKKKKKK B.</font></td>
</tr>
<tr>
<td align="left" class="inner_padding_even"> Date of Birth</td>
<td align="left" class="inner_padding_even">16.11.1900</td>
</tr>
<tr>
<td align="left" class="inner_padding_even"> Father's Name</td>
<td align="left" class="inner_padding_even">BBBBBBBB.</td>
</tr>
<tr>
<td align="left" class="inner_padding_even"> Mobile Number</font>(Parent)</td>
<td align="left" class="inner_padding_even">99999999999</td>
</tr>
<tr>
<td align="left" class="inner_padding_odd"> Mobile Number(Student)</td>
<td align="left" class="inner_padding_odd">9999999999</td>
</tr>
<tr>
<td align="left" class="inner_padding_even"> E - Mail Address</td>
<td align="left" class="inner_padding_even">keyansgm@gmail.com</td>
</tr>
<tr>
<td width="50%" align="left" class="inner_padding_even"> Permanent Address</td>
<td width="50%" align="left" class="inner_padding_even">Blah blah</td>
</tr>
What am I missing here?
Answer 0 (score: 2)
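In a regular expression you need to escape the parentheses; otherwise they are treated as a capture group, so the pattern looks for the text "Mobile NumberStudent" rather than the literal "(Student)".
Try this: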
mob_no_student = soup.findAll(text = re.compile(r"Mobile Number\(Student\)"))
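To see the difference in isolation, here is a minimal sketch that uses only the standard `re` module, without BeautifulSoup. The variable names (`cell_text`, `unescaped`, `escaped`, `auto_escaped`) are illustrative and not from the question; `cell_text` is assumed to be the text of the label cell as it appears in the HTML above.

```python
import re

# Text of the label cell, copied from the HTML in the question.
cell_text = " Mobile Number(Student)"

# Unescaped: the parentheses form a capture group, so this pattern
# matches the string "Mobile NumberStudent", not "Mobile Number(Student)".
unescaped = re.compile("Mobile Number(Student)")

# Escaped by hand; the raw string keeps the backslashes intact.
escaped = re.compile(r"Mobile Number\(Student\)")

# Escaped automatically, which is handy when the label comes from a variable.
auto_escaped = re.compile(re.escape("Mobile Number(Student)"))

print(unescaped.search(cell_text))            # no match object
print(bool(escaped.search(cell_text)))        # the literal parentheses match
print(bool(auto_escaped.search(cell_text)))   # same result via re.escape
```

Either of the escaped patterns should be usable in place of the original one in the `findAll` call; `re.escape` avoids having to remember which characters are special.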