Question

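I'm working in Python for the first time. I'm using Mechanize to search a website and BeautifulSoup to select a particular div, and now I'm trying to grab a particular sentence with a regular expression. This is the content of the soup object: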

    <div id="results">
      <table cellspacing="0" width="100%">
        <tr>
          <th align="left" valign="middle" width="32%">Physician Name, (CPSO#)</th>
          <th align="left" valign="middle" width="36%">Primary Practice Location</th>
          <!-- <th width="16%" align="center" valign="middle">Accepting New Patients?</th> -->
          <th align="center" valign="middle" width="32%">Disciplinary Info  &amp; Restrictions</th>
        </tr>

        <tr>
          <td>
            <a class="doctor" href="details.aspx?view=1&amp;id= 85956">Hull, Christopher Merritt </a> (#85956)
          </td>
          <td>Four Counties Medical Clinic<br/>1824 Concessions Dr<br/>Newbury ON  N0L 1Z0<br/>Phone: (519) 693-0350<br/>Fax: (519) 693-0083</td>
          <!-- <td></td> -->
          <td align="center"></td>
        </tr>
      </table>
    </div>

(Thanks for the help with the formatting.)

My regular expression for getting the text "Hull, Christopher Merritt" is:

patFinderName = re.compile('<a class="doctor" href="details.aspx?view=1&amp;id= 85956">(.*) </a>')

It keeps coming back empty and I can't figure out why. Does anyone have any ideas?

Thanks for the answers. I've changed it to:

patFinderName = re.compile('<a class="doctor" href=".*">(.*) </a>')

Now it works fine.
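The href=".*" version likely works on this sample because the link sits on its own line and . does not match a newline by default. If several doctor links ever shared one line, the greedy .* would run to the last "> on that line and swallow the earlier links. A minimal sketch using Python's standard re module, assuming the same markup plus an invented second link (the "Doe, Jane" entry with id 12345 is hypothetical), comparing that pattern with one built from negated character classes:

```python
import re

# Hypothetical input: the real Hull link from the question plus an
# invented second link ("Doe, Jane", id 12345) placed on the same line.
html = (
    '<a class="doctor" href="details.aspx?view=1&amp;id= 85956">Hull, Christopher Merritt </a> '
    '<a class="doctor" href="details.aspx?view=1&amp;id= 12345">Doe, Jane </a>'
)

# The pattern from the question: the first greedy .* runs to the LAST '">'
# on the line, so the Hull link is swallowed and only one name comes back.
greedy = re.compile('<a class="doctor" href=".*">(.*) </a>')

# [^"]* cannot leave the href value and [^<]*? cannot leave the link text,
# so each <a> element is matched separately; \s* absorbs the trailing space.
strict = re.compile(r'<a class="doctor" href="[^"]*">([^<]*?)\s*</a>')

greedy_names = greedy.findall(html)
strict_names = strict.findall(html)

print(greedy_names)  # ['Doe, Jane']
print(strict_names)  # ['Hull, Christopher Merritt', 'Doe, Jane']
```

Negated classes keep each match inside a single attribute value or a single link text, which tends to matter once a page lists several rows.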

Answer 1

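In regular expressions, ? is a special token meaning "zero or one of the preceding atom". If you want a literal question mark, you need to escape it as \?.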
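A quick check with the standard re module shows the effect. Unescaped, x? makes the x optional, so the pattern looks for details.aspview or details.aspxview and never for a literal question mark. The URL fragment comes from the question; the rest is a minimal sketch:

```python
import re

url = 'details.aspx?view=1'

# Unescaped: "x?" means "an optional x", so the pattern expects
# "details" + any char + "asp" + optional "x" + "view", never a literal "?".
unescaped = re.search('details.aspx?view', url)

# Escaped: "\?" matches the question-mark character itself,
# and "\." matches a literal dot instead of any character.
escaped = re.search(r'details\.aspx\?view', url)

print(unescaped)         # None
print(escaped.group(0))  # details.aspx?view

# Strings the unescaped pattern does match:
print(bool(re.fullmatch('details.aspx?view', 'details.aspview')))   # True
print(bool(re.fullmatch('details.aspx?view', 'details.aspxview')))  # True
```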

Answer 2

You should escape the ? in the regex:

In [8]: re.findall('<a class="doctor" href="details.aspx\?view=1&amp;id= 85956">(.*)</a>', text)
Out[8]: ['Hull, Christopher Merritt ']
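Instead of escaping characters by hand, re.escape() can backslash-escape the regex metacharacters ('.', '?', and others) in a literal string. A minimal sketch, assuming the link is on a single line of text; the lazy (.*?) followed by \s* is a choice made here to drop the trailing space that appears in the output above:

```python
import re

# One line of the HTML from the question.
text = '<a class="doctor" href="details.aspx?view=1&amp;id= 85956">Hull, Christopher Merritt </a> (#85956)'

# re.escape() escapes the special characters in the literal prefix,
# so the href can be pasted in exactly as it appears in the page.
prefix = re.escape('<a class="doctor" href="details.aspx?view=1&amp;id= 85956">')

# Lazy group stops at the first (optional whitespace +) "</a>".
pattern = re.compile(prefix + r'(.*?)\s*</a>')

match = pattern.search(text)
print(match.group(1))  # Hull, Christopher Merritt
```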

Regular expression returns nothing in Python
