Selecting URLs in an HTML table with a regular expression

Date: 2013-06-29 10:51:04

Tags: regex url html-table

I have a table containing names and URLs, like this:

<tr>
  <td>name1</td>
  <td>www.url.com</td> </tr>
<tr>
  <td>name2</td>
<td>www.url2.com</td> </tr>

I want to select all the URL table-data cells in the table. I tried:

<td>w{3,3}.*(</td>){1,1}

But this expression does not "stop" at the first </td>. I get:

<td>www.url.com</td> </tr>
    <tr>
    <td>name2</td>
    <td>www.url2.com</td>

as the result. Where is my mistake?

2 answers:

Answer 0: (score: 1)

There are several ways to match a URL. I'll try to stick to what you asked for: correcting your regex. You can use this instead:

<td>w{3}.*?</td>

Explanation:

<td>          # this part is ok
w{3,3}        # the notation {3} is simpler for this case and has the same effect
.*            # the main problem: you have to use .*? to make .* non-greedy, that
                is, to make it match as little as possible
(</td>){1,1}  # same as the second line: a token matches exactly once by default,
                so neither {1,1} nor the group is needed; plain </td> is enough
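
The difference between the greedy and the non-greedy version can be checked with a short Python sketch. It assumes the asker's tool lets `.` match newlines, which is emulated here with `re.DOTALL`; the variable names are only illustrative.

```python
import re

# The table from the question.
html = """<tr>
  <td>name1</td>
  <td>www.url.com</td> </tr>
<tr>
  <td>name2</td>
<td>www.url2.com</td> </tr>"""

# Assumption: "." may match newlines (re.DOTALL), as in the asker's result.
greedy = re.compile(r"<td>w{3,3}.*(</td>){1,1}", re.DOTALL)
lazy = re.compile(r"<td>w{3}.*?</td>", re.DOTALL)

# Greedy: ".*" runs to the end of the input and backtracks to the LAST </td>.
greedy_match = greedy.search(html).group(0)

# Non-greedy: ".*?" stops at the FIRST </td> after each "<td>www".
lazy_matches = lazy.findall(html)

print(greedy_match)
print(lazy_matches)
```

The greedy pattern returns one long match that spans both rows, while the non-greedy pattern returns one match per URL cell.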

Answer 1: (score: 0)

Your regular expression could be:

\b(https?|ftp|file)://[-A-Za-z0-9+&@#/%?=~_|!:,.;]*[-A-Za-z0-9+&@#/%=~_|]

"((((ht{2}ps?://)?)((w{3}\\.)?))?)[^.&&[a-zA-Z0-9]][a-zA-Z0-9.-]+[^.&&[a-zA-Z0-9]](\\.[a-zA-Z]{2,3})"

See this link: "What is the best regular expression to check if a string is a valid URL?". There are many answers there.
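
As a rough check, the first pattern in this answer can be tried in Python. This is only a sketch, and the sample strings are made up for illustration. Note that the pattern requires a scheme such as `http://`, so the bare `www.url.com` cells from the question would not match it unchanged.

```python
import re

# First pattern from this answer, copied as-is.
url_pattern = re.compile(
    r"\b(https?|ftp|file)://[-A-Za-z0-9+&@#/%?=~_|!:,.;]*[-A-Za-z0-9+&@#/%=~_|]"
)

# Hypothetical inputs: one cell with a scheme, one without (as in the question).
with_scheme = url_pattern.search("<td>http://www.url.com</td>")
without_scheme = url_pattern.search("<td>www.url.com</td>")

print(with_scheme.group(0) if with_scheme else None)
print(without_scheme)
```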