Question

I have a table containing names and URLs, like this:

<tr>
  <td>name1</td>
  <td>www.url.com</td> </tr>
<tr>
  <td>name2</td>
  <td>www.url2.com</td> </tr>

I want to select all of the URL table cells (the <td> elements that contain a URL) in the table. I tried:

<td>w{3,3}.*(</td>){1,1}

But this expression doesn't "stop" at the first </td>. I get:

<td>www.url.com</td> </tr>
    <tr>
    <td>name2</td>
    <td>www.url2.com</td>

as the result. Where is my mistake?

Answer 1

There are several ways to match a URL. I'll stick to what you asked for: correcting your regex. You can use this instead:

<td>w{3}.*?</td>

Explanation:

<td>          # this part is fine
w{3,3}        # {3} is simpler here and has the same effect
.*            # the main problem: use .*? to make .* non-greedy, that is,
              # to make it match as little as possible
(</td>){1,1}  # same as the second line: {1,1} means exactly once, so neither
              # the quantifier nor the group is needed
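
To see the difference in practice, here is a minimal Python sketch that runs the greedy and the non-greedy version against the table from the question. The question doesn't say which regex tool was used, so Python's `re` module is an assumption here, and `re.DOTALL` is passed only so that `.` can cross line breaks and reproduce the multi-line match reported above. The variable names are just for illustration.

```python
import re

# The table rows from the question.
html = """<tr>
  <td>name1</td>
  <td>www.url.com</td> </tr>
<tr>
  <td>name2</td>
  <td>www.url2.com</td> </tr>"""

# Greedy: .* grabs as much as possible, so the match runs on to the LAST </td>.
# re.DOTALL (an assumption about the original tool) lets "." match newlines.
greedy = re.findall(r"<td>w{3}.*</td>", html, flags=re.DOTALL)

# Non-greedy: .*? stops at the FIRST </td> after each "<td>www".
lazy = re.findall(r"<td>w{3}.*?</td>", html, flags=re.DOTALL)

print(len(greedy))  # one long match spanning both rows
print(lazy)         # one match per URL cell
```

Note that the `.` right after `w{3}` is still unescaped in both patterns, so it matches any character, not only a literal dot; write `\.` if you want to require the dot.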

Answer 2

Your regex could be

\b(https?|ftp|file)://[-A-Za-z0-9+&@#/%?=~_|!:,.;]*[-A-Za-z0-9+&@#/%=~_|]

or

"((((ht{2}ps?://)?)((w{3}\\.)?))?)[^.&&[a-zA-Z0-9]][a-zA-Z0-9.-]+[^.&&[a-zA-Z0-9]](\\.[a-zA-Z]{2,3})"

See this link: What is the best regular expression to check if a string is a valid URL? It has many answers.
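
As a rough check of the first pattern in this answer, here is a small Python sketch (Python is my choice for the illustration, and `URL_PATTERN` and `find_url` are hypothetical names, not part of the answer). It suggests that the pattern requires a scheme such as `http://`, so it would not pick up the bare `www.url.com` cells from the question as written.

```python
import re

# First pattern from this answer, copied unchanged.
URL_PATTERN = re.compile(
    r"\b(https?|ftp|file)://[-A-Za-z0-9+&@#/%?=~_|!:,.;]*[-A-Za-z0-9+&@#/%=~_|]"
)


def find_url(text):
    """Return the first URL matched by URL_PATTERN, or None if there is none."""
    match = URL_PATTERN.search(text)
    return match.group(0) if match else None


# A cell whose URL carries a scheme is matched; the trailing "." is left out
# because the final character class does not allow it.
with_scheme = find_url("<td>http://www.url.com/page?x=1.</td>")

# A cell in the question's format has no "scheme://" part, so nothing matches.
without_scheme = find_url("<td>www.url.com</td>")

print(with_scheme)
print(without_scheme)
```

If the cells really hold scheme-less URLs like the ones in the question, the corrected pattern from Answer 1 is probably the closer fit.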

Selecting URLs in an HTML table with a regular expression
