Question

I'm having some trouble with a Java program that extracts information from an HTML table. To get the content of each column, I use the following regex:

<td>([^<]*)</td>

This works very well for me. To get the link text, I use:

<a[^>]*>(.*?)</a>

This also works well. But sometimes I need the information from a column that contains a link. So I wanted to combine the two regexes like this:

<td>([^<]*)</td>|<a[^>]*>(.*?)</a>

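I thought it would work like this:

It gets everything between <td> and </td>
If that content is a link, it gets just the link text

But this doesn't work. I'm not the best at regex, so I need help combining these two steps.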
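One way to see what the combined pattern actually captures is to print both groups for each match. The following is only a minimal sketch: the class name `AlternationGroups`, the helper `describeMatches`, and the sample row are hypothetical and not part of the original program. With an alternation, only the group belonging to the branch that matched is filled; the other group is `null`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AlternationGroups {
    // The combined pattern from the question, unchanged.
    static final Pattern COMBINED =
            Pattern.compile("<td>([^<]*)</td>|<a[^>]*>(.*?)</a>");

    // Hypothetical helper: returns "group1/group2" for every match in the input.
    static List<String> describeMatches(String html) {
        List<String> out = new ArrayList<>();
        Matcher m = COMBINED.matcher(html);
        while (m.find()) {
            // Concatenating a null group yields the text "null".
            out.add(m.group(1) + "/" + m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical sample row: one plain cell, one cell holding a link.
        String row = "<td>plain</td><td><a href=\"x.html\">Link</a></td>";
        System.out.println(describeMatches(row)); // prints [plain/null, null/Link]
    }
}
```

For the cell that holds a link, the `<td>` branch cannot match because `[^<]*` stops at the `<` of `<a`, so only the `<a>` branch matches and `group(1)` stays `null`.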

Many thanks.

Answer 1

The code I use:

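// Requires: import java.util.regex.Matcher; import java.util.regex.Pattern;
Pattern pattern = Pattern.compile("<td>([^<]*)</td>|<a[^>]*>(.*?)</a>");

String line = "Here are the lines saved from the HTML downloader";

Matcher matcher = pattern.matcher(line);
// find() resumes after the previous match, so no manual start index is needed
while (matcher.find())
   {
        // group(1) is null whenever the <a> alternative matched instead of the <td> one
        System.out.println(matcher.group(1));
   }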

This is just a snippet, but this is how it works. (Normally the Strings are stored in an array.)
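A possible way to combine the two steps is to nest the optional link tags inside the `<td>` pattern instead of joining the two patterns with `|`, so that a single group always holds the cell text. This is only a sketch under assumptions: the class name `CellTextExtractor`, the method `cellTexts`, and the sample row are hypothetical, and the pattern only covers cells that contain either plain text or one link with no further nested tags.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CellTextExtractor {
    // <td>, an optional opening <a ...> tag, the text (group 1),
    // an optional closing </a> tag, then </td>.
    static final Pattern CELL =
            Pattern.compile("<td>(?:<a[^>]*>)?([^<]*)(?:</a>)?</td>");

    // Hypothetical helper: returns the text of every cell, using the link text for link cells.
    static List<String> cellTexts(String html) {
        List<String> out = new ArrayList<>();
        Matcher m = CELL.matcher(html);
        while (m.find()) {
            out.add(m.group(1)); // group 1 is filled for both kinds of cell
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical sample row: one plain cell, one cell holding a link.
        String row = "<td>plain</td><td><a href=\"x.html\">Link</a></td>";
        System.out.println(cellTexts(row)); // prints [plain, Link]
    }
}
```

Because the link tags are optional non-capturing groups, `group(1)` never needs a `null` check here. Cells with attributes on `<td>` or with other markup inside would need a broader pattern.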

Combining regexes in Java
