How to use a regular expression in StringTokenizer

Asked: 2012-02-26 07:28:33

Tags: java regex web-crawler stringtokenizer

StringTokenizer st = new StringTokenizer(remaining, "\t\n\r\"'>#");

String strLink = st.nextToken();

The input string remaining can be one of the following:

  1. "http://somegreatsite.com">Link Name</a>is a link to another nifty site<H1>This is a Header</H1><H2>This is a Medium Header</H2>Send me mail at <a href="mailto:support@yourcompany.com">support@yourcompany.com</a>.<P> This is a new paragraph!<P> <B>This is a new paragraph!</B><BR> <B><I>This is a new sentence without a paragraph break, in bold italics.</I></B><HR></BODY></HTML>

  2. "mailto:support@yourcompany.com">support@yourcompany.com</a>.<P> This is a new paragraph!<P> <B>This is a new paragraph!</B><BR> <B><I>This is a new sentence without a paragraph break, in bold italics.</I></B><HR></BODY></HTML>

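I know that the StringTokenizer constructor uses a regular expression to split the string *remaining* into tokens, but I cannot understand the regular expression used here.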

Depending on which of the inputs above *remaining* holds, strLink will have one of the following values:

  1. http://somegreatsite.com
  2. mailto:support@yourcompany.com

Please help me understand the regular expression used in the code above.

1 Answer:

Answer 0: (score: 3)

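The characters \t\n\r\"'># are not a regular expression; they are delimiters. StringTokenizer treats every character in that string as a separate delimiter and splits the input wherever any one of them occurs. The backslash sequences are ordinary escapes; you can look up the meaning of such special characters in the documentation of the Pattern class, for example.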

\t - The tab character
\n - The newline (line feed) character
\r - The carriage-return character
\" - this is just a double quote
', >, # - ordinary literal characters (single quote, greater-than sign, hash)
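
To make the difference concrete, here is a minimal sketch that extracts the first token from a shortened version of the first input in the question, once with StringTokenizer and once with an actual regular expression. The class name DelimiterDemo, its two helper methods, and the shortened input are assumptions made for illustration; they are not part of the original code. The regex variant uses a negated character class to match "one or more characters that are not delimiters", which is one way, not the only way, to get the same first token.

```java
import java.util.StringTokenizer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DelimiterDemo {
    // Each character in this string is its own delimiter:
    // tab, newline, carriage return, double quote, single quote, '>' and '#'.
    static final String DELIMS = "\t\n\r\"'>#";

    // Regex counterpart (hypothetical helper): a run of one or more
    // characters that are NOT in the delimiter set.
    static final Pattern TOKEN = Pattern.compile("[^\\t\\n\\r\"'>#]+");

    // Same call as in the question: leading delimiters are skipped and
    // the first run of non-delimiter characters is returned.
    // Note: nextToken() throws NoSuchElementException if there is no token.
    static String firstTokenWithTokenizer(String remaining) {
        StringTokenizer st = new StringTokenizer(remaining, DELIMS);
        return st.nextToken();
    }

    // Regex-based variant: returns the first match, or null if none exists.
    static String firstTokenWithRegex(String remaining) {
        Matcher m = TOKEN.matcher(remaining);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Shortened version of input 1 from the question.
        String remaining = "\"http://somegreatsite.com\">Link Name</a>is a link";
        System.out.println(firstTokenWithTokenizer(remaining)); // http://somegreatsite.com
        System.out.println(firstTokenWithRegex(remaining));     // http://somegreatsite.com
    }
}
```

The leading double quote is a delimiter, so it is skipped, and the token ends at the next delimiter, which is the closing double quote. That is why strLink contains only the URL. A space is not in the delimiter set, so a token may contain spaces.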