Regex for alphabetic-only words - Java

Time: 2016-04-25 21:51:57

Tags: java regex

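Sorry, I'm new to regex, but none of the regular expressions I have tried so far seem to achieve the following.

We are interested in "words" only, meaning tokens that are entirely alphabetic: uppercase, lowercase, or mixed-case letters and nothing else. Everything else is ignored.

The sample string I am testing with is:

To find the golden ticket, you have to buy a bar of chocolate :) Charlie's Granny and Grandad are hoping he gets a ticket, but he only has enough money to buy 1 bar. I printed 5 tickets, but my Oompa-Loompa workers made more than 1000000 bars :)

So words like Charlie's and Oompa-Loompa, as well as the smiley faces, should not be included in the output. Only purely alphabetic words should be.

I have tried some examples from other questions, such as a regex like ^[a-zA-Z]+('[a-zA-Z]+)?$ but unfortunately, as I said, I'm new to regex, so I don't really know what I'm doing. Any help would be appreciated.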

2 answers:

Answer 0: (score: 4)

You can use:

words.split("[ ]+");

Then, for each string in that array, the following returns true if the string meets your criteria:

str.matches("[a-zA-Z]+");
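The two calls above might be combined roughly as follows. This is only a sketch: the class name `AlphabeticWords`, the helper `alphabeticWords`, and the short input string are illustrative assumptions, not part of the answer. Note that with this approach a token carrying trailing punctuation, such as `ticket,`, fails `matches` and is dropped.

```java
import java.util.ArrayList;
import java.util.List;

public class AlphabeticWords {
    // Hypothetical helper: returns the tokens made up only of ASCII letters.
    public static List<String> alphabeticWords(String text) {
        List<String> result = new ArrayList<>();
        // Split on runs of spaces, as in the answer above.
        for (String token : text.split("[ ]+")) {
            // matches() must cover the whole token, so "Charlie's" and "5" fail.
            if (token.matches("[a-zA-Z]+")) {
                result.add(token);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [Granny, made, bars]
        System.out.println(alphabeticWords("Charlie's Granny made 5 bars :)"));
    }
}
```

Because `String.matches` implicitly anchors the pattern at both ends, no `^` or `$` is needed in the per-token check.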

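Answer 1: (score: 4)

Description

This regular expression will do the following:

  • Assumes words consist entirely of the letters A-Z, uppercase and lowercase
  • Finds all words
  • Ignores all strings that contain non-alphabetic characters or symbols
  • Assumes punctuation such as a period or comma should be ignored, while the word preceding it should still be captured.

Regular expression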

(?<=\s|^)[a-zA-Z]*(?=[.,;:]?\s|$)


Explanation

NODE                     EXPLANATION
----------------------------------------------------------------------
  (?<=                     look behind to see if there is:
----------------------------------------------------------------------
    \s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    ^                        start of the string
----------------------------------------------------------------------
  )                        end of look-behind
----------------------------------------------------------------------
  [a-zA-Z]*                any character of: 'a' to 'z', 'A' to 'Z'
                           (0 or more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (?=                      look ahead to see if there is:
----------------------------------------------------------------------
    [.,;:]?                  any character of: '.', ',', ';', ':'
                             (optional (matching the most amount
                             possible))
----------------------------------------------------------------------
    \s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    $                        before an optional \n, and the end of
                             the string
----------------------------------------------------------------------
  )                        end of look-ahead
----------------------------------------------------------------------

Examples

Online regex demo

http://fiddle.re/65eqna

Sample Java code

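import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Module1 {
  public static void main(String[] args) {
    // Placeholder input; replace with the text to search.
    String sourcestring = "source string to match with pattern";
    // Letters only, preceded by whitespace or the start of the input, and followed
    // by optional punctuation plus whitespace, or by the end of the input.
    Pattern re = Pattern.compile("(?<=\\s|^)[a-zA-Z]*(?=[.,;:]?\\s|$)");
    Matcher m = re.matcher(sourcestring);
    int mIdx = 0;
    while (m.find()) {
      // Group 0 is the whole match; the pattern defines no capturing groups.
      for (int groupIdx = 0; groupIdx <= m.groupCount(); groupIdx++) {
        System.out.println("[" + mIdx + "][" + groupIdx + "] = " + m.group(groupIdx));
      }
      mIdx++;
    }
  }
}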
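For collecting the words into a list rather than printing them, a sketch along these lines may help. The class name `LookaroundWords`, the helper `findWords`, and the input string are illustrative assumptions. One deliberate change from the answer's pattern: the quantifier is `+` instead of `*`, because `*` also permits zero-length matches (for example between two consecutive spaces), which are probably not wanted in a word list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaroundWords {
    // Same lookarounds as the answer, but with + so every match has at least one letter.
    private static final Pattern WORD =
        Pattern.compile("(?<=\\s|^)[a-zA-Z]+(?=[.,;:]?\\s|$)");

    // Hypothetical helper: returns every purely alphabetic word found in the text.
    public static List<String> findWords(String text) {
        List<String> words = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        String text = "Charlie's Granny gets a ticket, but my Oompa-Loompa workers made 5 bars :)";
        // prints [Granny, gets, a, ticket, but, my, workers, made, bars]
        System.out.println(findWords(text));
    }
}
```

Unlike splitting on spaces and testing each token, the lookahead here accepts a single trailing `.`, `,`, `;` or `:` before whitespace, so `ticket,` contributes `ticket` to the result.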

Sample captures

$matches Array:
(
    [0] => Array
        (
            [0] => To
            [1] => find
            [2] => the
            [3] => golden
            [4] => ticket
            [5] => you
            [6] => have
            [7] => to
            [8] => buy
            [9] => a
            [10] => bar
            [11] => of
            [12] => chocolate
            [13] => Granny
            [14] => and
            [15] => Grandad
            [16] => are
            [17] => hoping
            [18] => he
            [19] => gets
            [20] => a
            [21] => ticket
            [22] => but
            [23] => he
            [24] => only
            [25] => has
            [26] => enough
            [27] => money
            [28] => to
            [29] => buy
            [30] => bar
            [31] => I
            [32] => printed
            [33] => tickets
            [34] => but
            [35] => my
            [36] => workers
            [37] => made
            [38] => more
            [39] => than
            [40] => bars
        )

)