Remove alphanumeric words from a string

Time: 2014-06-17 05:17:04

Tags: java regex string pattern-matching

I am trying to remove alphanumeric words (words that contain digits) from a string.

    String[] sentenceArray = {"India123156 hel12lo 10000 cricket 21355 sport news 000Fifa"};
    for (String s : sentenceArray) {
        System.out.println("before regex : " + s);
        String regex = "(\\d?[,/%]?\\d|^[a-zA-Z0-9_]*)";
        String finalResult1 = s.replaceAll(regex, " ");
        String finalResult = finalResult1.trim().replaceAll(" +", " ");
        System.out.println("after regex : " + finalResult);
    }

Output: hel lo cricket sport news Fifa

But the output I need is: cricket sport news

Please help. Thanks in advance.

2 Answers:

Answer 0: (score: 2)

To match the words you want to exclude, together with the whitespace that follows them, you can use the following regex in case-insensitive mode:

\b(?=[a-z]*\d+)\w+\s*\b

In Java, you can do the replacement like this:

String replaced = your_original_string.replaceAll("(?i)\\b(?=[a-z]*\\d+[a-z]*)\\w+\\s*\\b", "");

Token-by-Token Explanation

\b                       # the boundary between a word char (\w) and
                         # something that is not a word char
(?=                      # look ahead to see if there is:
  [a-z]*                 #   any character of: 'a' to 'z' (0 or more
                         #   times (matching the most amount
                         #   possible))
  \d+                    #   digits (0-9) (1 or more times (matching
                         #   the most amount possible))
)                        # end of look-ahead
\w+                      # word characters (a-z, A-Z, 0-9, _) (1 or
                         # more times (matching the most amount
                         # possible))
\s*                      # whitespace (\n, \r, \t, \f, and " ") (0 or
                         # more times (matching the most amount
                         # possible))
\b                       # the boundary between a word char (\w) and
                         # something that is not a word char
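
A minimal sketch of the pattern explained above, wrapped in a reusable method. The class name `DigitWordStripper` and the method name `strip` are hypothetical. It assumes `Pattern.CASE_INSENSITIVE` as a stand-in for the inline `(?i)` flag and adds a `trim()`, because a removed word at the end of the input leaves the preceding space behind:

```java
import java.util.regex.Pattern;

class DigitWordStripper {
    // Pattern from the explanation above; the flag replaces the inline (?i).
    private static final Pattern WORD_WITH_DIGIT =
            Pattern.compile("\\b(?=[a-z]*\\d+)\\w+\\s*\\b", Pattern.CASE_INSENSITIVE);

    // Removes each word whose leading letters (if any) are followed by a digit,
    // along with the whitespace right after that word.
    static String strip(String input) {
        // trim() drops the space left over when the final word is removed
        return WORD_WITH_DIGIT.matcher(input).replaceAll("").trim();
    }

    public static void main(String[] args) {
        String s = "India123156 hel12lo 10000 cricket 21355 sport news 000Fifa";
        System.out.println(strip(s)); // cricket sport news
    }
}
```

Note that the look-ahead only allows letters before the first digit, so a word such as `ab_1` (underscore before the digit) is left in place.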

Answer 1: (score: 2)

public class RemoveAlphanumericWords {
    public static void main(String[] args) {
        String s = "India123156 hel12lo 10000 cricket 21355 sport news 000Fifa";
        // Delete every match of the pattern, then trim leading and trailing whitespace.
        System.out.println(s.replaceAll("\\b\\w+?[0-9]+\\w+?\\b", "").trim());
    }
}

Output:

cricket  sport news

Explanation:

\\b --> word boundary, i.e. the position where a word begins or ends.
\\w+? --> one or more word characters (letters, digits or underscore), matched reluctantly (as few as possible).
[0-9]+ --> one or more digits.
\\w+? --> one or more further word characters, again matched reluctantly, up to the closing word boundary \\b.
trim() --> removes leading and trailing whitespace.
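
Because both `\\w+?` parts need at least one character, the pattern above only matches words that have a digit somewhere other than the first or last position, so `abc1` and `1abc` are not removed. The sketch below illustrates this and also collapses the double spaces seen in the output. The class and method names are hypothetical, and the `\\b\\w*\\d\\w*\\b` variant is an assumption of mine rather than part of the answer:

```java
class InnerDigitPattern {
    // Pattern from the answer above, plus whitespace clean-up.
    static String remove(String input) {
        return input.replaceAll("\\b\\w+?[0-9]+\\w+?\\b", "")
                    .trim()
                    .replaceAll("\\s+", " "); // collapse gaps left by removed words
    }

    // Assumed variant: matches any word containing at least one digit,
    // wherever that digit sits in the word.
    static String removeAnyDigitWord(String input) {
        return input.replaceAll("\\b\\w*\\d\\w*\\b", "")
                    .trim()
                    .replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        String s = "India123156 hel12lo 10000 cricket 21355 sport news 000Fifa";
        System.out.println(remove(s));             // cricket sport news
        System.out.println(removeAnyDigitWord(s)); // cricket sport news

        String edge = "abc1 1abc a1b ok";
        System.out.println(remove(edge));             // abc1 1abc ok
        System.out.println(removeAnyDigitWord(edge)); // ok
    }
}
```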