Java regex to extract specific words

Time: 2015-06-08 07:13:19

Tags: java regex

I am struggling to extract 'and', 'a', 'an', 'the', '&', and '&amp;' from a block of text, along with every number it contains.

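I have tried several regular expressions for this but could not get accurate results.

All the numbers are extracted fine, but I cannot get the strings listed above with a regex.

My basic regex is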

 Pattern p = Pattern.compile("^[0-9]");
Then I tried different combinations, such as

 Pattern p = Pattern.compile("^[0-9](&)");
 Pattern p = Pattern.compile("^[0-9]+[&]");

to get the strings above, but with no luck.

Sample text

System requirements: iOS 6.0 and Android (varies) &
Version used in this guide: 2.2.4 (iPhone), 13.1.2 (Android)

Expected result

 6.0,and,&,2.2.4,13.1.2

2 answers:

Answer 0: (score: 1)

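Your "attempts" are nowhere near close, and I almost feel bad for just handing you the solution, but if you really are "keen to learn new things" (as your SO profile says), have a look at a regex tutorial.

Basic usage of alternation, grouping, quantifiers, and anchors (or rather word boundaries) will solve your problem.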

(\b(?:a|an|and|the)\b|&|\d+(?:\.\d+)*)

Explanation:

NODE                     EXPLANATION
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    \b                       the boundary between a word char (\w)
                             and something that is not a word char
--------------------------------------------------------------------------------
    (?:                      group, but do not capture:
--------------------------------------------------------------------------------
      a                        'a'
--------------------------------------------------------------------------------
     |                        OR
--------------------------------------------------------------------------------
      an                       'an'
--------------------------------------------------------------------------------
     |                        OR
--------------------------------------------------------------------------------
      and                      'and'
--------------------------------------------------------------------------------
     |                        OR
--------------------------------------------------------------------------------
      the                      'the'
--------------------------------------------------------------------------------
    )                        end of grouping
--------------------------------------------------------------------------------
    \b                       the boundary between a word char (\w)
                             and something that is not a word char
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    &                        '&'
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    \d+                      digits (0-9) (1 or more times (matching
                             the most amount possible))
--------------------------------------------------------------------------------
    (?:                      group, but do not capture (0 or more
                             times (matching the most amount
                             possible)):
--------------------------------------------------------------------------------
      \.                       '.'
--------------------------------------------------------------------------------
      \d+                      digits (0-9) (1 or more times
                               (matching the most amount possible))
--------------------------------------------------------------------------------
    )*                       end of grouping
--------------------------------------------------------------------------------
  )                        end of \1
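To see what the two `\b` word boundaries contribute, here is a minimal sketch that runs the alternation with and without them on a short input. The class name `BoundaryDemo`, the helper `findAll`, and the test string are made up for illustration and are not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundaryDemo {
    // Hypothetical helper: collects every match of the regex in the input, in order.
    public static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "varies and the";
        // Without \b, 'a' matches inside "varies", and "and" is cut short to "a"
        // because the first alternative that succeeds wins.
        System.out.println(findAll("a|an|and|the", input));           // [a, a, the]
        // With \b on both sides, only whole words match.
        System.out.println(findAll("\\b(?:a|an|and|the)\\b", input)); // [and, the]
    }
}
```

The trailing `\b` is also what makes the order `a|an|and` harmless: after `a` matches in "and", the boundary check fails, so the engine backtracks and tries the longer alternatives.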

To use it in Java, you have to escape every backslash (write each \ as \\):

(\\b(?:a|an|and|the)\\b|&|\\d+(?:\\.\\d+)*)
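As a rough sketch of how the escaped pattern could be applied to the sample text from the question, the following loops over the matches with `Matcher.find()`. The class name `TokenExtractor` and the method `extract` are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenExtractor {
    // The regex from this answer, with each backslash doubled for a Java string literal.
    private static final Pattern TOKENS =
            Pattern.compile("(\\b(?:a|an|and|the)\\b|&|\\d+(?:\\.\\d+)*)");

    // Hypothetical helper: returns every match in the order it appears in the text.
    public static List<String> extract(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = TOKENS.matcher(text);
        while (m.find()) {           // find() scans forward, so no ^ anchor is needed
            found.add(m.group(1));   // group 1 is the outer capturing group
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "System requirements: iOS 6.0 and Android (varies) &\n"
                + "Version used in this guide: 2.2.4 (iPhone), 13.1.2 (Android)";
        System.out.println(String.join(",", extract(text)));
        // prints: 6.0,and,&,2.2.4,13.1.2
    }
}
```

Matching is case-sensitive here, so a capitalized "And" or "The" would be skipped. If those should be found as well, `Pattern.CASE_INSENSITIVE` can be passed as a second argument to `Pattern.compile`.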

Answer 1: (score: 0)

You can use the following regex:


See DEMO