Writing a regular expression to extract information in Java

Time: 2013-10-17 13:44:01

Tags: java regex

I have two pieces of text.

1)v1.0 - 80 s200 + 2013-10-17T05:59:59-0700 1TZY6R5HERP7SJRRYDYV 69.71.202.109 7802 41587 495307 30595 HTTP/1.1 POST /gp/ppd

2)access-1080.2013-10-17-05.us-online-cpp-portlet-live-1d-i-752c3b12.us-east-1.phnew.com.gz

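I need to extract the following data from them. From the first text, I need a regular expression that captures 1TZY6R5HERP7SJRRYDYV. Let's call this the accessId. It always consists of 20 characters, a combination of the digits 0-9 and the uppercase letters [A-Z].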

I tried [A-Z0-9]{20}, but with no luck.

Pattern p = Pattern.compile([A-Z0-9]{20});  
Matcher m = p.matcher(myString);

Also, I am looking for a Java API that tests a string against a pattern and, if it matches, returns the matched text.

From the second text I need us-online-cpp-portlet-live-1d-i-752c3b12.us-east-1.phnew.com. I am having a hard time working this one out.

Any help would be appreciated.

2 answers:

Answer 0 (score: 3)

You need to call Matcher#find() and then Matcher#group() to get the matched result:

Pattern p = Pattern.compile("[A-Z0-9]{20}");
Matcher m = p.matcher(myString);
String accessId = null;
if (m.find()) {
    accessId = m.group();
}
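
For reference, the same find()/group() approach can be wrapped into a self-contained program. This is only a minimal sketch: the class name AccessIdExtractor and the method extractAccessId are hypothetical names chosen for illustration, and the sample line is the one from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
final class AccessIdExtractor {
    // Compile once and reuse; Pattern instances are immutable and thread-safe.
    private static final Pattern ACCESS_ID = Pattern.compile("[A-Z0-9]{20}");

    // Returns the first run of 20 uppercase letters/digits, or null if there is none.
    static String extractAccessId(String line) {
        Matcher m = ACCESS_ID.matcher(line);
        // find() scans for the pattern anywhere in the input;
        // group() then returns the text of that match.
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String line = "v1.0 - 80 s200 + 2013-10-17T05:59:59-0700 1TZY6R5HERP7SJRRYDYV "
                + "69.71.202.109 7802 41587 495307 30595 HTTP/1.1 POST /gp/ppd";
        System.out.println(extractAccessId(line));
    }
}
```

Note that find() looks for the pattern anywhere in the input, whereas matches() only succeeds when the whole string fits the pattern, so matches() would return false for the full log line.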

Answer 1 (score: 2)

Your code has a few problems, for example the missing double quotes in the Pattern initialization.

Here is an example of what you are looking for:

// text for 1st pattern
String text1 = "v1.0 - 80 s200 + 2013-10-17T05:59:59-0700 1TZY6R5HERP7SJRRYDYV 69.71.202.109 7802 41587 495307 30595 HTTP/1.1 POST /gp/ppd";
// text for 2nd pattern
String text2 = "access-1080.2013-10-17-05.us-online-cpp-portlet-live-1d-i-752c3b12.us-east-1.phnew.com.gz";
// 1st pattern - note that the "word" boundary separators are useless here, 
// but they might come in handy if you had alphanumeric Strings longer than 20 characters
Pattern accessIdPattern = Pattern.compile("\\b[A-Z0-9]{20}\\b");
Matcher m = accessIdPattern.matcher(text1);
while (m.find()) {
    System.out.println(m.group());
}
// this is trickier. I assume for your 2nd pattern you want something delimited on the
// left by a dot and starting with 2 lowercase characters, followed by a hyphen, 
// followed by a number of alnums, followed by ".com"
Pattern otherThingie = Pattern.compile("(?<=\\.)[a-z]{2}-[a-z0-9\\-.]+\\.com");
m = otherThingie.matcher(text2);
while (m.find()) {
    System.out.println(m.group());
}

Output:

1TZY6R5HERP7SJRRYDYV
us-online-cpp-portlet-live-1d-i-752c3b12.us-east-1.phnew.com
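
As an alternative for the second text, a capturing group can pull the host name out of the whole file name instead of relying on a lookbehind. The sketch below assumes every file name has the shape access-<digits>.<yyyy-MM-dd-HH>.<host>.gz; that layout is inferred from the single sample in the question and may not hold for other files. The class name HostFromFileName and the method extractHost are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
final class HostFromFileName {
    // Assumed layout: access-<digits>.<yyyy-MM-dd-HH>.<host>.gz
    // Group 1 captures everything between the timestamp and the ".gz" suffix.
    private static final Pattern FILE_NAME =
            Pattern.compile("access-\\d+\\.\\d{4}-\\d{2}-\\d{2}-\\d{2}\\.(.+)\\.gz");

    // Returns the captured host part, or null if the name does not fit the assumed layout.
    static String extractHost(String fileName) {
        Matcher m = FILE_NAME.matcher(fileName);
        // matches() requires the entire file name to fit the pattern.
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String fileName = "access-1080.2013-10-17-05."
                + "us-online-cpp-portlet-live-1d-i-752c3b12.us-east-1.phnew.com.gz";
        System.out.println(extractHost(fileName));
    }
}
```

Here group(1) returns only the parenthesized part of the match, while group() (or group(0)) would return the whole file name.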