Extracting substrings from a long string using start and end strings?

Time: 2014-11-02 19:22:20

Tags: java regex string substring

I have this long string (it is one long, continuous string):

Home address H.NO- 12 SECTOR- 12 GAUTAM BUDH NAGAR NOIDA- 121212, UTTAR PRADESH INDIA +911112121212 Last Updated: 12-JUN-12 Semester/Term-time Accommodation Type: Hall of residence (private provider) Semester/Term-time address A121A SOME APPARTMENT SOME LANE CITY COUNTY OX3 7FJ +91 1212121212 Last Updated: 12-SEP-12 Mobile Telephone Number : 01212121212

Looking at the string above, the following pattern can be derived:

<home_address_text><space><the_address><space><last_updated_text><last_updated_date><space><accommodation_type_text><accommodation_type><space><semester_time_address_text><semester_time_address><space><last_updated_text><last_updated_date><space><mobile_number_text><mobile_number>

I want to extract specific parts of this string, for example:

1. H.NO- 12 SECTOR- 12 GAUTAM BUDH NAGAR NOIDA- 121212, UTTAR PRADESH INDIA
2. Hall of residence (private provider)
3. A121A SOME APPARTMENT SOME LANE CITY COUNTY OX3 7FJ
4. 01212121212

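This information is variable and differs from person to person, so I can't just count lengths and use substring to extract it: both the length of the whole string and the length of the parts I want to extract vary.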

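How can I extract specific parts of the string, as described above, using Java? I have been searching for a way for a long time but could not find one. Any help would be much appreciated.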

3 answers:

Answer 0: (score: 0)

Drawing on the example at http://www.tutorialspoint.com/java/java_regular_expressions.htm, I think you will want to use a regular expression. Something like:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches
{
    public static void main( String[] args ){

      // String to be scanned to find the pattern.
      String line = "Home address H.NO- 12 SECTOR- 12 GAUTAM BUDH NAGAR NOIDA- 121212, UTTAR PRADESH INDIA +911112121212 Last Updated: 12-JUN-12 Semester/Term-time Accommodation Type: Hall of residence (private provider) Semester/Term-time address A121A SOME APPARTMENT SOME LANE CITY COUNTY OX3 7FJ +91 1212121212 Last Updated: 12-SEP-12 Mobile Telephone Number : 01212121212";
      // Reluctant (.*?) stops at the first "Last Updated:"; a greedy (.*)
      // would run on to the last one.
      String pattern = "Home address (.*?) Last Updated:";

      // Create a Pattern object
      Pattern r = Pattern.compile(pattern);

      // Now create matcher object.
      Matcher m = r.matcher(line);
      if (m.find( )) {
         // group(1) is the captured text between the two markers;
         // group(0) would be the whole match, markers included.
         System.out.println("Found value: " + m.group(1) );
      } else {
         System.out.println("NO MATCH");
      }
    }
}
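
The same idea can be generalized to any pair of start and end markers. The following is a minimal sketch, not part of the original answer: the class name `MarkerExtract` and the method `between` are hypothetical names chosen for illustration. It assumes the markers should be matched literally (hence `Pattern.quote`) and that the nearest end marker after the start marker is the one wanted (hence the reluctant `(.*?)`).

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MarkerExtract {

    // Hypothetical helper: returns the trimmed text between the first
    // occurrence of start and the nearest following occurrence of end,
    // or an empty Optional when the pair is not found.
    static Optional<String> between(String text, String start, String end) {
        // Pattern.quote makes the markers literal, so characters such as
        // '/', '(' or '+' inside a marker need no manual escaping.
        Pattern p = Pattern.compile(
                Pattern.quote(start) + "(.*?)" + Pattern.quote(end),
                Pattern.DOTALL);
        Matcher m = p.matcher(text);
        return m.find() ? Optional.of(m.group(1).trim()) : Optional.empty();
    }

    public static void main(String[] args) {
        String line = "Home address H.NO- 12 SECTOR- 12 GAUTAM BUDH NAGAR NOIDA- 121212, UTTAR PRADESH INDIA +911112121212 Last Updated: 12-JUN-12 Semester/Term-time Accommodation Type: Hall of residence (private provider) Semester/Term-time address A121A SOME APPARTMENT SOME LANE CITY COUNTY OX3 7FJ +91 1212121212 Last Updated: 12-SEP-12 Mobile Telephone Number : 01212121212";

        // prints "Hall of residence (private provider)"
        System.out.println(
                between(line, "Accommodation Type:", "Semester/Term-time address")
                        .orElse("NO MATCH"));

        // The home address; note that the phone number before
        // "Last Updated:" is still part of the result.
        System.out.println(
                between(line, "Home address", "Last Updated:")
                        .orElse("NO MATCH"));
    }
}
```

Markers alone cannot remove the trailing phone number from the address, because no fixed text separates the two; that needs a pattern such as `\+\d+` in the regular expression itself.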

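Answer 1: (score: 0)

Based on your (single) example, this works for me. Learn to use reluctant quantifiers in regular expressions. They will help you a lot in this case.

For example, to get the string matching the first part: "Home address (.+?) \+\d+ Last Updated:". This regex does not run past the "Last Updated" string or the "+dd" (digits) that we don't want. The "(.+?)" part is reluctant (not greedy): it will not swallow the + sign or the digits, and leaves them to be matched by the rest of the expression.

You can use this to match substrings that are surrounded by static text. Here, I use capture groups to find the text I want. (Capture groups are the parts in parentheses.)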

import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Goofy
{

   public static void main( String[] args )
   {
      final String input
              = "Home address H.NO- 12 SECTOR- 12 GAUTAM BUDH NAGAR " +
              "NOIDA- 121212, UTTAR PRADESH INDIA +911112121212 " +
              "Last Updated: 12-JUN-12 Semester/Term-time " +
              "Accommodation Type: Hall of residence (private " +
              "provider) Semester/Term-time address A121A SOME " +
              "APPARTMENT SOME LANE CITY COUNTY OX3 7FJ +91 " +
              "1212121212 Last Updated: 12-SEP-12 Mobile Telephone " +
              "Number : 01212121212";

      final String regex = "Home address (.+?) \\+\\d+ Last Updated: " +
              "\\S+ Semester/Term-time Accommodation Type: (.+?) " +
              "Semester/Term-time address (.+?) \\+\\d\\d \\d+ " +
              "Last Updated.+ Number : (\\d+)";

      Pattern pattern = Pattern.compile( regex );
      Matcher matcher = pattern.matcher( input );
      if( matcher.find() ) {
         System.out.println("Found: "+matcher.group() );
         for( int i = 1; i <= matcher.groupCount(); i++ ) {
            System.out.println( "   Match " + i + ": " + matcher.group( i ));
         }
      }
   }
}
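
To make the difference between greedy and reluctant quantifiers concrete, here is a small sketch that runs both variants of the first sub-pattern against a shortened, made-up input with the same shape as the question's string. The class name `GreedyVsReluctant`, the helper `firstGroup`, and the sample addresses are illustrative assumptions, not part of the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsReluctant {

    // Returns capture group 1 of the first match, or null if nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Made-up input containing two "+digits Last Updated:" sections.
        String input = "Home address 1 MAIN ST +911 Last Updated: 12-JUN-12 "
                + "Semester/Term-time address 2 SIDE RD +9122 Last Updated: 12-SEP-12";

        // Greedy: (.+) takes as much as it can, so the group runs up to
        // the LAST "+digits Last Updated:" in the input.
        System.out.println(
                firstGroup("Home address (.+) \\+\\d+ Last Updated:", input));

        // Reluctant: (.+?) takes as little as it can, so the group ends
        // at the FIRST "+digits Last Updated:". Prints "1 MAIN ST".
        System.out.println(
                firstGroup("Home address (.+?) \\+\\d+ Last Updated:", input));
    }
}
```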

Answer 2: (score: 0)

Home\s+address\s+(.*?)Last\s+Updated(.*?)Accommodation\s+Type(.*?)Semester\/Term-time(.*?)Last\s+Updated(.*)Mobile\s+Telephone\s+Number\s*:\s*(\d+)

Try this. Grab the captures. See the demo.

http://regex101.com/r/jI8lV7/7
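
As a rough sketch of how the pattern above could be used from Java: every backslash has to be doubled inside a Java string literal, and the `\/` escape can be written as a plain `/`. The class name `Answer2Demo` and the method `captures` are hypothetical. Because this pattern captures everything between the fixed labels, the raw groups still carry separators such as a leading ": " or the word "address", so they would need further clean-up; the sketch only trims whitespace.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Answer2Demo {

    // The pattern from the answer, escaped for a Java string literal.
    private static final Pattern PATTERN = Pattern.compile(
            "Home\\s+address\\s+(.*?)Last\\s+Updated(.*?)Accommodation\\s+Type(.*?)"
            + "Semester/Term-time(.*?)Last\\s+Updated(.*)"
            + "Mobile\\s+Telephone\\s+Number\\s*:\\s*(\\d+)");

    // Returns the six capture groups (whitespace-trimmed) of the first
    // match, or an empty list when the input does not match.
    static List<String> captures(String input) {
        List<String> groups = new ArrayList<>();
        Matcher m = PATTERN.matcher(input);
        if (m.find()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                groups.add(m.group(i).trim());
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        String line = "Home address H.NO- 12 SECTOR- 12 GAUTAM BUDH NAGAR NOIDA- 121212, UTTAR PRADESH INDIA +911112121212 Last Updated: 12-JUN-12 Semester/Term-time Accommodation Type: Hall of residence (private provider) Semester/Term-time address A121A SOME APPARTMENT SOME LANE CITY COUNTY OX3 7FJ +91 1212121212 Last Updated: 12-SEP-12 Mobile Telephone Number : 01212121212";

        List<String> groups = captures(line);
        for (int i = 0; i < groups.size(); i++) {
            System.out.println("Group " + (i + 1) + ": " + groups.get(i));
        }
    }
}
```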