String manipulation: splitting delimited data

Time: 2010-06-15 09:50:02

Tags: java string

I need to split some information out of asterisk-delimited data.

Data format:

NAME*ADDRESS LINE1*ADDRESS LINE2

Rules:

1. The name must always be present.
2. Address lines 1 and 2 may be absent.
3. There must always be three asterisks.

Samples:

MR JONES A ORTEGA*ADDRESS 1*ADDRESS2*

Name: MR JONES A ORTEGA
Address Line1: ADDRESS 1
Address Line2: ADDRESS2

A PAUL*ADDR1**
Name: A PAUL
Address Line1: ADDR1
Address Line2: Not Given

My algorithm is:

1. Iterate through the characters in the line.
2. Store all chars in a temp variable until the first * is found. Reject the data if no char is found before the first occurrence of an asterisk. If some chars are found, use them as the name.
3. Same as step 2 for finding address lines 1 and 2, except that this won't reject the data if no char is found.
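The steps above can be sketched roughly as follows. This is only an illustration, not the asker's actual code: the class name `ManualSplit` and the method `parse` are made up, and it assumes that nothing may follow the third asterisk, which the rules do not state explicitly.

```java
import java.util.ArrayList;
import java.util.List;

class ManualSplit {
    /**
     * Scans the line character by character and returns {name, address1, address2},
     * or null if the line breaks the rules.
     */
    public static String[] parse(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : line.toCharArray()) {
            if (c == '*') {
                // An asterisk closes the current field, even if the field is empty.
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        // Rule 3: three asterisks. Assumption: no text is allowed after the last one.
        if (fields.size() != 3 || current.length() > 0) {
            return null;
        }
        // Rule 1: the name must not be empty.
        if (fields.get(0).isEmpty()) {
            return null;
        }
        return fields.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] samples = {"MR JONES A ORTEGA*ADDRESS 1*ADDRESS2*", "A PAUL*ADDR1**", "*ADDR1*ADDR2*"};
        for (String sample : samples) {
            String[] parts = parse(sample);
            System.out.println(sample + " -> " + (parts == null ? "rejected" : String.join(" | ", parts)));
        }
    }
}
```

Collecting the fields in a list keeps the name check and the asterisk count separate from the scanning loop.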

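My algorithm looks ugly, and the code looks even uglier. Splitting on //* doesn't work either, because the name could be replaced by address line 1 if the data were *Address 1*Address2. Any suggestions?

Edit

Try it with the data "-MS DEBBIE GREEN*1036 PINEWOOD CRES**" (not including the quotes).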

5 answers:

Answer 0 (score: 2)

You can use String[] split(String regex, int limit) as follows:

    String[] tests = {
        "NAME*ADRESS LINE1*ADDRESS LINE2*",
        "NAME*ADRESS LINE1**",
        "NAME**ADDRESS LINE2*",
        "NAME***",
        "*ADDRESS LINE1*ADDRESS LINE2*",
        "*ADDRESS LINE1**",
        "**ADDRESS LINE2*",
        "***",
        "-MS DEBBIE GREEN*1036 PINEWOOD CRES**",
    };
    for (String test : tests) {
        // Drop the trailing '*' so that a limit of 3 yields exactly three parts.
        String trimmed = test.substring(0, test.length() - 1);
        String[] parts = trimmed.split("\\*", 3);
        System.out.printf(
            "%s%n  Name: %s%n  Address Line1: %s%n  Address Line2: %s%n%n",
            test, parts[0], parts[1], parts[2]
        );
    }

This prints (as seen on ideone.com):

NAME*ADRESS LINE1*ADDRESS LINE2*
  Name: NAME
  Address Line1: ADRESS LINE1
  Address Line2: ADDRESS LINE2

NAME*ADRESS LINE1**
  Name: NAME
  Address Line1: ADRESS LINE1
  Address Line2: 

NAME**ADDRESS LINE2*
  Name: NAME
  Address Line1: 
  Address Line2: ADDRESS LINE2

NAME***
  Name: NAME
  Address Line1: 
  Address Line2: 

*ADDRESS LINE1*ADDRESS LINE2*
  Name: 
  Address Line1: ADDRESS LINE1
  Address Line2: ADDRESS LINE2

*ADDRESS LINE1**
  Name: 
  Address Line1: ADDRESS LINE1
  Address Line2: 

**ADDRESS LINE2*
  Name: 
  Address Line1: 
  Address Line2: ADDRESS LINE2

***
  Name: 
  Address Line1: 
  Address Line2: 

-MS DEBBIE GREEN*1036 PINEWOOD CRES**
  Name: -MS DEBBIE GREEN
  Address Line1: 1036 PINEWOOD CRES
  Address Line2: 

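The reason for "\\*" is that split takes a regular expression, and * is a regex metacharacter. Since you want it to be taken literally, you need to escape it with a \. Since \ is itself a Java string escape character, you need to double it to get a \ into the string.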
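As a small illustration of the escaping point, here is a sketch showing three ways of splitting on a literal asterisk that should behave the same: the doubled backslash, Pattern.quote, and a character class. The class name `EscapeDemo` and its helper methods are invented for this example. It also shows that an unescaped "*" is rejected as a regular expression.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class EscapeDemo {
    // Three equivalent ways to split on a literal asterisk.
    // The limit of -1 keeps trailing empty strings so the results are easy to compare.
    public static String[] splitEscaped(String s)   { return s.split("\\*", -1); }
    public static String[] splitQuoted(String s)    { return s.split(Pattern.quote("*"), -1); }
    public static String[] splitCharClass(String s) { return s.split("[*]", -1); }

    public static void main(String[] args) {
        String line = "NAME*ADDR1*ADDR2*";
        System.out.println(Arrays.toString(splitEscaped(line)));
        System.out.println(Arrays.toString(splitQuoted(line)));
        System.out.println(Arrays.toString(splitCharClass(line)));
        try {
            // A bare * is a quantifier with nothing to repeat, so it is not a valid regex.
            line.split("*");
        } catch (PatternSyntaxException e) {
            System.out.println("Unescaped * is rejected: " + e.getDescription());
        }
    }
}
```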

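The reason for the limit of 3 is that you want the array to have 3 parts, including trailing empty strings. By default, a limit-less split discards trailing empty strings.

The last * is discarded manually before performing the split.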
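To make the effect of the limit argument concrete, here is a small sketch (the class name `LimitDemo` and its `split` helper are made up for illustration) that splits the same line with different limits. It also shows why the trailing asterisk is removed before splitting with a limit of 3.

```java
import java.util.Arrays;

class LimitDemo {
    public static String[] split(String s, int limit) {
        return s.split("\\*", limit);
    }

    public static void main(String[] args) {
        String line = "NAME***";
        // Limit 0 behaves like the one-argument split: trailing empty strings are dropped.
        System.out.println(Arrays.toString(split(line, 0)));    // [NAME]
        // A negative limit keeps every trailing empty string.
        System.out.println(Arrays.toString(split(line, -1)));   // [NAME, , , ]
        // A positive limit caps the number of parts; the last part holds the unsplit remainder.
        System.out.println(Arrays.toString(split(line, 3)));    // [NAME, , *]
        // With the trailing asterisk removed first, the third part is the clean address line 2.
        String trimmed = line.substring(0, line.length() - 1);
        System.out.println(Arrays.toString(split(trimmed, 3))); // [NAME, , ]
    }
}
```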

Answer 1 (score: 0)

String myLine = "name*addr1*addr2*";
String[] parts = myLine.split("\\*", 4);
for (String s : parts) {
    System.out.println(s);
}

Output:

name
addr1
addr2
(empty string)

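If you split "**addr2*" you get an array with "", "", "addr2" (followed by a trailing empty string when the limit of 4 is used). So I don't see why you can't use split.

Also, if you split "***" you get a 4-element array containing 4 empty strings.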
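Building on the split with a limit of 4, here is one possible way to enforce the rules from the question. This is only a sketch: `RecordParser` and `parse` are hypothetical names, and it assumes a valid line must end right after the third asterisk.

```java
import java.util.Optional;

class RecordParser {
    /**
     * Returns {name, address1, address2} for a valid line,
     * or an empty Optional if the line breaks the rules.
     */
    public static Optional<String[]> parse(String line) {
        // With a limit of 4, a line with three asterisks yields exactly four parts,
        // and the fourth part holds whatever follows the third asterisk.
        String[] parts = line.split("\\*", 4);
        // Assumption: nothing may follow the third asterisk, so the fourth part must be empty.
        if (parts.length != 4 || !parts[3].isEmpty()) {
            return Optional.empty();
        }
        // Rule 1: the name must be present.
        if (parts[0].isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(new String[] {parts[0], parts[1], parts[2]});
    }

    public static void main(String[] args) {
        String[] samples = {"A PAUL*ADDR1**", "*ADDRESS LINE1*ADDRESS LINE2*", "NAME*ADDR1"};
        for (String sample : samples) {
            String result = parse(sample)
                .map(p -> "name=" + p[0] + ", addr1=" + p[1] + ", addr2=" + p[2])
                .orElse("rejected");
            System.out.println(sample + " -> " + result);
        }
    }
}
```

Because empty fields keep their position in the array, an empty name cannot be mistaken for address line 1, which was the concern raised in the question.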

Here is an example; try running this code:

public void testStrings() {
    String line = "part0***part3*part4****part8*";
    String[] parts = line.split("\\*");
    for (int i=0;i<parts.length;i++) {
        System.out.println(String.format("parts[%d]: '%s'",i, parts[i]));
    }
}

The result will be:

parts[0]: 'part0'
parts[1]: ''
parts[2]: ''
parts[3]: 'part3'
parts[4]: 'part4'
parts[5]: ''
parts[6]: ''
parts[7]: ''
parts[8]: 'part8'

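Answer 2 (score: 0)

You can do this with a regular expression. For example:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String myInput="MR JONES A ORTEGA*ADDRESS 1*ADDRESS2*";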

Pattern pattern =  Pattern.compile("([^*]+)\\*([^*]*)\\*([^*]*)\\*");
Matcher matcher = pattern.matcher(myInput);

if (matcher.matches()) {
    String myName = matcher.group(1);
    String myAddress1 = matcher.group(2);
    String myAddress2 = matcher.group(3);
    // ...
} else {
    // input does not match the pre-requisites
}

Answer 3 (score: 0)

A complete solution that reads from a file using a Scanner and a regular expression:

import java.io.*;
import java.util.Scanner;
import java.util.regex.Pattern;

public class Test {
    public static void main(String[] args) throws FileNotFoundException {
        Scanner s = new Scanner(new File("data.txt"));
        Pattern p = Pattern.compile("([^\\*]+)\\*([^\\*]*)\\*([^\\*]*)\\*");

        while (s.hasNextLine()) {
            if (s.findInLine(p) == null) {
                s.nextLine();
                continue;
            }

            System.out.println("Name: " + s.match().group(1));
            System.out.println("Addr1: " + s.match().group(2));
            System.out.println("Addr2: " + s.match().group(3));
            System.out.println();
        }
    }
}

Input file:

MR JONES A ORTEGA*ADDRESS 1*ADDRESS2*
A PAUL*ADDR1**
*No name*Addr 2*
My Name*Some Addr*Some more addr*

Output:

Name: MR JONES A ORTEGA
Addr1: ADDRESS 1
Addr2: ADDRESS2

Name: A PAUL
Addr1: ADDR1
Addr2: 

Name: My Name
Addr1: Some Addr
Addr2: Some more addr

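Note that the line with no name is not matched (per Rule 1: Name should be always present). If you still want to process such lines, simply change the + in the regular expression to *.

The regular expression ([^\\*]*)\\* can be read as: "anything except an asterisk, zero or more times, followed by an asterisk."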
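The difference between + and * in the first group can be checked with a short sketch. The class name `NameRule` and its constants are invented for illustration, and it uses Matcher.matches() on a single string rather than the Scanner from the code above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NameRule {
    // First group uses +: the name must contain at least one character.
    public static final Pattern NAME_REQUIRED =
        Pattern.compile("([^*]+)\\*([^*]*)\\*([^*]*)\\*");
    // First group uses *: an empty name is accepted as well.
    public static final Pattern NAME_OPTIONAL =
        Pattern.compile("([^*]*)\\*([^*]*)\\*([^*]*)\\*");

    public static void main(String[] args) {
        String line = "*No name*Addr 2*";
        System.out.println("required: " + NAME_REQUIRED.matcher(line).matches());
        Matcher m = NAME_OPTIONAL.matcher(line);
        if (m.matches()) {
            // The name group is empty; the two address groups hold the remaining fields.
            System.out.println("optional: name='" + m.group(1)
                + "', addr1='" + m.group(2) + "', addr2='" + m.group(3) + "'");
        }
    }
}
```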

Answer 4 (score: -1)

yourString.split("\\*"); should give you an array containing the name, address1, and address2, where address1 and address2 can be empty strings.