Splitting a string into sub-arrays

Posted: 2020-06-09 08:12:53

Tags: java scala

I have a string like this:

OG=ACC-0000000009| AMBORFFA KIRI|P.O.BOX 1FAF6GPO,GPO,FG/FFERER      OB=XXXX-XXCC|ABC|14332 X HWay|Vica |MNSJD      IS=BIC-dfsgdf|asas nduf|142 ERRET ERT RET|ERTERT Island|ERTERT     BF=ACC-0000013417711DD028|534 DFG ION|ONE DALLAERRS CENTER RR N.|     ERTERT, SUITE 1300, DRRALLAS,|Pb, 75201/ PBB      DT=GREAT CHART|0000FGHFGGL028434  

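There is no delimiter between the OG, OB, IS, etc. values. I want to split the string roughly on '=' so that the OG, OB, ... field names are kept in the resulting pieces. I then need to process each of these fields further into its subfields.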

4 answers:

Answer 0 (score: 1)

Something like this? (Scala code)

val str =
  "OG=ACC-0000000009| AMBORFFA KIRI|P.O.BOX 1FAF6GPO,GPO,FG/FFERER      OB=XXXX-XXCC|ABC|14332 X HWay|Vica |MNSJD      IS=BIC-dfsgdf|asas nduf|142 ERRET ERT RET|ERTERT Island|ERTERT     BF=ACC-0000013417711DD028|534 DFG ION|ONE DALLAERRS CENTER RR N.|     ERTERT, SUITE 1300, DRRALLAS,|Pb, 75201/ PBB      DT=GREAT CHART|0000FGHFGGL028434  "

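// Split before every "XX=" (the lookahead keeps the key), then map each key to its '|'-separated subfields
str.split("(?=\\S\\S=)")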
   .foldLeft(Map.empty[String,Array[String]]){
     case (m,s) => m+(s.take(2) -> s.drop(3).split("\\|"))
   }
//res0: Map[String,Array[String]] = 
// HashMap(OG -> Array(ACC-0000000009, " AMBORFFA KIRI", "P.O.BOX 1FAF6GPO,GPO,FG/FFERER      ")
//       , OB -> Array(XXXX-XXCC, ABC, 14332 X HWay, "Vica ", "MNSJD      ")
//       , DT -> Array(GREAT CHART, "0000FGHFGGL028434  ")
//       , IS -> Array(BIC-dfsgdf, asas nduf, 142 ERRET ERT RET, ERTERT Island, "ERTERT     ")
//       , BF -> Array(ACC-0000013417711DD028, 534 DFG ION, ONE DALLAERRS CENTER RR N., "     ERTERT, SUITE 1300, DRRALLAS,", "Pb, 75201/ PBB      "))
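For Java readers, here is a rough equivalent of the Scala snippet above. It is a sketch, not part of the original answer; the class LookaheadSplit and its parse method are hypothetical names. It splits with the same (?=\S\S=) lookahead, takes the first two characters as the key and splits the rest on '|'. Like the Scala version, it assumes every key is exactly two non-whitespace characters followed by '=', and it relies on Java 8+ behaviour where a zero-width match at index 0 does not produce an empty leading piece.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LookaheadSplit {

    // Hypothetical helper: maps each two-character key to its '|'-separated subfields.
    // Assumes every key is exactly two non-whitespace characters followed by '='.
    static Map<String, List<String>> parse(String input) {
        // LinkedHashMap keeps the keys in input order (the Scala result above is not in input order)
        Map<String, List<String>> result = new LinkedHashMap<>();
        // Zero-width lookahead: split *before* each "XX=", so the key stays in its piece
        for (String piece : input.split("(?=\\S\\S=)")) {
            String key = piece.substring(0, 2);   // like s.take(2)
            String rest = piece.substring(3);     // like s.drop(3): skips "XX="
            result.put(key, Arrays.asList(rest.split("\\|")));
        }
        return result;
    }

    public static void main(String[] args) {
        String str = "OG=ACC-0000000009| AMBORFFA KIRI|P.O.BOX 1FAF6GPO,GPO,FG/FFERER      "
                + "OB=XXXX-XXCC|ABC|14332 X HWay|Vica |MNSJD      "
                + "DT=GREAT CHART|0000FGHFGGL028434  ";
        parse(str).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

As in the Scala output, whitespace around subfields is preserved; call trim() on each subfield if you don't want it.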

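Update: per the comments, a new requirement was added: the input can also contain a "Transaction Amount=" field, whose name contains a space.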

val str =
  "OG=ACC-0000000009| AMBORFFA KIRI|P.O.BOX 1FAF6GPO,GPO,FG/FFERER Transaction Amount= 1223|546SD|376KL OB=XXXX-XXCC|ABC|14332 X HWay|Vica |MNSJD      IS=BIC-dfsgdf|asas nduf|142 ERRET ERT RET|ERTERT Island|ERTERT     BF=ACC-0000013417711DD028|534 DFG ION|ONE DALLAERRS CENTER RR N.|     ERTERT, SUITE 1300, DRRALLAS,|Pb, 75201/ PBB      DT=GREAT CHART|0000FGHFGGL028434  "

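// \b stops a split inside "Amount=" (its "nt=" also matches \S\S=); the key is everything before the first '='
str.split(raw"\b(?=Transaction Amount=|\S\S=)")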
   .foldLeft(Map.empty[String,Array[String]]){
     case (m,s) => val (k,v) = s.splitAt(s.indexOf("="))
                   m + (k -> v.tail.split("\\|"))
   }
//HashMap(OG -> Array(ACC-0000000009, " AMBORFFA KIRI", "P.O.BOX 1FAF6GPO,GPO,FG/FFERER ")
//      , OB -> Array(XXXX-XXCC, ABC, 14332 X HWay, "Vica ", "MNSJD      ")
//      , Transaction Amount -> Array(" 1223", 546SD, "376KL ")
//      , DT -> Array(GREAT CHART, "0000FGHFGGL028434  ")
//      , IS -> Array(BIC-dfsgdf, asas nduf, 142 ERRET ERT RET, ERTERT Island, "ERTERT     ")
//      , BF -> Array(ACC-0000013417711DD028, 534 DFG ION, ONE DALLAERRS CENTER RR N., "     ERTERT, SUITE 1300, DRRALLAS,", "Pb, 75201/ PBB      "))
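A Java sketch of the updated version (not from the original answer; MultiWordKeySplit and parse are hypothetical names). The \b is what makes it work: without it, the \S\S= alternative also matches the "nt=" at the end of "Amount=", and the key would be cut into "Transaction Amou" and "nt". The key is taken up to the first '=', mirroring splitAt(s.indexOf("=")) in the Scala code.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MultiWordKeySplit {

    // Split before "Transaction Amount=" or before any two non-whitespace chars followed by '='.
    // The leading \b stops a split inside "Amount=", whose "nt=" also matches \S\S=.
    private static final String SPLIT_REGEX = "\\b(?=Transaction Amount=|\\S\\S=)";

    // Hypothetical helper: key = text before the first '=', value = '|'-separated subfields
    static Map<String, List<String>> parse(String input) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String piece : input.split(SPLIT_REGEX)) {
            int eq = piece.indexOf('=');                  // like s.indexOf("=")
            String key = piece.substring(0, eq);
            String value = piece.substring(eq + 1);       // like v.tail: drops the '='
            result.put(key, Arrays.asList(value.split("\\|")));
        }
        return result;
    }

    public static void main(String[] args) {
        String str = "OG=ACC-0000000009| AMBORFFA KIRI|P.O.BOX 1FAF6GPO,GPO,FG/FFERER "
                + "Transaction Amount= 1223|546SD|376KL "
                + "OB=XXXX-XXCC|ABC|14332 X HWay|Vica |MNSJD      ";
        parse(str).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

Any other multi-word key would need its own alternative in SPLIT_REGEX, the same way "Transaction Amount=" was added.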

Answer 1 (score: 1)

A regular expression is one possible solution, but I'd recommend using a delimiter whenever possible.

Here is my solution, though I'm not sure it works in every case:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class FieldSplitter {

    public static void main(String[] args){

        String text = "OG=ACC-0000000009| AMBORFFA KIRI|P.O.BOX 1FAF6GPO,GPO,FG/FFERER      OB=XXXX-XXCC|ABC|14332 X HWay|Vica |MNSJD      IS=BIC-dfsgdf|asas nduf|142 ERRET ERT RET|ERTERT Island|ERTERT     BF=ACC-0000013417711DD028|534 DFG ION|ONE DALLAERRS CENTER RR N.|     ERTERT, SUITE 1300, DRRALLAS,|Pb, 75201/ PBB      DT=GREAT CHART|0000FGHFGGL028434";

        // Regex for a field name: uppercase letters immediately followed by '='
        String regexField = "(?<field>[A-Z]+)=";

        Pattern pattern = Pattern.compile(regexField);
        Matcher matcher = pattern.matcher(text);

        // Extract the field names in order of appearance
        List<String> fields = new ArrayList<>();
        while(matcher.find()){
            fields.add(matcher.group("field"));
        }

        // Extract the values: split on the field regex, trim, and drop the empty leading piece
        List<String> values = Arrays.stream(text.split(regexField))
                                    .map(String::trim)
                                    .filter(e -> !e.isEmpty())
                                    .collect(Collectors.toList());

        // Pair each field with its value; LinkedHashMap keeps the input order
        Map<String, String> data = new LinkedHashMap<>();
        if(fields.size() == values.size()){

            for(int i = 0; i < fields.size(); i++){
                data.put(fields.get(i), values.get(i));
            }

        }else{

            System.out.println("Sizes differ. Something is wrong.");

        }

        data.forEach((k, v) -> System.out.println(k + " -> " + v));

    }
}
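The approach above collects field names and values in two separate lists and relies on their sizes matching. A possible alternative, sketched here and not part of the original answer: walk the matches once with Matcher.find() and take each value as the text between the end of one field name (Matcher.end()) and the start of the next (Matcher.start()), so names and values cannot get out of step. SinglePassFields and parse are hypothetical names; the sketch keeps the same [A-Z]+= field pattern and ignores any text before the first field.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SinglePassFields {

    // Same field-name pattern as the answer: uppercase letters immediately followed by '='
    private static final Pattern FIELD = Pattern.compile("(?<field>[A-Z]+)=");

    // Hypothetical helper: each value is the text between one field name and the next.
    // Text before the first field is ignored.
    static Map<String, String> parse(String text) {
        Map<String, String> data = new LinkedHashMap<>();
        Matcher matcher = FIELD.matcher(text);
        String currentField = null;
        int valueStart = -1;
        while (matcher.find()) {
            if (currentField != null) {
                // The previous value ends where this field name begins
                data.put(currentField, text.substring(valueStart, matcher.start()).trim());
            }
            currentField = matcher.group("field");
            valueStart = matcher.end();
        }
        if (currentField != null) {
            // The last value runs to the end of the input
            data.put(currentField, text.substring(valueStart).trim());
        }
        return data;
    }

    public static void main(String[] args) {
        String text = "OG=ACC-0000000009| AMBORFFA KIRI|P.O.BOX 1FAF6GPO,GPO,FG/FFERER      "
                + "OB=XXXX-XXCC|ABC|14332 X HWay|Vica |MNSJD      DT=GREAT CHART|0000FGHFGGL028434";
        parse(text).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

Each value can then be split on "\\|" to get the subfields.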

Answer 2 (score: 0)


Answer 3 (score: 0)

Here is a Scala regex solution:

val data = "OG=ACC-0000000009| AMBORFFA KIRI|P.O.BOX 1FAF6GPO,GPO,FG/FFERER..."

data.split("""(?=\w\w=)""")

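This uses a lookahead to split the data at every point that is immediately followed by two word characters and an = sign. Because the lookahead is zero-width, each key (e.g. OG=) stays at the start of its piece.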
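To make the effect of the lookahead concrete, here is a small Java comparison. It is a sketch: the sample string is shortened from the question's data, and LookaheadDemo and its methods are hypothetical names. The Scala triple-quoted """(?=\w\w=)""" is the same regex as the Java string literal "(?=\\w\\w=)".

```java
import java.util.Arrays;

public class LookaheadDemo {

    // Lookahead (?=\w\w=) is zero-width: the split happens *before* each key,
    // so every piece still starts with its "XX=" key
    static String[] splitKeepingKeys(String data) {
        return data.split("(?=\\w\\w=)");
    }

    // Without the lookahead the "XX=" text is consumed as the delimiter, so the
    // keys are lost (and the match at index 0 leaves an empty first element)
    static String[] splitConsumingKeys(String data) {
        return data.split("\\w\\w=");
    }

    public static void main(String[] args) {
        String data = "OG=ACC-1|KIRI OB=X|ABC DT=GREAT|00";
        System.out.println(Arrays.toString(splitKeepingKeys(data)));
        // prints [OG=ACC-1|KIRI , OB=X|ABC , DT=GREAT|00]
        System.out.println(Arrays.toString(splitConsumingKeys(data)));
        // prints [, ACC-1|KIRI , X|ABC , GREAT|00]
    }
}
```

Keeping the key in each piece is what lets the other answers read the field name back with take(2) or indexOf("=").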