Question

I have a simple format string:

double d = 12.348678;
int i = 9876;
String s = "ABCD";
System.out.printf("%08.2f%5s%09d", d, s, i);

// %08.2f = '12.348678' -> '00012,35'
// %5s = 'ABCD' -> ' ABCD'
// %09d = '9876' -> '000009876'
// %08.2f%5s%09d = '00012,35 ABCD000009876'

Given the pattern %08.2f%5s%09d and the string 00012,35 ABCD000009876, can I somehow "unformat" this string?

For example, the expected result would be something like 3 tokens: '00012,35', 'ABCD', '000009876'

Answer 1

This is specific to your pattern. A general parser for a format string (since what we call unformatting is really parsing) would look quite different.

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Unformat {

    // Returns the field width captured by group 1, or null if the specifier has none.
    public static Integer getWidth(Pattern pattern, String format) {
        Matcher matcher = pattern.matcher(format);
        if (matcher.find()) {
            return Integer.valueOf(matcher.group(1));
        }
        return null;
    }

    public static void main(String[] args) {
        String format = "%08.2f%5s%09d";
        String formatted = "00012.35 ABCD000009876";
        String[] formats = format.split("%");

        // One width pattern per conversion; [0-9]+ also accepts multi-digit widths.
        Pattern floatWidth = Pattern.compile("([0-9]+)\\.[0-9]*f");
        Pattern stringWidth = Pattern.compile("([0-9]+)s");
        Pattern intWidth = Pattern.compile("([0-9]+)d");

        List<String> result = new ArrayList<String>();
        int start = 0;

        // formats[0] is the empty string before the first '%', so start at 1.
        for (int j = 1; j < formats.length; j++) {
            Pattern p;
            if (formats[j].endsWith("f")) {
                p = floatWidth;
            } else if (formats[j].endsWith("s")) {
                p = stringWidth;
            } else if (formats[j].endsWith("d")) {
                p = intWidth;
            } else {
                continue;
            }
            Integer width = getWidth(p, formats[j]);
            if (width != null) {
                // Advance the offset here in the caller: reassigning a method
                // parameter would not be visible outside the method.
                result.add(formatted.substring(start, start + width));
                start += width;
            }
        }
        System.out.println(result);
    }

}
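
As a rough idea of what a more general version might look like, here is a minimal sketch that reads each specifier with a single regular expression and slices the input by the declared widths. The class name WidthUnformatter and the method unformat are made up for illustration, and only the f, s and d conversions are handled. It assumes every specifier declares a width and that no value was wider than its width, because the width in a Java format specifier is only a minimum.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the answer above.
class WidthUnformatter {

    // %[argument_index$][flags][width][.precision]conversion, limited to f, s and d.
    // Group 1 is the width, group 2 the conversion character.
    private static final Pattern SPEC =
            Pattern.compile("%(?:\\d+\\$)?[-#+ 0,(]*(\\d+)?(?:\\.\\d+)?([fsd])");

    static List<String> unformat(String format, String formatted) {
        List<String> tokens = new ArrayList<>();
        Matcher m = SPEC.matcher(format);
        int start = 0;
        while (m.find()) {
            if (m.group(1) == null) {
                // Without a width there is no way to know where the field ends.
                throw new IllegalArgumentException("No width in specifier: " + m.group());
            }
            int width = Integer.parseInt(m.group(1));
            if (start + width > formatted.length()) {
                throw new IllegalArgumentException("Input too short for: " + m.group());
            }
            tokens.add(formatted.substring(start, start + width));
            start += width;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(unformat("%08.2f%5s%09d", "00012,35 ABCD000009876"));
    }
}
```

Because the slicing is purely positional, it does not care whether the decimal separator is a dot or a comma, but it silently produces wrong tokens if any value overflowed its field.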

Answer 2

Judging by the output format of "%08.2f%5s%09d", it appears to be equivalent to this pattern:

"([0-9]{5,}[.,][0-9]{2,})(.{5,})([0-9]{9,})"

Try the following:

public static void main(String[] args) {
    String data = "00012,35 ABCD000009876";
    Matcher matcher = Pattern.compile("([0-9]{5,}[.,][0-9]{2,})(.{5,})([0-9]{9,})").matcher(data);

    List<String> matches = new ArrayList<>();
    if (matcher.matches()) {
        for (int i = 1; i <= matcher.groupCount(); i++) {
            matches.add(matcher.group(i));
        }
    }

    System.out.println(matches);
}

Result:

[00012,35,  ABCD, 000009876]

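Update

After seeing the comments, here is a generic example that does not use regular expressions, so as not to copy @bpgergo (+1 to you for the regular-expression approach). It also adds some logic for the case where the format exceeds the width of the data.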

public static void main(String[] args) {
    String data = "00012,35 ABCD000009876";
    // Format exceeds width of data
    String format = "%08.2f%5s%09d%9s";
    String[] formatPieces = format.replaceFirst("^%", "").split("%");

    List<String> matches = new ArrayList<>();

    int index = 0;
    for (String formatPiece : formatPieces) {   
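        // Remove any argument indexes or flags.
        // Inside a character class '|' is a literal and "|-|" is a range,
        // so the flags are listed directly and '-' is escaped.
        formatPiece = formatPiece.replaceAll("^([0-9]+\\$)|[+\\-,<]", "");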

        int length = 0;
        switch (formatPiece.charAt(formatPiece.length() - 1)) {
            case 'f':
                if (formatPiece.contains(".")) {
                    length = Integer.parseInt(formatPiece.split("\\.")[0]);
                } else {
                    length = Integer.parseInt(formatPiece.substring(0, formatPiece.length() - 1));
                }
                break;
            case 's':
                length = Integer.parseInt(formatPiece.substring(0, formatPiece.length() - 1));
                break;
            case 'd':
                length = Integer.parseInt(formatPiece.substring(0, formatPiece.length() - 1));
                break;
        }

        if (index + length < data.length()) {                
            matches.add(data.substring(index, index + length));
        } else {
            // We've reached the end of the data and need to break from the loop
            matches.add(data.substring(index));
            break;
        }
        index += length;
    }
    System.out.println(matches);
}

Result:

[00012,35,  ABCD, 000009876]

Answer 3

You can do it like this:

// Assumed input: the formatted string, with '.' as the decimal separator.
String val = "00012.35 ABCD000009876";

// Find the end of the first value;
// this value always has 2 digits after the decimal point.
int index = val.indexOf(".") + 3;
String token1 = val.substring(0, index);

// Remove the first value from the original String.
val = val.substring(index);

// Get everything after the last non-numeric character.
String token3 = val.replaceAll(".+\\D", "");

// Whatever precedes the third value in the remainder is the second value.
String token2 = val.substring(0, val.length() - token3.length());

This fails if the String value ends with a digit, and possibly in some other cases.
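
To make the failure concrete, here is a small sketch that wraps the same three steps in a hypothetical helper (TrailingDigitDemo.split is not from the original snippet) and feeds it a string value that ends in a digit. With s = "ABC1", the trailing 1 is absorbed into the third token.

```java
// Hypothetical demo class; the logic mirrors the three steps shown above.
class TrailingDigitDemo {

    static String[] split(String val) {
        // First value: everything up to 2 digits after the decimal point.
        int index = val.indexOf(".") + 3;
        String token1 = val.substring(0, index);
        String rest = val.substring(index);
        // Third value: the trailing run of digits.
        String token3 = rest.replaceAll(".+\\D", "");
        // Second value: whatever is left in between.
        String token2 = rest.substring(0, rest.length() - token3.length());
        return new String[] {token1, token2, token3};
    }

    public static void main(String[] args) {
        // " ABC1" is what %5s produces for the string "ABC1".
        String[] tokens = split("00012.35 ABC1000009876");
        // The second token loses its last character and the third gains a digit.
        System.out.println(String.join("|", tokens));
    }
}
```

Slicing by the widths taken from the format string avoids this, since it never has to guess where the digits of the last field begin.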

Answer 4

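Since you know the pattern, you are effectively dealing with some kind of regular expression. Use regular expressions to serve your needs.

Java has a decent regex API for tasks like this.

A regular expression can have capturing groups, and each group can hold a single "unformatted" part as needed. It all depends on the regex you use or create.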
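
As an illustration of the capturing-group idea, here is a minimal sketch with a regex written by hand for %08.2f%5s%09d that also converts each group back to a typed value. The class name RegexUnformat and the group names number, text and count are made up for this example. It assumes non-negative numbers and that leading spaces in the string field are padding.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration.
class RegexUnformat {

    // One named group per specifier: 8 chars for %08.2f, 5 for %5s, 9 for %09d.
    private static final Pattern LINE = Pattern.compile(
            "(?<number>[0-9]{5}[.,][0-9]{2})(?<text>.{5})(?<count>[0-9]{9})");

    static Object[] parse(String formatted) {
        Matcher m = LINE.matcher(formatted);
        if (!m.matches()) {
            throw new IllegalArgumentException("Does not match the pattern: " + formatted);
        }
        // Normalise a locale-specific decimal comma before parsing.
        double d = Double.parseDouble(m.group("number").replace(',', '.'));
        // Drop the padding added by %5s (this also drops real leading spaces).
        String s = m.group("text").trim();
        int i = Integer.parseInt(m.group("count"));
        return new Object[] {d, s, i};
    }

    public static void main(String[] args) {
        Object[] values = parse("00012,35 ABCD000009876");
        System.out.println(values[0] + " | " + values[1] + " | " + values[2]);
    }
}
```

Note that the double comes back as 12.35 rather than 12.348678: the digits dropped by %.2f cannot be recovered.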

Answer 5

The easiest way is to parse the string with a regular expression using myString.replaceAll(). myString.split(",") may also help to split the string into an array of strings.
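
One possible way to combine the two calls, sketched under the assumption that the character '|' never occurs in the data: replaceAll() inserts a separator between the fixed-width fields, and split() then cuts on it. The class name SplitUnformat is made up, and the widths 8, 5 and 9 are hard-coded from %08.2f%5s%09d.

```java
import java.util.Arrays;

// Hypothetical helper for illustration.
class SplitUnformat {

    static String[] unformat(String formatted) {
        // Capture the three fixed-width fields and re-emit them with '|' in between.
        // '|' is an assumed separator that must not appear in the data itself.
        String marked = formatted.replaceAll("^(.{8})(.{5})(.{9})$", "$1|$2|$3");
        // split() takes a regex, so the pipe has to be escaped.
        return marked.split("\\|");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(unformat("00012,35 ABCD000009876")));
    }
}
```

If the input does not have exactly 22 characters, replaceAll() leaves it unchanged and the result is a single-element array, so a length check before calling it is probably a good idea.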

Unformat a formatted string
