How to sort strings (i.e. a set of file names) based on a specific name

Time: 2019-05-08 10:26:38

Tags: java

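Currently, after listing a directory with listFiles(), I get the list of file names it contains, and I need to use them as input. The XML files I get are shown below.

The code I use to get the list of file names is: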

String dirPath = "D:\\Input_Split_xml";
File dir = new File(dirPath);
String[] files = dir.list();
for (String aFile : files)
{
    System.out.println("file names are " + aFile);
}

The loop prints every file name (each one held in turn by "aFile"):

file names are 51090323-005_low_level.xml
file names are 90406990_low_level.xml
file names are 90406991_low_level.xml
file names are TC_CADBOM_51090323-005_low_level_BOM.xml
file names are TC_CADBOM_90406990_low_level_BOM.xml
file names are TC_CADBOM_90406991_low_level_BOM.xml
file names are TC_CADDESIGN_51090323-005_low_level.xml
file names are TC_CADDESIGN_90406990_low_level.xml
file names are TC_CADDESIGN_90406991_low_level.xml

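Now I need to sort these file names as follows, so that they can be used as input for parsing the XML files.

1) For example: based on the number "51090323-005", I need to group all the file names that belong to that number, then take them one after another as input and use them to get the node count of each XML. That is, there are 3 kinds of XML under this number, so I will collect all of them and use them one by one.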

 a)51090323-005_low_level.xml
 b)TC_CADBOM_51090323-005_low_level_BOM.xml
 c)TC_CADDESIGN_51090323-005_low_level.xml

Experts, I need your help to solve this.

4 answers:

Answer 0: (score: 1)

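This function returns a map in which each entry corresponds to one group of related files. The regular expression makes it easy to validate the file name pattern and to extract the number part (see group(1)).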

// key = number, value = array of matching files, sorted
public static Map<String, File[]> process(String fileLocation) {
    Map<String, File[]> fileMap = new HashMap<>();
    // Base files look like "<number>_low_level.xml"; the dot is escaped so it matches literally
    Pattern startFileNamePattern = Pattern.compile("([0-9-]+)_low_level\\.xml");
    File dir = new File(fileLocation);
    File[] startFiles = dir.listFiles((File file, String name) -> startFileNamePattern.matcher(name).matches());
    if (startFiles == null) {
        // listFiles returns null if the path is not a directory or an I/O error occurs
        return fileMap;
    }
    for (File f : startFiles) {
        Matcher m = startFileNamePattern.matcher(f.getName());
        if (m.matches()) {
            String number = m.group(1);
            // Collect every file whose name contains this number
            File[] allFiles = dir.listFiles((File arg0, String name) -> name.contains(number));
            if (allFiles != null) {
                Arrays.sort(allFiles);
                fileMap.put(number, allFiles);
            }
        }
    }
    return fileMap;
}
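
As a possible variation, the same grouping can be done in a single pass over names that are already in memory (for example the array returned by dir.list()), without listing the directory again for each number. This is only a sketch: the class and method names `FileNameGrouper` and `groupByNumber` are made up for illustration, and the pattern assumes that every relevant file name contains `<number>_low_level`, as in the names shown in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the answer above
final class FileNameGrouper {

    // Assumption: the number starts the name or follows '_', and is followed by "_low_level"
    private static final Pattern NUMBER =
            Pattern.compile("(?:^|_)([0-9][0-9-]*)_low_level");

    // Returns number -> file names, with keys in sorted order (TreeMap)
    static Map<String, List<String>> groupByNumber(String[] names) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String name : names) {
            Matcher m = NUMBER.matcher(name);
            if (m.find()) {
                // Names are appended in the order they are encountered
                groups.computeIfAbsent(m.group(1), k -> new ArrayList<>()).add(name);
            }
        }
        return groups;
    }
}
```

With the nine names from the question, this should give three entries (51090323-005, 90406990, 90406991), each holding three file names. Names that do not match the assumed pattern are skipped.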

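Answer 1: (score: 0)

Convert String[] files to a List and remove the entries that do not contain the number. The list is wrapped in an ArrayList because the list returned by Arrays.asList is fixed-size, so removeIf would throw UnsupportedOperationException as soon as it tried to remove an element.

List<String> fileNames = new ArrayList<>(Arrays.asList(files));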

public static List<String> groupFiles(String number, List<String> fileNames){
    fileNames.removeIf(n -> (!n.contains(number)));
    return fileNames;
}

Output:

[51090323-005_low_level.xml, TC_CADBOM_51090323-005_low_level_BOM.xml, TC_CADDESIGN_51090323-005_low_level.xml]
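
Note that removeIf modifies the list that is passed in, so after one call the other groups are gone from it, and it fails on a fixed-size list such as one returned directly by Arrays.asList. If that is a concern, a non-mutating filter is one alternative. The sketch below uses the Stream API; `NonMutatingFilter` and `filesFor` are illustrative names, not taken from the answer.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the answer above
final class NonMutatingFilter {

    // Returns a new list with the names containing the number; the input list is left untouched
    static List<String> filesFor(String number, List<String> fileNames) {
        return fileNames.stream()
                .filter(name -> name.contains(number))
                .collect(Collectors.toList());
    }
}
```

Because the input is never modified, the same list can be reused to build the group for each number in turn.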

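In addition, if you need to obtain the numbers programmatically, you can use something like this:

public static List<String> getNumbers(List<String> fileNames){
    List<String> numbers = new ArrayList<>();
    // Drop the names that do not start with a digit (this modifies the list passed in)
    fileNames.removeIf(n -> !Character.isDigit(n.charAt(0)));
    // Take the first 8 characters of each remaining name
    fileNames.forEach(name -> numbers.add(name.substring(0, 8)));
    return numbers;
}

Output:

[51090323, 90406990, 90406991]

This removes the file names that do not start with a digit from the list, then takes an 8-character substring from each of the remaining names. Note that the fixed length drops the "-005" suffix of 51090323-005.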

Answer 2: (score: 0)

Adding to Cray's answer: you can get the number with

String prefix = aFile.split("_")[0];
if (Character.isDigit(prefix.charAt(0))) {
    // prefix contains a number that we can filter.
}
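
The snippet above only yields the number for the base files, since the names starting with TC_ have the number in a later position. One way to cover both cases, sketched here under the assumption that the number is always the first underscore-separated token starting with a digit, is to scan all the tokens. `NumberToken` and `numberOf` are illustrative names.

```java
import java.util.Optional;

// Hypothetical helper, not part of the answer above
final class NumberToken {

    // Returns the first '_'-separated token that starts with a digit, if there is one
    static Optional<String> numberOf(String fileName) {
        for (String token : fileName.split("_")) {
            if (!token.isEmpty() && Character.isDigit(token.charAt(0))) {
                return Optional.of(token);
            }
        }
        return Optional.empty();
    }
}
```

The returned token can then serve as the grouping key for every file name, whether or not it carries a TC_ prefix.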

Answer 3: (score: 0)

  1. If you already have the file number:
for (String aFile : files)
{
    if(aFile.contains("51090323-005")) {
        System.out.println("file names are " + aFile);
    }
}

Output:

file names are 51090323-005_low_level.xml
file names are TC_CADBOM_51090323-005_low_level_BOM.xml
file names are TC_CADDESIGN_51090323-005_low_level.xml
  2. Otherwise, you can do something like this:
// Extract the numbers

// This HashSet will contain all the numbers; a Set avoids duplicate numbers
Set<String> baseFiles = new HashSet<>();

// Matches names that begin with digits and/or '-', and captures that prefix
// Assumption: the base files have the number at the beginning of the name
Pattern numberPattern = Pattern.compile("^([0-9-]+)(.*)");

System.out.println("Files numbers:");
// Iterate over all files to extract the numbers
for (String aFile : files)
{
    Matcher matcher = numberPattern.matcher(aFile);
    // Check whether the name has the wanted pattern
    if(matcher.matches()) {
        // Group 0 is the whole string
        // Group 1 is the number
        // Group 2 is the rest of the file name
        String number = matcher.group(1);
        System.out.println(number);
        // Add the number to the HashSet
        baseFiles.add(number);
    }
}

// Iterate all the numbers to create the groups
for (String baseFile : baseFiles)
{
    System.out.println("Group " + baseFile);
    // Search the filenames that contain the given number
    for (String aFile : files)
    {
        // Verify if the current filename has the given number
        if(aFile.contains(baseFile)) {
            System.out.println("file names are " + aFile);
        }
    }
}

Output:

Files numbers:
51090323-005
90406990
90406991
Group 90406991
file names are 90406991_low_level.xml
file names are TC_CADBOM_90406991_low_level_BOM.xml
file names are TC_CADDESIGN_90406991_low_level.xml
Group 51090323-005
file names are 51090323-005_low_level.xml
file names are TC_CADBOM_51090323-005_low_level_BOM.xml
file names are TC_CADDESIGN_51090323-005_low_level.xml
Group 90406990
file names are 90406990_low_level.xml
file names are TC_CADBOM_90406990_low_level_BOM.xml
file names are TC_CADDESIGN_90406990_low_level.xml
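
Two details of the approach above may matter in practice: a HashSet has no defined iteration order, which is why the groups are not printed in sorted order, and contains() is a plain substring test, so a shorter number would also match longer ones that include it. The following sketch addresses both, assuming that in every file name the number is either at the start or preceded by '_', and is always followed by '_'. `OrderedGroups`, `belongsTo` and `group` are illustrative names.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of the answer above
final class OrderedGroups {

    // True only if the number appears as a whole '_'-delimited part of the name
    static boolean belongsTo(String number, String fileName) {
        return fileName.startsWith(number + "_")
                || fileName.contains("_" + number + "_");
    }

    // Builds number -> matching file names; TreeMap keeps the numbers in sorted order
    static Map<String, List<String>> group(Collection<String> numbers, String[] files) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String number : numbers) {
            List<String> members = new ArrayList<>();
            for (String file : files) {
                if (belongsTo(number, file)) {
                    members.add(file);
                }
            }
            groups.put(number, members);
        }
        return groups;
    }
}
```

Passing the baseFiles set and the files array from the code above to group() should return the three groups in the order 51090323-005, 90406990, 90406991.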