Java - Reading hierarchical data from a flat structure in a text file and building a HashMap

Posted: 2015-11-20 04:36:34

Tags: java performance collections hashmap

I have a text file in which hierarchical data is stored in a flat structure.

    child parent
    Y,     X
    Z,     Y
    A,     Z

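That is, X is the parent of Y, which is itself the parent of Z, and Z is the parent of A. The lines can appear in the file in any order. I need to build a HashMap in which the key is an element and the value is the list of all of its ancestors. For example, based on the data above, the HashMap should contain these entries:

A = [Z, Y, X], Y = [X], Z = [Y, X].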

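I have written Java code to build this HashMap. I would like to know whether there is a more efficient way to do it. The logic is:

  1. Read the whole file into a HashMap, with the child as the key and the parent as the value.
  2. For each child in that HashMap, recursively walk up the chain and build the list of its parents.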

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.Map;

    public class Test {
        public static final String FILE_NAME = "dataset1";
        // child -> parent, as read from the file
        public static final HashMap<String, String> inputMap = new HashMap<String, String>();
        // child -> all ancestors, nearest first
        public static final Map<String, ArrayList<String>> parentChildMap = new HashMap<String, ArrayList<String>>();

        private static void readTextFile(String aFileName) throws IOException {
            Path path = Paths.get(aFileName);
            try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
                String line = null;
                while ((line = reader.readLine()) != null) {
                    String[] dataArray = line.split(",");
                    // trim() so a padded line such as "Y,     X" maps Y to X, not to "     X"
                    String child = dataArray[0].trim();
                    String parent = dataArray[1].trim();
                    inputMap.put(child, parent);
                }
            }
        }

        // Recursively appends the parent, grandparent, ... of childId to parents.
        public static ArrayList<String> getParents(String childId, ArrayList<String> parents) {
            if (childId == null)
                return parents;
            String parentId = inputMap.get(childId);
            if (parentId != null)
                parents.add(parentId);
            getParents(parentId, parents);
            return parents;
        }

        public static void main(String[] s) throws IOException {
            readTextFile(FILE_NAME);
            for (String child : inputMap.keySet()) {
                ArrayList<String> parents = getParents(child, new ArrayList<String>());
                parentChildMap.put(child, parents);
            }
        }
    }
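If you want to try the two steps above without a data file, you can run them on an in-memory string. The following is a minimal sketch, not the asker's code. The class name NaiveAncestors and the methods readParents and buildAncestors are hypothetical. Reading from a java.io.Reader instead of a file name is an assumption made so the sample data can be passed in directly. Like the question's code, it walks the whole chain again for every key.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class (not from the question): the same two steps, reading from any Reader.
class NaiveAncestors {

    // Step 1: read "child,parent" lines into a child -> parent map.
    static Map<String, String> readParents(Reader source) {
        Map<String, String> parentOf = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            for (String line; (line = reader.readLine()) != null; ) {
                String[] parts = line.split(",");
                if (parts.length < 2) {
                    continue; // header or blank line: nothing to split
                }
                // trim() so "Y,     X" maps Y to X rather than to "     X"
                parentOf.put(parts[0].trim(), parts[1].trim());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return parentOf;
    }

    // Step 2: for every child, walk up the chain; every walk starts from scratch.
    static Map<String, List<String>> buildAncestors(Map<String, String> parentOf) {
        Map<String, List<String>> result = new HashMap<>();
        for (String child : parentOf.keySet()) {
            List<String> ancestors = new ArrayList<>();
            for (String p = parentOf.get(child); p != null; p = parentOf.get(p)) {
                ancestors.add(p);
            }
            result.put(child, ancestors);
        }
        return result;
    }

    public static void main(String[] args) {
        String sample = "child parent\nY,     X\nZ,     Y\nA,     Z\n";
        Map<String, List<String>> ancestors = buildAncestors(readParents(new StringReader(sample)));
        System.out.println(ancestors.get("A")); // [Z, Y, X]
    }
}
```

The header line has no comma, so it is skipped. Only the three children end up as keys, which matches the expected map A = [Z, Y, X], Y = [X], Z = [Y, X].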
    

2 answers:

Answer 0 (score: 3)

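The recursion is already fairly efficient. Here is what you can optimize:

  • Turn the recursion into a loop
  • Use memoization inside the recursion/loop (avoid recomputing the same chain)
  • Don't recompute the ancestors every time getParents is called; precompute the results and store them

Here is my code: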

import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;

public class Test {
    public static final String FILE_NAME = "dataset1";
    public static final HashMap<String, String> inputMap = new HashMap<String, String>();
    public static final Map<String, ArrayList<String>> parentChildMap = new HashMap<String, ArrayList<String>>();

    private static void readTextFile(String aFileName) throws IOException {

        Path path = Paths.get(aFileName);

        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            String line = null;
            while ((line = reader.readLine()) != null) {
                String[] dataArray = line.split(",");
                // trim() drops the padding around the comma, e.g. in "Y,     X"
                String child = dataArray[0].trim();
                String parent = dataArray[1].trim();

                inputMap.put(child, parent);
            }
        }

        // this replaces the recursion:
        for (String child : inputMap.keySet()) {
            String k = child; // walks up the chain, starting at child
            ArrayList<String> tmp = new ArrayList<String>();
            while (true) {
                // if this has already been computed, use old answer
                if (parentChildMap.containsKey(k)) {
                    tmp.addAll(parentChildMap.get(k));
                    break;
                }
                if (inputMap.containsKey(k)) {
                    String v = inputMap.get(k);
                    tmp.add(v);
                    k = v;
                } else {
                    break;
                }
            }
            parentChildMap.put(child, tmp);
        }
    }

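    public static ArrayList<String> getParents(String childId) {
        // do not recompute
        return parentChildMap.get(childId);
    }

    public static void main(String[] args) throws IOException {
        readTextFile(FILE_NAME); // also precomputes parentChildMap
        System.out.println(getParents("A"));
    }
}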
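The loop above memoizes only the starting key. For example, when it walks A, Z, Y, X it stores A's list but not Z's or Y's. Whether later walks get shortened therefore depends on HashMap iteration order.

Below is a minimal sketch of a loop that records every unresolved node on the path and fills them all in on the way back down. IterativeAncestors and buildAncestors are hypothetical names. The input is assumed to be the child -> parent map read from the file, and it must contain no cycles: a cycle would make the walk loop forever.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: loop-based memoization that also stores intermediate nodes.
// Assumes parentOf (child -> parent) contains no cycles.
class IterativeAncestors {

    static Map<String, List<String>> buildAncestors(Map<String, String> parentOf) {
        Map<String, List<String>> memo = new HashMap<>();
        for (String start : parentOf.keySet()) {
            // Walk up until we reach a resolved node or run past the root,
            // remembering every unresolved node on the way (topmost ends up on top).
            Deque<String> path = new ArrayDeque<>();
            String node = start;
            while (node != null && !memo.containsKey(node)) {
                path.push(node);
                node = parentOf.get(node);
            }
            // node is now null (we passed the root) or an already resolved element.
            String aboveNode = node;
            List<String> above = (node == null) ? Collections.emptyList() : memo.get(node);
            // Unwind from the topmost unresolved node back down to start.
            while (!path.isEmpty()) {
                String current = path.pop();
                List<String> ancestors = new ArrayList<>();
                if (aboveNode != null) {
                    ancestors.add(aboveNode); // the direct parent comes first
                }
                ancestors.addAll(above);      // then the parent's ancestors
                memo.put(current, ancestors);
                aboveNode = current;
                above = ancestors;
            }
        }
        return memo;
    }

    public static void main(String[] args) {
        Map<String, String> parentOf = new HashMap<>();
        parentOf.put("Y", "X");
        parentOf.put("Z", "Y");
        parentOf.put("A", "Z");
        System.out.println(buildAncestors(parentOf).get("A")); // [Z, Y, X]
    }
}
```

Each element's parent is looked up once, regardless of iteration order, and there is no recursion depth to worry about. Root elements such as X also end up in the result, with an empty list. The result itself holds d(d+1)/2 entries for a chain of depth d, so very deep hierarchies are expensive however the walk is done.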

Answer 1 (score: 1)

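You asked for a "more efficient way", so here is my (minor) critique, followed by my suggestion.

  • Don't initialize line to null. Just declare it.
  • Don't use split(). It may split into more than two values, and it has to create an array. Just use indexOf().

So the first method becomes (somewhat compressed):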

public static final Map<String, String> inputMap = new HashMap<>();
private static void readTextFile(String aFileName) throws IOException {
    try (BufferedReader reader = Files.newBufferedReader(Paths.get(aFileName),
                                                         StandardCharsets.UTF_8)){
        for (String line; (line = reader.readLine()) != null; ) {
            int idx = line.indexOf(',');
            if (idx < 0)
                continue; // no comma: skip blank or malformed lines
            inputMap.put(/*child*/line.substring(0, idx).trim(),
                         /*parent*/line.substring(idx + 1).trim());
        }
    }
}
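The sketch below puts the parsing options side by side: split(","), the two-argument split(",", 2) (which stops after the first comma), and the indexOf() approach.

SplitVsIndexOf and splitOnce are hypothetical names. Returning null for a line without a comma is a choice made here so a header or blank line can be skipped. Without that check, indexOf() returns -1 and substring(0, -1) throws StringIndexOutOfBoundsException.

```java
import java.util.Arrays;

// Hypothetical demo class comparing ways to split a "child,parent" line.
class SplitVsIndexOf {

    // Splits at the first comma only; returns null when the line has no comma.
    static String[] splitOnce(String line) {
        int idx = line.indexOf(',');
        if (idx < 0) {
            return null; // e.g. a header or blank line
        }
        return new String[] { line.substring(0, idx).trim(), line.substring(idx + 1).trim() };
    }

    public static void main(String[] args) {
        String line = "A,Z,extra";
        System.out.println(Arrays.toString(line.split(",")));       // [A, Z, extra]
        System.out.println(Arrays.toString(line.split(",", 2)));    // [A, Z,extra]
        System.out.println(Arrays.toString(splitOnce(line)));       // [A, Z,extra]
        System.out.println(Arrays.toString(splitOnce("Y,     X"))); // [Y, X]
    }
}
```

If you prefer to keep split(), passing a limit of 2 is the smallest change that avoids splitting into more than two values. It still allocates an array for each line, though.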

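Now for the suggestion.

Your code resolves the same parents many times. For example, when retrieving the parents of A it has to walk the entire parent chain Z, Y, X, and when retrieving the parents of Z it has to walk the chain Y, X again. You do the same walk several times.

It is more efficient to resolve each element only once and reuse the result. Since the data is unordered, you cannot resolve everything in a single pass in file order; the simplest way is recursion with memoization. I have also renamed parentChildMap to the more fitting ancestorMap.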

public static final Map<String, List<String>> ancestorMap = new HashMap<>();
private static List<String> getAncestors(String child) {
    // Check if ancestors already resolved
    List<String> ancestors = ancestorMap.get(child);
    if (ancestors == null) {
        // Find parent
        String parent = inputMap.get(child);
        if (parent == null) {
            // Child has no parent, i.e. no ancestors
            ancestors = Collections.emptyList();
        } else {
            // Find ancestors of parent using recursive call
            List<String> parentAncestors = getAncestors(parent);
            if (parentAncestors.isEmpty()) {
                // Parent has no ancestors, i.e. child has single ancestor (the parent)
                ancestors = Collections.singletonList(parent);
            } else {
                // Child's ancestors is parent + parentAncestors
                ancestors = new ArrayList<>(parentAncestors.size() + 1);
                ancestors.add(parent);
                ancestors.addAll(parentAncestors);
            }
        }
        // Save resolved ancestors
        ancestorMap.put(child, ancestors);
    }
    return ancestors;
}

If you don't care about the emptyList()/singletonList() optimization or the comments, this can be compressed to:

private static List<String> getAncestors(String child) {
    List<String> ancestors = ancestorMap.get(child);
    if (ancestors == null) {
        ancestorMap.put(child, ancestors = new ArrayList<>());
        String parent = inputMap.get(child);
        if (parent != null) {
            ancestors.add(parent);
            ancestors.addAll(getAncestors(parent));
        }
    }
    return ancestors;
}

The main method then becomes:

public static final String FILE_NAME = "dataset1";
public static void main(String[] args) throws IOException {
    readTextFile(FILE_NAME);
    for (String child : inputMap.keySet())
        getAncestors(child); // Ignore return value
}
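The sketch below puts numbers on the repeated walks described above. It counts how often the child -> parent map is queried, first with the question's per-key walk and then with the compressed memoized getAncestors from this answer.

LookupCount, parent() and the lookups counter are hypothetical instrumentation added only for this comparison. The mutable static state mirrors the style of the code above and is not meant for production use.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical instrumentation: counts queries against the child -> parent map.
class LookupCount {
    static final Map<String, String> parentOf = new HashMap<>();
    static final Map<String, List<String>> ancestorMap = new HashMap<>();
    static int lookups;

    // Every parent lookup goes through here so it can be counted.
    static String parent(String child) {
        lookups++;
        return parentOf.get(child);
    }

    // The question's approach: every key walks its whole chain again.
    static int naive() {
        lookups = 0;
        for (String child : parentOf.keySet()) {
            List<String> ancestors = new ArrayList<>();
            for (String p = parent(child); p != null; p = parent(p)) {
                ancestors.add(p);
            }
        }
        return lookups;
    }

    // The compressed memoized recursion from this answer, using the counted lookup.
    static List<String> getAncestors(String child) {
        List<String> ancestors = ancestorMap.get(child);
        if (ancestors == null) {
            ancestorMap.put(child, ancestors = new ArrayList<>());
            String parent = parent(child);
            if (parent != null) {
                ancestors.add(parent);
                ancestors.addAll(getAncestors(parent));
            }
        }
        return ancestors;
    }

    static int memoized() {
        lookups = 0;
        ancestorMap.clear();
        for (String child : parentOf.keySet()) {
            getAncestors(child);
        }
        return lookups;
    }

    public static void main(String[] args) {
        parentOf.put("Y", "X");
        parentOf.put("Z", "Y");
        parentOf.put("A", "Z");
        System.out.println(naive());    // 9 = 2 (Y) + 3 (Z) + 4 (A)
        System.out.println(memoized()); // 4 = one lookup each for Y, Z, A and X
    }
}
```

Consider a single chain with n child,parent lines. The per-key walk makes n(n+3)/2 lookups. The memoized version makes n + 1: one per element, including the root. Both counts are independent of HashMap iteration order.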