Method does not work on a large dataset

Asked: 2013-12-03 05:13:42

Tags: java graph hashmap keyset

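I'm trying to find the most central character in a dataset that contains every Marvel character and every book they appear in. The code below works on a small test file we created ourselves so we could test the method faster, but when I run it on the Marvel file it breaks right from the start. I put print statements throughout the code to find where it stops working. I assumed that iterating over so many characters would be the cause, but it fails before it gets that far. In the first while() loop I add startVertex to group. I put a System.out.println(group) statement right after adding startVertex, and when I run the test it prints "[]" (which I'm fairly sure means group did not get anything from startVertex). The code then gets stuck in an infinite loop, even though it works perfectly well for a small set of characters/books. Any suggestions on how to get it working for the larger file?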

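Edit: Here are links to the files. The large file has to be linked in raw format because GitHub can't open it. Both files have exactly the same format, and both parse correctly from TSV into a multigraph.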

Large file: https://raw.github.com/EECE-210/2013-L1A1/master/mp5/labeled_edges.tsv?token=5408881__eyJzY29wZSI6IlJhd0Jsb2I6RUVDRS0yMTAvMjAxMy1MMUExL21hc3Rlci9tcDUvbGFiZWxlZF9lZGdlcy50c3YiLCJleHBpcmVzIjoxMzg2NzAyNDczfQ%3D%3D--acf1694845215e7a40aca1d6c456769cd825ebcf

Small file: https://github.com/EECE-210/2013-L1A1/blob/master/mp5/testTSVfile.tsv

    /**
     * First find the largest connected set of characters and then
     * find the most central character of all characters in this set.
     *
     * @return the name of the character most central to the graph
     */
    public String findMostCentral() {

            // keySet() already returns a Set, so no separate allocation is needed
            Set<String> vertexSet = vertexMap.keySet();
            Iterator<String> iterator = vertexSet.iterator();

            List<String> group = new ArrayList<String>();
            List<String> largestGroup = new ArrayList<String>();

            List<String> Path = new ArrayList<String>();
            Map<String, Integer> longestPathMap = new HashMap<String, Integer>();

            /*
             * This first while loop sets the starting vertex (i.e. the character that will be checked
             * against every other character to identify whether there is a path between them).
             * We add the character to a group list to later identify the largest group of 
             * connected characters.
             */
            while(iterator.hasNext()){
                    String startVertex = iterator.next();
                    group.add(startVertex);

                    /*
                     * This inner for-each loop sets the destination/end vertex (i.e. the character that is the
                     * destination when compared to the starting character) to see if there is a path between
                     * the two characters. If there is, we add the end vertex to the group with the starting
                     * vertex.
                     */
                    for(String key : vertexSet){
                            String endVertex = key;

                            if( findShortestPath(startVertex, endVertex) != null )
                                    group.add(endVertex);
                    }

                    /*
                     * If the group of connected characters is larger than the largest group, the largest
                     * group is cleared and replaced with the new largest group.
                     * After the group is copied to largest group, clear group.
                     */
                    if(group.size() > largestGroup.size()){
                            largestGroup.clear();
                            for(int i = 0; i < group.size(); i++){
                                    largestGroup.add(group.get(i));
                            }
                    }
                    group.clear();
            }

            /*
             * Iterate through the largest group to find the longest path each character has 
             * to any other character.
             */
            for(String LG : largestGroup){
                    String startingVertex = LG;
                    int longestPath = 0;

                    for(String LG2 : largestGroup){
                            String endingVertex = LG2;

                            Path = findShortestPath(startingVertex, endingVertex);

                            /*
                             * If the path size from startingVertex to endingVertex is longer than any other
                             * path that startingVertex is connected to, set it as the longest path for that
                             * startingVertex.
                             */
                            if(Path.size() > longestPath){
                                    longestPath = Path.size();
                            }
                    }
                    // save the starting vertex and its longest path to a map
                    longestPathMap.put(startingVertex, longestPath);
            }

            /*
             * Iterates through the longestPathMap and finds the shortest longest path and assigns
             * the character with the shortest longest path to mostCentralCharacter.
             */
            int shortestLongestPath =  Integer.MAX_VALUE;
            String mostCentralCharacter = new String();

            for(Map.Entry<String, Integer> entry : longestPathMap.entrySet()){

                    if((Integer) entry.getValue() < shortestLongestPath){
                            shortestLongestPath = (Integer) entry.getValue();
                            mostCentralCharacter = (String) entry.getKey();
                    }        
            }

            return mostCentralCharacter;
    }

1 Answer:

Answer 0 (score: 0)

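Thanks for the quick replies! I found the problem when I printed vertexSet before any of the for-each loops started. The first string in vertexSet was "" (i.e. empty), so the code stored "" as the first startVertex, then picked an endVertex, and got stuck in an infinite loop trying to find the shortest path between an empty string and a character. Thanks for your help!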
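For anyone hitting the same thing, one way to guard against it is to skip blank vertex names before iterating. This is only a minimal sketch: the class name `BlankVertexFilter`, the helper `nonBlankVertices`, the `Map<String, List<String>>` shape of `vertexMap`, and the sample character names are all assumptions for illustration, since the question does not show how `vertexMap` is declared or how the TSV file is parsed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper class, not part of the original code.
class BlankVertexFilter {

    /**
     * Returns the vertex names from the map's key set, skipping null,
     * empty, and whitespace-only names. Iteration order is preserved.
     */
    static Set<String> nonBlankVertices(Map<String, List<String>> vertexMap) {
        Set<String> result = new LinkedHashSet<>();
        for (String name : vertexMap.keySet()) {
            if (name != null && !name.trim().isEmpty()) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed adjacency map; the "" key mimics the empty vertex name
        // that showed up first in vertexSet.
        Map<String, List<String>> vertexMap = new LinkedHashMap<>();
        vertexMap.put("", new ArrayList<>());
        vertexMap.put("SPIDER-MAN", new ArrayList<>());
        vertexMap.put("IRON MAN", new ArrayList<>());

        System.out.println(nonBlankVertices(vertexMap)); // prints [SPIDER-MAN, IRON MAN]
    }
}
```

In findMostCentral(), the filtered set could stand in for vertexMap.keySet(). It would probably be cleaner still to reject blank names while parsing, so the empty vertex never enters the graph at all.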