How to get a better time complexity (Big O) for a page-sequence exercise in Java

Time: 2014-12-23 17:12:03

Tags: java algorithm big-o

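The problem is:

Entries are written to a file in chronological order, one entry per line. Each entry has the format:

[timestamp][space][user-id][space][page-type-id]\n

Your task is to determine, from a set of logs, the 10 most common three-page sequences across all users.

For example, here is a sample log: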

1248977297 BBBB Search
1248977302 AAAA Browse
1248977308 BBBB Search
1248977310 AAAA Browse
1248977317 BBBB Search
1248977325 AAAA Search
1248977332 AAAA Search
1248977332 BBBB Search
1248977339 BBBB Checkout
1248977348 AAAA Search
1248977352 BBBB Browse
1248977367 AAAA Search



The first three-page sequence made by user AAAA is “Browse->Browse->Search”
The second three-page-sequence made by user AAAA is “Browse->Search->Search” 
The third three-page-sequence made by user AAAA is “Search->Search->Search”
The fourth three-page-sequence made by user AAAA is “Search->Search->Search”

The program output for the given sample data should be:

Search -> Search -> Search = 4
Browse -> Browse -> Search = 1
Search -> Search -> Checkout = 1
Browse -> Search -> Search = 1
Search -> Checkout -> Browse = 1

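The output must contain the top 10 three-page sequences (in order) and the number of occurrences of each sequence.

The best algorithm I could come up with is O(n^2), but an answer I found says it can be done in O(N + N*lg(N)). How can I achieve that complexity? That is, build the list in O(N) and sort it in O(N lg(N)).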

/* Solution
 * Runtime complexity: O(n^2).
 * Space complexity: O(n).
 */
import java.io.*;
import java.util.*;

public class Solution {

    public static void main(String args[]) throws IOException {
        /*
         * Reads the input from a txt file.
         */
        String file = "C:\\Users\\Public\\Documents\\txt\\files";
        BufferedReader f = new BufferedReader(new FileReader(file + ".txt"));
        String line = "";

        /*
         * @map data structure to store all the users with their page ids.
         */
        Map<String, List<String>> map = new HashMap<String, List<String>>();

        /*
         * Read the log file and store in @map, for each user ID, the list of pages that user visited, in order.
         */
        while ((line = f.readLine()) != null && line.trim().length() != 0) {
            StringTokenizer tokens = new StringTokenizer(line);
            while (tokens.hasMoreElements()) {
                String timeStamp = tokens.nextToken();
                // User IDs such as "AAAA" are not numeric, so they are kept as strings.
                String userId = tokens.nextToken();
                String pageType = tokens.nextToken();

                List<String> values = map.get(userId);
                if (values == null) {
                    values = new ArrayList<String>();
                    map.put(userId, values);
                }
                values.add(pageType);
            }
        }
        f.close();
        /*
         * Create the sequences by user.
         */
        List<String> listSequences = generateSequencesByUser(map);

        /*
         * Count the frequency of each sequence.
         */
        Map<String, Integer> mapFrequency = countFrequencySequences(listSequences);

        /*
         * Sort the map by values.
         */
        Map<String, Integer> sortedMap = Solution.sortByValue(mapFrequency);

        /*
         * Print the Top 10 of sequences.
         */
        printTop10(sortedMap);
    }
    /*
     * Method to create sequences by user.
     */
    public static List<String> generateSequencesByUser(Map<String, List<String>> map) {
        List<String> list = new ArrayList<String>();
        for (Map.Entry<String, List<String>> entry : map.entrySet()) {
            List<String> pages = entry.getValue();
            // Every index i >= 2 closes one three-page window (i - 2, i - 1, i).
            for (int i = 2; i < pages.size(); i++) {
                String seq = pages.get(i - 2) + "->" + pages.get(i - 1) + "->" + pages.get(i);
                list.add(seq);
            }
        }
        return list;
    }

    /*
     * Method to count the frequency of each sequence and store it in a map.
     */
    public static Map<String, Integer> countFrequencySequences(List<String> listSequences) {
        Map<String, Integer> mapFrequency = new HashMap<String, Integer>();

        for (String temp : listSequences) {
            Integer counter = mapFrequency.get(temp);
            if (counter == null) {
                counter = 1;
                mapFrequency.put(temp, counter);
            } else {
                mapFrequency.put(temp, counter + 1);
            }
        }
        return mapFrequency;
    }

    /*
     * Method to print the top 10 of sequences.
     */
    public static void printTop10(Map<String, Integer> map) {
        int count = 0;
        for (Map.Entry<String, Integer> entry : map.entrySet()) {
            count++;
            if (count > 10) {
                break;
            } else {
                System.out.println(entry.getKey() + " = " + entry.getValue());
            }
        }
    }

    /*
     * Order the map by values.
     */
    public static Map<String, Integer> sortByValue(Map<String, Integer> map) {
        List<Map.Entry<String, Integer>> list = new ArrayList<Map.Entry<String, Integer>>(map.entrySet());
        // Sort the entries by count, highest first.
        Collections.sort(list, new Comparator<Map.Entry<String, Integer>>() {
            public int compare(Map.Entry<String, Integer> o1, Map.Entry<String, Integer> o2) {
                return o2.getValue().compareTo(o1.getValue());
            }
        });

        // A LinkedHashMap keeps the entries in the sorted insertion order.
        Map<String, Integer> result = new LinkedHashMap<String, Integer>();
        for (Map.Entry<String, Integer> entry : list) {
            result.put(entry.getKey(), entry.getValue());
        }
        return result;
    }


}

1 Answer:

Answer 0 (score: 2)

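By splitting the problem into three simpler tasks, you can solve it in O(N log N) or better:

  1. Sort the list by timestamp,
  2. Count each three-page sequence,
  3. Pick the top ten entries by count.

The first task is a standard sort. Let's assume for now that it is O(N log N).*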

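The second task is easy to accomplish with a pair of hash maps:

  • For each user, keep a three-element list of their last three pages in the first hash map. Each time you see a new action by a user, shift the pages in that user's list by one.
  • If the list from the step above has three entries, make a three-part key from them and increment its count in the second hash map.

Each of the steps above is an O(1) operation per log entry, so this task takes O(N) time.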
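
The two-map counting step might look like the following Java sketch. It assumes the entries are already in chronological order (as the problem statement says, or after the sort in the first task) and that each line holds exactly three whitespace-separated fields. The names `SequenceCounter` and `countSequences` and the `->` key separator are illustrative choices, not part of the original answer.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SequenceCounter {

    /** Counts three-page sequences per user from lines of the form "timestamp userId pageId". */
    static Map<String, Integer> countSequences(Iterable<String> lines) {
        // First map: user ID -> that user's most recent pages (at most three).
        Map<String, Deque<String>> lastPages = new HashMap<>();
        // Second map: three-part key such as "A->B->C" -> number of occurrences.
        Map<String, Integer> counts = new HashMap<>();

        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length != 3) {
                continue; // assumption: blank or malformed lines are skipped
            }
            String user = parts[1];
            String page = parts[2];

            Deque<String> window = lastPages.computeIfAbsent(user, u -> new ArrayDeque<>(3));
            if (window.size() == 3) {
                window.removeFirst(); // shift the window by one page
            }
            window.addLast(page);

            if (window.size() == 3) {
                counts.merge(String.join("->", window), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // The sample log from the question.
        List<String> log = Arrays.asList(
                "1248977297 BBBB Search", "1248977302 AAAA Browse",
                "1248977308 BBBB Search", "1248977310 AAAA Browse",
                "1248977317 BBBB Search", "1248977325 AAAA Search",
                "1248977332 AAAA Search", "1248977332 BBBB Search",
                "1248977339 BBBB Checkout", "1248977348 AAAA Search",
                "1248977352 BBBB Browse", "1248977367 AAAA Search");
        System.out.println(countSequences(log).get("Search->Search->Search")); // prints 4
    }
}
```

Each log entry triggers a constant number of hash-map and deque operations, which is where the O(N) bound for this task comes from.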

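The third task, picking the top ten entries by count, can be done by retrieving the key-count pairs and sorting them by count. In the worst case, when all page transitions are unique, you end up with O(N) entries to sort (each log entry completes at most one new sequence), so this task is again O(N log N).*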
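
A minimal sketch of this sort-based selection, assuming the counts are already available as a `Map<String, Integer>`. The names `TopSequencesBySort` and `topBySort` are hypothetical, and breaking ties alphabetically by key is an assumption added here to make the output deterministic; the problem statement does not specify a tie order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

final class TopSequencesBySort {

    /** Returns up to {@code limit} entries with the highest counts, highest first. */
    static List<Map.Entry<String, Integer>> topBySort(Map<String, Integer> counts, int limit) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());

        // Highest count first; ties are broken by key (an assumption of this sketch).
        Comparator<Map.Entry<String, Integer>> order =
                Map.Entry.<String, Integer>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Integer>comparingByKey());
        entries.sort(order); // O(M log M) for M distinct sequences

        int size = Math.max(0, Math.min(limit, entries.size()));
        return new ArrayList<>(entries.subList(0, size));
    }
}
```

Calling `topBySort(counts, 10)` and printing each entry as `key = value` would produce output in the shape the question asks for.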

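Once you know the algorithm, the implementation should be straightforward, because Java provides all the building blocks for each task of the algorithm.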

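* You can reduce the time to O(N) with two observations:

  • The first task uses ten-digit numbers for the timestamps, so you can use a non-comparison, linear-time algorithm (such as radix sort) to sort in linear time,
  • The top ten entries can be obtained with a linear-time selection algorithm.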
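
As a rough illustration of the first observation, here is a sketch of a least-significant-digit radix sort over the timestamps. It assumes non-negative timestamps that fit in a `long` and uses 8-bit digits, so ten-digit values need at most five passes. The class name `TimestampRadixSort` and the choice to return a sorted index order (rather than moving the log lines themselves) are assumptions made for this example.

```java
import java.util.Arrays;

final class TimestampRadixSort {

    /**
     * Returns the indices of {@code timestamps} in ascending timestamp order.
     * The sort is stable: entries with equal timestamps keep their original order.
     */
    static int[] sortedOrder(long[] timestamps) {
        int n = timestamps.length;
        int[] src = new int[n];
        int[] dst = new int[n];
        long max = 0;
        for (int i = 0; i < n; i++) {
            if (timestamps[i] < 0) {
                throw new IllegalArgumentException("negative timestamp at index " + i);
            }
            src[i] = i;
            max = Math.max(max, timestamps[i]);
        }

        // One counting-sort pass per 8-bit digit, least significant digit first.
        for (int shift = 0; shift < 64 && (max >>> shift) != 0; shift += 8) {
            int[] start = new int[257];
            for (int idx : src) {
                start[digit(timestamps[idx], shift) + 1]++;
            }
            for (int d = 0; d < 256; d++) {
                start[d + 1] += start[d]; // start[d] = first output slot for digit d
            }
            for (int idx : src) {
                dst[start[digit(timestamps[idx], shift)]++] = idx;
            }
            int[] tmp = src;
            src = dst;
            dst = tmp;
        }
        return src;
    }

    private static int digit(long value, int shift) {
        return (int) ((value >>> shift) & 0xFF);
    }

    public static void main(String[] args) {
        long[] ts = {1248977302L, 1248977297L, 1248977332L, 1248977308L};
        System.out.println(Arrays.toString(sortedOrder(ts))); // prints [1, 0, 3, 2]
    }
}
```

The number of passes depends only on the width of the largest timestamp, not on N, so the total work is linear in the number of log entries.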

However, such an implementation requires more work, because Java does not provide ready-made "building blocks" for it.
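
One way to avoid the full sort without writing a selection algorithm by hand is a bounded min-heap, sketched below. This is a different technique from the linear-time selection the answer refers to: it costs O(M log k) for M distinct sequences, which is effectively linear when k is fixed at ten. The names `TopSequencesByHeap` and `topByHeap` are hypothetical, and the order among entries with equal counts is left unspecified.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

final class TopSequencesByHeap {

    /** Returns up to {@code k} entries with the highest counts, highest first. */
    static List<Map.Entry<String, Integer>> topByHeap(Map<String, Integer> counts, int k) {
        if (k <= 0) {
            return new ArrayList<>();
        }
        Comparator<Map.Entry<String, Integer>> byCount = Map.Entry.comparingByValue();

        // Min-heap holding the best k entries seen so far; the smallest count sits on top.
        PriorityQueue<Map.Entry<String, Integer>> heap = new PriorityQueue<>(k, byCount);
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (heap.size() < k) {
                heap.add(entry);
            } else if (entry.getValue() > heap.peek().getValue()) {
                heap.poll(); // evict the current smallest of the top k
                heap.add(entry);
            }
        }

        // Only k entries remain, so this final sort costs O(k log k).
        List<Map.Entry<String, Integer>> result = new ArrayList<>(heap);
        result.sort(byCount.reversed());
        return result;
    }
}
```

With `k = 10`, the heap never holds more than ten entries, so each insertion or eviction is bounded by a small constant.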