Question

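I developed an algorithm that solves the 2-sum problem using a hash table, but its performance is terrible for large inputs.

My goal is to find all distinct numbers x, y where -10000 <= x + y <= 10000. By the way, is the performance of my code O(n * m), where n is the size of the input and m is the number of keys in the map?

Here is my code: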

import com.google.common.base.Stopwatch;

import java.util.Scanner;
import java.util.HashMap;
import java.util.ArrayList;

import static com.google.common.collect.Lists.newArrayList;

public class TwoSum {

    private HashMap<Long, Long> map;
    private ArrayList<Long> Ts;
    private long result = 0L;


    public TwoSum() {
        Ts = newArrayList();
        for(long i = -10000; i < 10001; i++){
            Ts.add(i);
        }

        Scanner scan = new Scanner(System.in);
        map = new HashMap<>();
        while (scan.hasNextLong()) {
            long a = scan.nextLong();
            if (!map.containsKey(a)) {
                map.put(a, a);
            }
        }
    }

    private long count(){
        //long c = 0L;
        for (Long T : Ts) {
            long t = T;
            for (Long x : map.values()) {
                long y = t - x;
                if (map.containsValue(y) && y != x) {
                    result++;
                }
                //System.out.println(c++);
            }
        }
        return result / 2;
    }

    public static void main(String [] args) {
        TwoSum s = new TwoSum();
        Stopwatch stopwatch = Stopwatch.createStarted();
        System.out.println(s.count());
        stopwatch.stop();
        System.out.println("time:" + stopwatch);

    }
}

Sample input:

-7590801 -3823598 -5316263 -2616332 -7575597 -621530 -7469475 1084712 -7780489 -5425286 3971489 -57444 1371995 -5401074 2383653 1752912 7455615 3060706 613097 -1073084 7759843 7267574 -7483155 -2935176 -5128057 -7881398 -637647 -2607636 -3214997 -8253218 2980789 168608 3759759 -5639246 555129 -4489068 44019 2275782 -3506307 -8031288 -213609 -4524262 -1502015 -1040324 3258235 32686 1047621 -3376656 7601567 -7051390 6633993 -6245148 4994051 -4259178 856589 6047000 1785511 4449514 -1177519 4972172 8274315 7725694 -4923179 5076288 -876369 -7663790 1613721 4472116 -4587501 3194726 6195357 -3364248 -113737 6260410 1974241 3174620 3510171 7289166 4532581 -6650736 -3782721 7007010 6007081 -7661180 -1372125 -5967818 516909 -7625800 -2700089 -7676790 -2991247 2283308 1614251 -4619234 2741749 567264 4190927 5307122 -5810503 -6665772

Output: 6

Answer 1

The gist of the algorithm can be rewritten in pseudocode as:

for all integers t from -10k to 10k,
    for all map keys x,
        if t - x in map, and t is not 2*x,
            count ++
return count / 2

You can easily improve on this:

for all integers t from -10k to 10k,
    for the lower half of keys x in ascending order such that t is not 2*x
        if t - x in map,
            count ++
return count

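This makes it twice as fast, because you no longer count every pair twice. However, you need to sort the input to guarantee that the keys are visited in ascending order. You can add them to a TreeSet and then move them into a LinkedHashSet. A set is preferable to a map when you don't care about the values and all the information is in the keys.

The running time is still O(input * range), because you have two nested loops, one with range iterations and the other with half of input. This is a fundamental shortcoming of the algorithm, and no amount of optimization will fix it.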
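The improved loop above can be sketched in Java roughly as follows. This is only an illustration under assumptions: the class name `TwoSumHalf`, the method `countPairs`, and its `lo`/`hi` parameters are made up for the example, and "the lower half of keys" is read as "keys x with 2*x < t", so that x is always the smaller member of the pair.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class TwoSumHalf {
    // Counts pairs x < y with lo <= x + y <= hi (hypothetical helper).
    static long countPairs(Collection<Long> input, long lo, long hi) {
        // Sort and deduplicate with a TreeSet, then copy into a LinkedHashSet:
        // iteration stays in ascending order, lookups are hash-based.
        Set<Long> keys = new LinkedHashSet<>(new TreeSet<>(input));
        long count = 0;
        for (long t = lo; t <= hi; t++) {
            for (long x : keys) {
                if (2 * x >= t) {
                    break; // x is no longer the smaller half of the pair
                }
                if (keys.contains(t - x)) {
                    count++;
                }
            }
        }
        return count; // each pair is counted once, so no division by 2
    }

    public static void main(String[] args) {
        // 1 + 2 = 3 and 1 + 3 = 4 fall in [3, 4]; 2 + 3 = 5 does not.
        System.out.println(countPairs(List.of(1L, 2L, 3L), 3, 4)); // prints 2
    }
}
```

Note that `contains` on a hash-based set is an expected constant-time lookup, whereas `HashMap.containsValue`, which the question's code calls, has to scan the map's entries.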

Answer 2

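The problem is an assignment from Algorithms: Design and Analysis, an online course offered by Stanford University and taught by Professor Tim Roughgarden. I happen to be taking the same course.

The usual solution of looking up t - i in a hash table is O(n) for a single t, but doing that for every target means 20001 * 1000000, or about 20 billion, lookups!

A better solution is to create a sorted set xs from the input file and then, ∀ i ∈ xs, find all numbers in xs that fall in the range [-10000 - i, 10000 - i]. Since a sorted set by definition has no duplicates, we don't have to worry about any number in the range being equal to i. There is a catch, though, that is really not clear from the problem statement. It is not enough to find unique (x, y) ∀ x, y ∈ xs; their sums must be unique too. Clearly, two unique pairs can produce the same sum (e.g. 2 + 4 = 1 + 5 = 6). Therefore, we need to keep track of the sums as well.

Finally, we can stop once i reaches 5000: any two numbers from that point on sum to more than 10000, and pairs involving a smaller number were already handled when that smaller number was processed.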
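As a point of reference, the single-target lookup might look like the Java sketch below. The names `SingleTarget` and `hasPair` are hypothetical, and the input is assumed to be already loaded into a hash-based set.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SingleTarget {
    // Returns true if two distinct values in xs sum to t.
    // One pass over xs with a hash lookup per element: O(n) for a single t.
    static boolean hasPair(Set<Long> xs, long t) {
        for (long x : xs) {
            long y = t - x;
            if (y != x && xs.contains(y)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<Long> xs = new HashSet<>(List.of(1L, 2L, 3L));
        System.out.println(hasPair(xs, 5)); // prints true (2 + 3)
        System.out.println(hasPair(xs, 2)); // prints false (1 + 1 reuses a value)
    }
}
```

Calling this once per target over the whole range is what multiplies the cost: 20001 targets times 1000000 inputs.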

Here is a Scala solution:

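import scala.collection.SortedSet

// TenThou is not defined in the original snippet; 10000 is assumed from the problem's range.
val TenThou = 10000L

def twoSumCount(xs: SortedSet[Long]): Int = {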
  xs
    .foldLeft(collection.mutable.Set.empty[Long]) { (sums, i) =>
      if (i < TenThou / 2) {
        xs
          // using from makes it slower
          .range(-TenThou - i, TenThou - i + 1)
          .map(_ + i)
          // using diff makes it slower
          .withFilter(y => !sums.contains(y))
          // adding individual elements is faster than using
          // diff/filter/filterNot and adding all using ++=
          .foreach(sums.add)
      }
      sums
    }
    .size
}
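For readers following the question's Java, the same range-query idea could be sketched as below. This is a loose translation, not the answer author's code: `TwoSumRange`, `countDistinctSums`, and `LIMIT` are names invented for the example, and `TreeSet.subSet` stands in for Scala's `range`. Unlike the Scala snippet, the sketch skips y == i explicitly; whether that matters depends on how the assignment treats x = y.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class TwoSumRange {
    static final long LIMIT = 10_000L; // assumed bound, taken from the problem's range

    // Counts distinct sums i + y with -LIMIT <= i + y <= LIMIT and i != y.
    static int countDistinctSums(Collection<Long> input) {
        TreeSet<Long> xs = new TreeSet<>(input); // sorted, no duplicates
        Set<Long> sums = new HashSet<>();
        for (long i : xs) {
            if (i >= LIMIT / 2) {
                break; // every remaining pair sums to more than LIMIT
            }
            // All y in xs with -LIMIT - i <= y <= LIMIT - i, both ends inclusive.
            for (long y : xs.subSet(-LIMIT - i, true, LIMIT - i, true)) {
                if (y != i) {
                    sums.add(i + y);
                }
            }
        }
        return sums.size();
    }

    public static void main(String[] args) {
        // Pairs: 1+2=3, 1+4=5, 1+5=6, 2+4=6, 2+5=7, 4+5=9; distinct sums are 3, 5, 6, 7, 9.
        System.out.println(countDistinctSums(List.of(2L, 4L, 1L, 5L))); // prints 5
    }
}
```

The sums go into a plain `HashSet` because only their count is needed, not their order.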

Benchmark (Scala solution):

cores: 8
hostname: ***
name: OpenJDK 64-Bit Server VM
osArch: x86_64
osName: Mac OS X
vendor: Azul Systems, Inc.
version: 11.0.1+13-LTS
Parameters(file -> 2sum): 116.069441 ms

How can I improve a 2-sum algorithm over a range of numbers using a hash table?
