Question

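For my research on network diffusion, I have the following code that models a lightweight framework for vertices. The initial prototype came from a framework in Python, which I translated into Java. The problem I'm having is that, while this code runs much faster than the Python version for up to 10,000 vertices, for larger numbers of vertices (100,000+) it grinds to a halt. In fact, the Python version executes in 1.2 minutes, whereas the Java version had not returned even after 7 minutes of execution. I'm not sure why the same code breaks down at a larger number of vertices, and I need help fixing the code.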

import java.util.*;

public class Vertex
{
  private int id;
  private HashMap<Integer, Double> connectedTo;
  private int status;

  public Vertex(int key)
  {
    this.id = key;
    this.connectedTo = new HashMap<Integer, Double>();
    this.status = 0;
  }

  public void addNeighbour(int nbr, double weight)
  {
    this.connectedTo.put(nbr, weight);
  }

  public int getId()
  {
    return this.id;
  }

  public double getWeight(int nbr)
  {
    return this.connectedTo.get(nbr);
  }

  public int getStatus()
  {
    return this.status;
  }

  public Set<Integer> getConnections()
  {
    return this.connectedTo.keySet();
  }

//testing the class

  public static void main(String[] args)
  {
    int noOfVertices = 100000;

    Vertex[] vertexList = new Vertex[noOfVertices];

    for (int i = 0; i < noOfVertices; i++) {
        vertexList[i] = new Vertex(i);
    }

    for (Vertex v : vertexList) {
        int degree = (int)(500*Math.random()); //random choice of degree 
        int neighbourCount = 0; // count number of neighbours built up

        while (neighbourCount <= degree) {
            int nbr = (int) (noOfVertices * Math.random()); // randomly choose a neighbour
            double weight = Math.random(); // randomly assign a weight for the relationship
            v.addNeighbour(nbr, weight);
            neighbourCount++;
        }
    }

  }
}

For reference, the Python version of this code is as follows:

import random

class Vertex:
    def __init__(self, key):
      self.id = key
      self.connectedTo = {}

    def addNeighbor(self, nbr, weight=0):
      self.connectedTo[nbr] = weight

    def __str__(self):
      return str(self.id) + ' connectedTo: ' \
          + str([x.id for x in self.connectedTo])

    def getConnections(self):
      return self.connectedTo.keys()

    def getId(self):
      return self.id

    def getWeight(self, nbr):
      return self.connectedTo[nbr]

if __name__ == '__main__':
  numberOfVertices = 100000
  vertexList = [Vertex(i) for i in range(numberOfVertices)] # list of vertices

  for vertex in vertexList:
    degree = 500*random.random() 
    # build up neighbors one by one
    neighbourCount = 0 

    while neighbourCount <= degree:
        neighbour = random.choice(range(numberOfVertices))
        weight = random.random() # random choice of weight
        vertex.addNeighbor(neighbour, weight)
        neighbourCount = neighbourCount + 1

Answer 1

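This was a very interesting problem, and I believe I learned something new as well. I tried optimizing the code in different ways, such as using a parallel stream and using ThreadLocalRandom, which can be up to three times faster than Random. However, I finally discovered the main bottleneck: the memory allocated to the JVM.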

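Because you are adding so many elements to your Maps (in the worst case, about 500 for each of the 100,000 vertices), you need a lot of memory (heap space). If the JVM is left to grow the heap dynamically, the program takes a very long time to execute. The way I solved this was to pre-allocate memory to the JVM (3 GB, specifically) by passing -Xms3G as a VM argument in the program's run configuration, which can be done in the IDE or via the terminal.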
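
To check whether the -Xms setting actually took effect, a small program along these lines might help. It is only a sketch: the class name HeapCheck and the helpers toMiB and worstCaseEntries are hypothetical names made up for illustration, and the figures it prints depend on your JVM and machine.

```java
// Hypothetical helper class, not part of the original code.
public class HeapCheck {

    // Converts a byte count to whole mebibytes for readable output.
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    // Upper bound on map entries if every vertex received maxDegree
    // distinct neighbours (duplicate keys would lower the real count).
    static long worstCaseEntries(long vertices, long maxDegree) {
        return vertices * maxDegree;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();

        // Heap currently reserved by the JVM; -Xms sets its starting size.
        System.out.println("current heap (MiB): " + toMiB(rt.totalMemory()));

        // Ceiling the heap may grow to (controlled by -Xmx).
        System.out.println("maximum heap (MiB): " + toMiB(rt.maxMemory()));

        // Rough scale of the problem: 100,000 vertices, up to 500 neighbours each.
        System.out.println("worst-case entries: " + worstCaseEntries(100_000, 500));
    }
}
```

Running it once as `java HeapCheck` and once as `java -Xms3G HeapCheck` should show the current heap starting at roughly 3 GB in the second case. Each entry in a HashMap<Integer, Double> costs far more than the 12 bytes of raw data because of boxing and per-entry node overhead, which is why tens of millions of entries need a heap measured in gigabytes.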

I also optimized your code, which I'll post below (it completes in just a few seconds):

import java.util.*;
import java.util.concurrent.*;
import java.util.stream.*;

public class Test {

    public static void main(String[] args) {
        int noOfVertices = 100_000;

        Vertex[] vertexList = new Vertex[noOfVertices];

        IntStream.range(0, noOfVertices).parallel().forEachOrdered(i -> {
            // Obtain the generator inside the task: a ThreadLocalRandom is tied to
            // the thread that called current() and must not be shared via a static field.
            ThreadLocalRandom random = ThreadLocalRandom.current();

            vertexList[i] = new Vertex(i);

            int degree = (int) (500 * random.nextDouble()); // random choice of degree

            for (int j = 0; j <= degree; j++) {
                int nbr = (int) (noOfVertices * random.nextDouble()); // randomly choose a neighbor

                vertexList[i].addNeighbour(nbr, random.nextDouble());
            }
            }
        });
    }

}

class Vertex {

    private int id;

    private Map<Integer, Double> connectedTo;

    private int status;

    public Vertex(int id) {
        this.id = id;

        this.connectedTo = new HashMap<>(500);
    }

    public void addNeighbour(int nbr, double weight) {
        this.connectedTo.put(nbr, weight);
    }

    public int getId() {
        return this.id;
    }

    public double getWeight(int nbr) {
        return this.connectedTo.get(nbr);
    }

    public int getStatus() {
        return this.status;
    }

    public Set<Integer> getConnections() {
        return this.connectedTo.keySet();
    }

}

I'm not sure of the exact consequences of using ThreadLocalRandom in a multithreaded environment, but you can switch it back to Math#random if you prefer.
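
For what it's worth, the ThreadLocalRandom documentation recommends the form ThreadLocalRandom.current().nextX(...), so that an instance is never accidentally shared between threads. A minimal sketch of that pattern with a parallel stream might look like the following; the class name PerThreadRandom and the method randomNeighbours are hypothetical and exist only for this illustration.

```java
import java.util.Arrays;
import java.util.concurrent.ThreadLocalRandom;
import java.util.stream.IntStream;

// Hypothetical example class, not part of the original code.
public class PerThreadRandom {

    // Fills an array with random neighbour indices in parallel. Each task asks
    // for the generator of the thread it runs on instead of sharing one instance.
    static int[] randomNeighbours(int count, int noOfVertices) {
        int[] result = new int[count];

        IntStream.range(0, count).parallel().forEach(i -> {
            ThreadLocalRandom random = ThreadLocalRandom.current();
            result[i] = random.nextInt(noOfVertices); // uniform in [0, noOfVertices)
        });

        return result;
    }

    public static void main(String[] args) {
        // Output differs on every run because the values are random.
        System.out.println(Arrays.toString(randomNeighbours(10, 100_000)));
    }
}
```

Each task writes to its own array slot, so no extra synchronization is needed here, and nextInt(bound) avoids the multiply-and-cast used with nextDouble().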
