Storing a large map in memory

Time: 2016-08-12 14:49:16

Tags: java performance graph guava

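First, the background of the problem: I have a very large graph that takes about 4GB to store, with about 3M nodes and 34M edges. My program takes this large graph and recursively builds smaller graphs from it. At each level of the recursion I hold two graphs: the original graph and the graph created from it. The recursion continues until the graph has shrunk to a very small one of about 10 nodes.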

Since I need these graphs for the entire run of the program, memory efficiency is critical to my application.

Now here is the problem I am currently having. This is the algorithm used to create the smaller graph from the larger one:

public static Graph buildByTriples(Graph g, ArrayList<Integer> seeds) {
    ArrayList<Edge> edges = new ArrayList<>(g.getEdgeCount());
    for (int i = 0; i < g.size(); i++) {
        for (Edge e : g.adj(i)) {
            int v = e.getEndpoint(i);
            if (i < v) {
                edges.add(e);
            }
        }
    }

    Table<Integer, Integer, Double> coarseEgdes = HashBasedTable.create(seeds.size(),seeds.size());
    //compute coarse weights
    edges.stream().forEach((e) -> {
        int v = e.getV();
        int u = e.getU();
        if (g.isC(u) && g.isC(v)) {
            addToTable(coarseEgdes, u, v, e.getWeight());
        }else if(!g.isC(u) && g.isC(v)){ //F-C
            for(Edge cEdge: g.cAdj(u)){//get coarse neighbors of the fine edges
                int nb = cEdge.getEndpoint(u);
                if(nb != v){
                    addToTable(coarseEgdes, v, nb, cEdge.getPij() * e.getWeight());

                }
            }
        }else if(g.isC(u) && !g.isC(v)){//C-F
            for(Edge cEdge: g.cAdj(v)){//get coarse neighbors of the fine edges
                int nb = cEdge.getEndpoint(v);
                if(nb != u){
                    addToTable(coarseEgdes, u, nb, cEdge.getPij() * e.getWeight());
                }
            }
        }else{//F-F
            for(Edge cEdgeU: g.cAdj(u)){//get coarse neighbors of the fine edges
                int uNb = cEdgeU.getEndpoint(u);
                for(Edge cEdgeV: g.cAdj(v)){
                    int vNb = cEdgeV.getEndpoint(v);
                    if(uNb != vNb){
                        addToTable(coarseEgdes, uNb, vNb, cEdgeU.getPij() * e.getWeight() * cEdgeV.getPij());
                    }
                }
            }
        }
    });

    return createGraph(g, coarseEgdes); //use the edges to build new graph. Basically loops through coarseEdges and add edge and weight to the new graph.
}

private static void addToTable(Table<Integer, Integer,Double> tbl, int r, int c, double val){
    int mn = Math.min(r, c);//the smaller of the two nodeIds
    int mx = Math.max(r, c);//the larger of the two nodeIds
    if(tbl.contains(mn, mx)){
        tbl.put(mn, mx, tbl.get(mn, mx) + val);
    }else{
        tbl.put(mn, mx,val);
    }
}

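Now, when I do this, I quickly run out of memory. I profiled the application with YourKit: memory usage goes through the roof (more than 6GB before it runs out), and so does CPU usage. coarseEdges can become very large. Is there a better in-memory Map implementation that scales to large data sets? Or is there a better way to do this without storing coarseEdges?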

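PS: Note that my graph cannot retrieve an edge (u,v) in constant time. It is basically a list of lists, which gives better performance for other critical parts of my application.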

**Also see my graph implementation code below:**
public class Graph{
    private final int SIZE;
    private final EdgeList[] nodes;
    private final float[] volumes;
    private final double[] weightedSum;
    private final double[] weightedCoarseSum;
    private final int[] nodeDegrees;
    private final int[] c_nodeDegrees;
    private int edge_count=0;
    private final boolean[] coarse;
    private final EdgeList[] coarse_neighbors;
    public Graph(int SIZE){
        this.SIZE =SIZE;
        nodes = new EdgeList[SIZE];
        coarse_neighbors = new EdgeList[SIZE];

        volumes = new float[SIZE];
        coarse = new boolean[SIZE];

        //initialize data
        weightedSum = new double[SIZE];
        weightedCoarseSum = new double[SIZE];
        nodeDegrees= new int[SIZE];
        c_nodeDegrees = new int[SIZE];

        for(int i=0;i<SIZE;i++){
            nodes[i]=new EdgeList();
            coarse_neighbors[i] = new EdgeList();
            volumes[i]=1;
        }
    }

    public void addEdge(int u, int v, double w){
        //graph is undirected
        //To traverse edges in order with u < v, each edge (u,v) is stored such that u < v
        Edge e=null;
        if(u<v){
            e = new Edge(u,v,w);
        }else if(u>v){
            e = new Edge(v,u,w);
        }else{
            throw new UnsupportedOperationException("Self loops not allowed in graph"); //TODO: Need a graph validation routine
        }

        nodes[u].add(e);
        nodes[v].add(e);

        //update the weighted sum of each endpoint
        weightedSum[u] += w;
        weightedSum[v] += w;

        //update the degree of each endpoint
        ++nodeDegrees[u];
        ++nodeDegrees[v];

        ++edge_count;
    }

    public int size(){
        return SIZE;
    }

    public EdgeList adj(int v){
        return nodes[v];
    }

    public EdgeList cAdj(int v){
        return coarse_neighbors[v];
    }

    public void sortAdj(int u, Comparator<Edge> c){
        nodes[u].sort(c);
    }

    public void sortCoarseAdj(int u, Comparator<Edge> c){
        coarse_neighbors[u].sort(c);
    }

    public void setCoarse(int node, boolean c){
        coarse[node] = c;
        if(c){
            //update the neighborHood of node
            for(Edge e: adj(node)){
                int v = e.getEndpoint(node);
                coarse_neighbors[v].add(e);
                weightedCoarseSum[v] += e.getWeight();
                ++c_nodeDegrees[v];
            }
        }
    }

    public int getEdgeCount(){
        return edge_count;
    }

    public boolean isC(int id){
        return coarse[id];
    }

    public double weightedDegree(int node){
        return weightedSum[node];
    }

    public double weightedCoarseDegree(int node){
        return weightedCoarseSum[node];
    }

    public int degree(int u){
        return nodeDegrees[u];
    }

    public int cDegree(int u){
        return c_nodeDegrees[u];
    }

    public Edge getCNeighborAt(int u,int idx){
        return coarse_neighbors[u].getAt(idx);
    }

    public float volume(int u){
        return volumes[u];
    }

    public void setVolume(int node, float v){
        volumes[node] = v;
    }

    @Override
    public String toString() {
        return "Graph[nodes:"+SIZE+",edges:"+edge_count+"]";
    }

}


//Edges are first class objects.
public class Edge {
    private boolean deleted=false;
    private int u;
    private int v;
    private double weight;
    private double pij;
    private double algebraicDist = (1/Constants.EPSILON);

    public Edge(int u, int v, double weight) {
        this.u = u;
        this.v = v;
        this.weight = weight;
    }

    public Edge() {
    }

    public int getU() {
        return u;
    }

    public void setU(int u) {
        this.u = u;
    }

    public int getV() {
        return v;
    }

    public void setV(int v) {
        this.v = v;
    }

    public int getEndpoint(int from){
        if(from == v){
            return u;
        }

        return v;
    }

    public double getPij() {
        return pij;
    }

    public void setPij(double pij) {
        this.pij = pij;
    }

    public double getAlgebraicDist() {
        return algebraicDist;
    }

    public void setAlgebraicDist(double algebraicDist) {
        this.algebraicDist = algebraicDist;
    }

    public boolean isDeleted() {
        return deleted;
    }

    public void setDeleted(boolean deleted) {
        this.deleted = deleted;
    }

    public double getWeight() {
        return weight;
    }

    public void setWeight(double weight) {
        this.weight = weight;
    }

    @Override
    public String toString() {
        return "Edge[u:"+u+", v:"+v+"]";
    }
}


// The Edge iterable
public class EdgeList implements Iterable<Edge>{
    private final ArrayList<Edge> data= new ArrayList<>();

    public void add(Edge e){
        data.add(e);
    }

    @Override
    public Iterator<Edge> iterator() {
        Iterator<Edge> it = new IteratorImpl();
        return it;
    }

    private class IteratorImpl implements Iterator<Edge> {

        public IteratorImpl() {
        }
        private int currentIndex = 0;
        private final int N = data.size();
        @Override
        public boolean hasNext() {

            //skip deleted
            while(currentIndex < N && data.get(currentIndex).isDeleted()){
                currentIndex++;
            }

            return currentIndex < N;
        }

        @Override
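        public Edge next() {
            //hasNext() also skips deleted edges, so next() is safe to call on its own
            if(!hasNext()){
                throw new java.util.NoSuchElementException();
            }
            return data.get(currentIndex++);
        }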

        @Override
        public void remove() {
            throw new UnsupportedOperationException();
        }
    }

    public Edge getAt(int idx){
        return data.get(idx);
    }

    public void sort(Comparator<Edge> c){
        data.sort(c);
    }
}

2 Answers:

Answer 0 (score: 4)

A few stabs in the dark here; you will need to implement them to see how much they help.

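1) You could consider using a composite (int, int) key with a hashmap instead of the Guava table. For edge weights alone it will definitely be more efficient. If you also need to query the edges going out of a given vertex it is less obvious, and you will have to weigh the CPU vs. memory trade-off.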

2) If you use a plain hashmap, you could consider one of the off-heap implementations. For example, take a look at https://github.com/OpenHFT/Chronicle-Map, it might be

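3) If you stay in memory and want to squeeze out some extra space, you can play some dirty tricks with primitive maps. Use a long->double map, e.g. http://labs.carrotsearch.com/download/hppc/0.4.1/api/com/carrotsearch/hppc/LongDoubleMap.html or http://trove4j.sourceforge.net/javadocs/gnu/trove/map/hash/TLongDoubleHashMap.html, encode the 2x int vertex pair as a long, and see how much it helps. On a 64-bit JVM an Integer can take 16 bytes (assuming compressed oops) and a Double 24 bytes. That gives 32 + 24 = 56 bytes per entry, versus 8 + 8 with a primitive map.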
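
To make points 1) and 3) concrete, here is a minimal sketch of packing an unordered (u, v) pair into a single long and accumulating weights in primitive arrays. The class LongDoubleAccumulator and its methods are hypothetical names invented for this illustration; it is a hand-rolled stand-in for a library map such as HPPC's LongDoubleMap or Trove's TLongDoubleHashMap, not their API. It assumes node ids are non-negative, which holds for the array-indexed ids in the question.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch: accumulates edge weights keyed by an undirected
 * (u, v) pair packed into one long. Open addressing with linear probing,
 * no removal. Assumes node ids are non-negative.
 */
final class LongDoubleAccumulator {
    private static final long EMPTY = -1L; // never produced by pack() for ids >= 0

    private long[] keys;
    private double[] values;
    private int mask;
    private int size;

    LongDoubleAccumulator(int expectedEntries) {
        int capacity = 16;
        // Power-of-two capacity, at least twice the expected entry count.
        // This sketch does not handle tables beyond 2^30 slots.
        while (capacity < 2L * expectedEntries && capacity < (1 << 30)) {
            capacity <<= 1;
        }
        allocate(capacity);
    }

    /** Packs an unordered pair: smaller id in the high 32 bits, larger id in the low 32 bits. */
    static long pack(int a, int b) {
        long mn = Math.min(a, b);
        long mx = Math.max(a, b);
        return (mn << 32) | (mx & 0xFFFFFFFFL);
    }

    /** Smaller node id of a packed key. */
    static int first(long key) { return (int) (key >>> 32); }

    /** Larger node id of a packed key. */
    static int second(long key) { return (int) key; }

    /** Adds delta to the value stored under key, inserting the key if it is absent. */
    void add(long key, double delta) {
        if ((size + 1) * 2 > keys.length) {
            grow(); // keep the load factor at or below 0.5 so probing terminates
        }
        int i = slot(key);
        while (keys[i] != EMPTY) {
            if (keys[i] == key) {
                values[i] += delta;
                return;
            }
            i = (i + 1) & mask;
        }
        keys[i] = key;
        values[i] = delta;
        size++;
    }

    /** Returns the accumulated value for key, or missing if the key was never added. */
    double get(long key, double missing) {
        int i = slot(key);
        while (keys[i] != EMPTY) {
            if (keys[i] == key) {
                return values[i];
            }
            i = (i + 1) & mask;
        }
        return missing;
    }

    int size() { return size; }

    private int slot(long key) {
        long h = key * 0x9E3779B97F4A7C15L; // multiplicative mixing of the packed key
        return (int) (h >>> 32) & mask;
    }

    private void allocate(int capacity) {
        keys = new long[capacity];
        values = new double[capacity];
        Arrays.fill(keys, EMPTY);
        mask = capacity - 1;
        size = 0;
    }

    private void grow() {
        long[] oldKeys = keys;
        double[] oldValues = values;
        allocate(oldKeys.length * 2);
        for (int i = 0; i < oldKeys.length; i++) {
            if (oldKeys[i] != EMPTY) {
                add(oldKeys[i], oldValues[i]);
            }
        }
    }
}
```

With something like this, a call such as addToTable(coarseEgdes, u, v, w) in the question would become acc.add(LongDoubleAccumulator.pack(u, v), w), and createGraph would walk the stored keys and unpack them with first() and second(). Each slot costs 16 bytes (one long plus one double), but the sketch keeps the table at most half full, so the real cost per entry is at least 32 bytes; a tuned library map is likely the better choice in practice.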

Answer 1 (score: 1)

I would suggest giving Guava's ValueGraph a look for a case like this.

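It may be possible to make the data structure for the recursive graphs more efficient; how many recursion steps does your data set have, and how does the size of the graph change from step to step?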