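First, some background on the problem: I have a very large graph that takes about 4GB to store, with roughly 3M nodes and 34M edges. My program takes this large graph and recursively builds smaller graphs from it. At each level of the recursion I hold two graphs: the original graph and the graph created from it. The recursion continues until the graph has been reduced to a very small one, about 10 nodes.
Since I need these graphs for the whole run of the program, memory efficiency is critical for my application.
Now here is the problem I am currently facing. This is the algorithm that creates the smaller graph from the larger one: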
import java.util.ArrayList;
import com.google.common.collect.HashBasedTable;
import com.google.common.collect.Table;

public static Graph buildByTriples(Graph g, ArrayList<Integer> seeds) {
    // Collect every undirected edge exactly once (from its smaller endpoint).
    ArrayList<Edge> edges = new ArrayList<>(g.getEdgeCount());
    for (int i = 0; i < g.size(); i++) {
        for (Edge e : g.adj(i)) {
            int v = e.getEndpoint(i);
            if (i < v) {
                edges.add(e);
            }
        }
    }
    Table<Integer, Integer, Double> coarseEdges = HashBasedTable.create(seeds.size(), seeds.size());
    // compute coarse weights
    edges.stream().forEach((e) -> {
        int v = e.getV();
        int u = e.getU();
        if (g.isC(u) && g.isC(v)) { // C-C
            addToTable(coarseEdges, u, v, e.getWeight());
        } else if (!g.isC(u) && g.isC(v)) { // F-C
            for (Edge cEdge : g.cAdj(u)) { // coarse neighbors of the fine node u
                int nb = cEdge.getEndpoint(u);
                if (nb != v) {
                    addToTable(coarseEdges, v, nb, cEdge.getPij() * e.getWeight());
                }
            }
        } else if (g.isC(u) && !g.isC(v)) { // C-F
            for (Edge cEdge : g.cAdj(v)) { // coarse neighbors of the fine node v
                int nb = cEdge.getEndpoint(v);
                if (nb != u) {
                    addToTable(coarseEdges, u, nb, cEdge.getPij() * e.getWeight());
                }
            }
        } else { // F-F
            for (Edge cEdgeU : g.cAdj(u)) { // coarse neighbors of both fine nodes
                int uNb = cEdgeU.getEndpoint(u);
                for (Edge cEdgeV : g.cAdj(v)) {
                    int vNb = cEdgeV.getEndpoint(v);
                    if (uNb != vNb) {
                        addToTable(coarseEdges, uNb, vNb, cEdgeU.getPij() * e.getWeight() * cEdgeV.getPij());
                    }
                }
            }
        }
    });
    // Use the edges to build the new graph: loops through coarseEdges and adds each edge and weight.
    return createGraph(g, coarseEdges);
}

private static void addToTable(Table<Integer, Integer, Double> tbl, int r, int c, double val) {
    int mn = Math.min(r, c); // the smaller of the two node ids
    int mx = Math.max(r, c); // the larger of the two node ids
    if (tbl.contains(mn, mx)) {
        tbl.put(mn, mx, tbl.get(mn, mx) + val);
    } else {
        tbl.put(mn, mx, val);
    }
}
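When I run this, I quickly run out of memory. I profiled the application with YourKit: memory usage goes through the roof (more than 6GB before it runs out), and so does CPU usage. coarseEdges can become very large. Is there a better in-memory Map implementation that scales to large data sets? Or is there a better way to do this without storing coarseEdges at all?
PS: Please note that my graph cannot retrieve an edge (u,v) in constant time. It is basically a list of lists, which gives better performance in other critical parts of my application.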
Also see my graph implementation code below:
public class Graph {
    private final int SIZE;
    private final EdgeList[] nodes;
    private final float[] volumes;
    private final double[] weightedSum;
    private final double[] weightedCoarseSum;
    private final int[] nodeDegrees;
    private final int[] c_nodeDegrees;
    private int edge_count = 0;
    private final boolean[] coarse;
    private final EdgeList[] coarse_neighbors;

    public Graph(int SIZE) {
        this.SIZE = SIZE;
        nodes = new EdgeList[SIZE];
        coarse_neighbors = new EdgeList[SIZE];
        volumes = new float[SIZE];
        coarse = new boolean[SIZE];
        // initialize data
        weightedSum = new double[SIZE];
        weightedCoarseSum = new double[SIZE];
        nodeDegrees = new int[SIZE];
        c_nodeDegrees = new int[SIZE];
        for (int i = 0; i < SIZE; i++) {
            nodes[i] = new EdgeList();
            coarse_neighbors[i] = new EdgeList();
            volumes[i] = 1;
        }
    }

    public void addEdge(int u, int v, double w) {
        // The graph is undirected. To traverse edges in order,
        // edge (u,v) is always stored with the smaller id first.
        Edge e = null;
        if (u < v) {
            e = new Edge(u, v, w);
        } else if (u > v) {
            e = new Edge(v, u, w);
        } else {
            throw new UnsupportedOperationException("Self loops not allowed in graph"); //TODO: Need a graph validation routine
        }
        nodes[u].add(e);
        nodes[v].add(e);
        // update the weighted sum of each endpoint
        weightedSum[u] += w;
        weightedSum[v] += w;
        // update the degree of each endpoint
        ++nodeDegrees[u];
        ++nodeDegrees[v];
        ++edge_count;
    }

    public int size() {
        return SIZE;
    }

    public EdgeList adj(int v) {
        return nodes[v];
    }

    public EdgeList cAdj(int v) {
        return coarse_neighbors[v];
    }

    public void sortAdj(int u, Comparator<Edge> c) {
        nodes[u].sort(c);
    }

    public void sortCoarseAdj(int u, Comparator<Edge> c) {
        coarse_neighbors[u].sort(c);
    }

    public void setCoarse(int node, boolean c) {
        coarse[node] = c;
        if (c) {
            // update the coarse neighborhood of every neighbor of node
            for (Edge e : adj(node)) {
                int v = e.getEndpoint(node);
                coarse_neighbors[v].add(e);
                weightedCoarseSum[v] += e.getWeight();
                ++c_nodeDegrees[v];
            }
        }
    }

    public int getEdgeCount() {
        return edge_count;
    }

    public boolean isC(int id) {
        return coarse[id];
    }

    public double weightedDegree(int node) {
        return weightedSum[node];
    }

    public double weightedCoarseDegree(int node) {
        return weightedCoarseSum[node];
    }

    public int degree(int u) {
        return nodeDegrees[u];
    }

    public int cDegree(int u) {
        return c_nodeDegrees[u];
    }

    public Edge getCNeighborAt(int u, int idx) {
        return coarse_neighbors[u].getAt(idx);
    }

    public float volume(int u) {
        return volumes[u];
    }

    public void setVolume(int node, float v) {
        volumes[node] = v;
    }

    @Override
    public String toString() {
        return "Graph[nodes:" + SIZE + ",edges:" + edge_count + "]";
    }
}
// Edges are first class objects.
public class Edge {
    private boolean deleted = false;
    private int u;
    private int v;
    private double weight;
    private double pij;
    private double algebraicDist = (1 / Constants.EPSILON);

    public Edge(int u, int v, double weight) {
        this.u = u;
        this.v = v;
        this.weight = weight;
    }

    public Edge() {
    }

    public int getU() {
        return u;
    }

    public void setU(int u) {
        this.u = u;
    }

    public int getV() {
        return v;
    }

    public void setV(int v) {
        this.v = v;
    }

    public int getEndpoint(int from) {
        if (from == v) {
            return u;
        }
        return v;
    }

    public double getPij() {
        return pij;
    }

    public void setPij(double pij) {
        this.pij = pij;
    }

    public double getAlgebraicDist() {
        return algebraicDist;
    }

    public void setAlgebraicDist(double algebraicDist) {
        this.algebraicDist = algebraicDist;
    }

    public boolean isDeleted() {
        return deleted;
    }

    public void setDeleted(boolean deleted) {
        this.deleted = deleted;
    }

    public double getWeight() {
        return weight;
    }

    public void setWeight(double weight) {
        this.weight = weight;
    }

    @Override
    public String toString() {
        return "Edge[u:" + u + ", v:" + v + "]";
    }
}
// The Edge iterable
public class EdgeList implements Iterable<Edge> {
    private final ArrayList<Edge> data = new ArrayList<>();

    public void add(Edge e) {
        data.add(e);
    }

    @Override
    public Iterator<Edge> iterator() {
        Iterator<Edge> it = new IteratorImpl();
        return it;
    }

    private class IteratorImpl implements Iterator<Edge> {
        public IteratorImpl() {
        }

        private int currentIndex = 0;
        private final int N = data.size();

        @Override
        public boolean hasNext() {
            // skip deleted edges
            while (currentIndex < N && data.get(currentIndex).isDeleted()) {
                currentIndex++;
            }
            return currentIndex < N;
        }

        // Relies on hasNext() having been called first to skip deleted edges.
        @Override
        public Edge next() {
            return data.get(currentIndex++);
        }

        @Override
        public void remove() {
            throw new UnsupportedOperationException();
        }
    }

    public Edge getAt(int idx) {
        return data.get(idx);
    }

    public void sort(Comparator<Edge> c) {
        data.sort(c);
    }
}
Answer 0 (score: 4)
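A few stabs in the dark here; you will need to implement them to see how much they help.
1) You could consider using a composite (int,int) key with a hashmap instead of the Guava table. For edge weights this will definitely be more efficient. If you also need to query the edges going out of a given vertex the gain is less obvious, and you will have to weigh the CPU vs. memory tradeoff.
2) If you use a plain hashmap, you could consider one of the off-heap implementations. Take a look at https://github.com/OpenHFT/Chronicle-Map for example, it might be
3) If you stay in memory and want to squeeze out some extra space, you can do some dirty tricks with primitive maps. Use a long->double map, for example http://labs.carrotsearch.com/download/hppc/0.4.1/api/com/carrotsearch/hppc/LongDoubleMap.html or http://trove4j.sourceforge.net/javadocs/gnu/trove/map/hash/TLongDoubleHashMap.html, encode the pair of int vertex ids as a long, and see how much it helps. On a 64-bit JVM an Integer can take 16 bytes (assuming compressed oops) and a Double 24 bytes, which gives 32 + 24 = 56 bytes of boxed objects per entry, versus 8 + 8 bytes with a primitive map.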
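To make the key-packing idea in 1) and 3) concrete, here is a minimal sketch that uses only the JDK. It is not the HPPC or Trove implementation: the class PackedEdgeWeights and its methods pack, first, second, add and get are hypothetical names made up for illustration, and it assumes node ids are non-negative. It stores keys and values in two parallel primitive arrays with linear probing, so each slot costs 16 bytes and nothing is boxed.

```java
import java.util.Arrays;

/** Hypothetical sketch: accumulates weights of undirected edges keyed by a packed (min,max) id pair. */
final class PackedEdgeWeights {
    // -1 can never be a packed key as long as node ids are non-negative (assumption).
    private static final long EMPTY = -1L;
    private long[] keys;
    private double[] values;
    private int size;

    PackedEdgeWeights(int expectedEntries) {
        int cap = 16;
        while (cap < expectedEntries * 2L) cap <<= 1; // power of two, load factor <= 0.5
        keys = new long[cap];
        values = new double[cap];
        Arrays.fill(keys, EMPTY);
    }

    /** Packs two non-negative ids into one long: smaller id in the high 32 bits, larger in the low 32 bits. */
    static long pack(int a, int b) {
        int mn = Math.min(a, b);
        int mx = Math.max(a, b);
        return ((long) mn << 32) | (mx & 0xFFFFFFFFL);
    }

    static int first(long key)  { return (int) (key >>> 32); }
    static int second(long key) { return (int) key; }

    /** Adds w to the weight stored for the undirected edge (a,b), creating the entry if needed. */
    void add(int a, int b, double w) {
        if (2L * (size + 1) > keys.length) resize();
        long key = pack(a, b);
        int i = slot(keys, key);
        if (keys[i] == EMPTY) {
            keys[i] = key;
            size++;
        }
        values[i] += w;
    }

    /** Returns the accumulated weight of (a,b), or 0.0 if the edge was never added. */
    double get(int a, int b) {
        int i = slot(keys, pack(a, b));
        return keys[i] == EMPTY ? 0.0 : values[i];
    }

    int size() { return size; }

    // Linear probing: returns the slot holding key, or the first empty slot on its probe path.
    private static int slot(long[] table, long key) {
        int mask = table.length - 1;
        int i = (int) mix(key) & mask;
        while (table[i] != EMPTY && table[i] != key) {
            i = (i + 1) & mask;
        }
        return i;
    }

    // Cheap multiplicative scramble so that nearby ids do not cluster in the table.
    private static long mix(long k) {
        k *= 0x9E3779B97F4A7C15L;
        return k ^ (k >>> 32);
    }

    private void resize() {
        long[] oldKeys = keys;
        double[] oldValues = values;
        keys = new long[oldKeys.length * 2];
        values = new double[keys.length];
        Arrays.fill(keys, EMPTY);
        for (int j = 0; j < oldKeys.length; j++) {
            if (oldKeys[j] != EMPTY) {
                int i = slot(keys, oldKeys[j]);
                keys[i] = oldKeys[j];
                values[i] = oldValues[j];
            }
        }
    }
}
```

In the question's code, addToTable(coarseEdges, u, v, w) would become a call like weights.add(u, v, w). Because the table is kept at most half full, the real cost is somewhat above 16 bytes per entry, but there are still no Integer or Double objects. A library map such as the ones linked above is likely a better choice in production; this only shows the mechanics.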
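Answer 1 (score: 1)
I would suggest taking a look at Guava's ValueGraph for a use case like this.
It may also be possible to make the data structures for the recursive graphs more efficient; how many recursion steps does your data set go through, and how does the size of the graph change from step to step?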