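I want to implement a TreeWritable class to represent a tree structure. I tried the implementation below, but I get the error "mapred.MapTask: Record too large for in memory buffer". How should I implement Writable for a multi-level data structure?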
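import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.LinkedHashSet;
import java.util.Set;
import org.apache.hadoop.io.Writable;

public class TreeWritable implements Writable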
{
private final Set<TreeWritable> children = new LinkedHashSet<TreeWritable>();
private String data;
private int level;
public TreeWritable( String data, int level )
{
this.data = data;
this.level = level;
}
public int getLevel()
{
return level;
}
public TreeWritable()
{
}
public TreeWritable child( String data )
{
for ( TreeWritable child : children )
{
if ( child.data.equals( data ) )
{
return child;
}
}
return child( new TreeWritable( data, this.level + 1 ) );
}
TreeWritable child( TreeWritable child )
{
children.add( child );
return child;
}
public Set<TreeWritable> getChildren()
{
return children;
}
public String getId()
{
return data;
}
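public void write( DataOutput out ) throws IOException
{
out.writeUTF( data );
// writeInt, not write: write(int) emits a single byte, but readFields() reads a 4-byte int
out.writeInt( level );
out.writeInt( children.size() );
// Iterate once: calling children.iterator() on every pass restarts at the first child and never terminates
for ( TreeWritable child : children )
child.write( out );
}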
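public void readFields( DataInput in ) throws IOException
{
// Hadoop may reuse a Writable instance, so drop children left over from a previous record
children.clear();
data = in.readUTF();
level = in.readInt();
int size = in.readInt();
for ( int i = 0; i < size; i++ )
children.add( TreeWritable.read( in ) );
}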
public static TreeWritable read( DataInput in ) throws IOException
{
TreeWritable w = new TreeWritable();
w.readFields( in );
return w;
}
}
Answer 0 (score: 1)
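I think this is the best implementation for small trees, provided the tree is meant to be processed on a single machine. If you are working with large trees, you should split the tree into parts and store them as tuples, for example (id, data, root_id).
As another example, the data structure used for PageRank computation in MapReduce is (url, currentPageRank, [link_url1, link_url2, ...]).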
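To illustrate the tuple layout suggested above, here is a minimal sketch in plain Java (no Hadoop dependency) that flattens an in-memory tree into one (id, data, root_id) record per node. The class names TreeFlattener, Node, and NodeRecord are hypothetical. The sketch also assumes that root_id refers to the id of the node's parent, that ids are assigned in pre-order, and that the root gets a root_id of -1; the answer does not specify any of this.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these names come from the original post.
class TreeFlattener {

    /** Simple in-memory tree node holding a payload and its children. */
    static final class Node {
        final String data;
        final List<Node> children = new ArrayList<>();

        Node(String data) {
            this.data = data;
        }

        /** Adds a child and returns this node so calls can be chained. */
        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    /** One flat record per node: (id, data, rootId). */
    static final class NodeRecord {
        final int id;
        final String data;
        final int rootId; // assumption: id of the parent node, -1 for the tree root

        NodeRecord(int id, String data, int rootId) {
            this.id = id;
            this.data = data;
            this.rootId = rootId;
        }

        @Override
        public String toString() {
            return id + "\t" + data + "\t" + rootId;
        }
    }

    /** Flattens the tree rooted at {@code root} into pre-order records. */
    static List<NodeRecord> flatten(Node root) {
        List<NodeRecord> out = new ArrayList<>();
        flatten(root, -1, out);
        return out;
    }

    // Recursive helper; a very deep tree would need an explicit stack instead.
    private static void flatten(Node node, int parentId, List<NodeRecord> out) {
        int id = out.size(); // the next free index doubles as the node id
        out.add(new NodeRecord(id, node.data, parentId));
        for (Node child : node.children) {
            flatten(child, id, out);
        }
    }

    public static void main(String[] args) {
        Node root = new Node("a")
                .add(new Node("b").add(new Node("d")))
                .add(new Node("c"));
        // Each printed line is one small, independent record.
        for (NodeRecord r : flatten(root)) {
            System.out.println(r);
        }
    }
}
```

Each record is small and independent of the others, so no single serialized value grows with the size of the whole tree. The tree can be rebuilt downstream by grouping records on rootId.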