Implementing a TreeWritable class

Time: 2011-03-05 13:32:45

Tags: java serialization hadoop mapreduce

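I want to implement a TreeWritable class to represent a tree structure. I tried the implementation below, but I get a "mapred.MapTask: Record too large for in-memory buffer" error. How should I implement Writable for a multi-level data structure?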

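import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.LinkedHashSet;
import java.util.Set;

import org.apache.hadoop.io.Writable;

public class TreeWritable implements Writable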
{
private final Set<TreeWritable> children = new LinkedHashSet<TreeWritable>();
private String data;
private int level;

public TreeWritable( String data, int level )
{
    this.data = data;
    this.level = level;
}

public int getLevel()
{
    return level;
}

public TreeWritable()
{
}

public TreeWritable child( String data )
{
    for ( TreeWritable child : children )
    {
        if ( child.data.equals( data ) )
        {
            return child;
        }
    }
    return child( new TreeWritable( data, this.level + 1 ) );
}

TreeWritable child( TreeWritable child )
{
    children.add( child );
    return child;
}

public Set<TreeWritable> getChildren()
{
    return children;
}

public String getId()
{
    return data;
}

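public void write( DataOutput out ) throws IOException
{
    out.writeUTF( data );
    // writeInt, not write: write(int) emits a single byte, but readFields reads a 4-byte int.
    out.writeInt( level );
    out.writeInt( children.size() );
    // Use one iterator for the whole loop. Calling children.iterator() on every
    // pass creates a fresh iterator each time, so the loop never terminates.
    for ( TreeWritable child : children )
    {
        child.write( out );
    }
}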

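public void readFields( DataInput in ) throws IOException
{
    data = in.readUTF();
    level = in.readInt();
    int size = in.readInt();
    // Hadoop may reuse Writable instances, so drop children left over from a previous record.
    children.clear();
    for ( int i = 0; i < size; i++ )
        children.add( TreeWritable.read( in ) );
}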

public static TreeWritable read( DataInput in ) throws IOException
{
    TreeWritable w = new TreeWritable();
    w.readFields( in );
    return w;
}
}

1 answer:

Answer 0 (score: 1)

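I think this is the best implementation for small trees, provided each tree is meant to be processed on a single machine. If you are working with large trees, you should split them into tree parts and store those as tuples (e.g. id, data, root_id).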
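As a rough illustration of the tuple idea, the sketch below flattens an in-memory tree into one (id, data, root_id) record per node. It assumes root_id means the id of the node's parent, with -1 for the root; the class and method names (TreeTuples, NodeTuple, flatten, childrenByRoot) are hypothetical and not part of Hadoop.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not a Hadoop class.
final class TreeTuples {

    // One flat record per node. Assumption: rootId is the parent's id, -1 for the root.
    static final class NodeTuple {
        final int id;
        final String data;
        final int rootId;

        NodeTuple(int id, String data, int rootId) {
            this.id = id;
            this.data = data;
            this.rootId = rootId;
        }

        @Override
        public String toString() {
            return id + "\t" + data + "\t" + rootId;
        }
    }

    // Minimal in-memory tree node used only to feed the example.
    static final class Node {
        final String data;
        final List<Node> children = new ArrayList<>();

        Node(String data) {
            this.data = data;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    // Depth-first walk; ids are assigned in visit order.
    static List<NodeTuple> flatten(Node root) {
        List<NodeTuple> out = new ArrayList<>();
        flatten(root, -1, out);
        return out;
    }

    private static void flatten(Node node, int parentId, List<NodeTuple> out) {
        int id = out.size();
        out.add(new NodeTuple(id, node.data, parentId));
        for (Node child : node.children) {
            flatten(child, id, out);
        }
    }

    // Groups node ids by rootId, similar to what a reducer keyed on rootId would see.
    static Map<Integer, List<Integer>> childrenByRoot(List<NodeTuple> tuples) {
        Map<Integer, List<Integer>> grouped = new LinkedHashMap<>();
        for (NodeTuple t : tuples) {
            grouped.computeIfAbsent(t.rootId, k -> new ArrayList<>()).add(t.id);
        }
        return grouped;
    }

    public static void main(String[] args) {
        Node root = new Node("a").add(new Node("b").add(new Node("c"))).add(new Node("d"));
        for (NodeTuple t : flatten(root)) {
            System.out.println(t);
        }
    }
}
```

Each record is small and fixed in shape, so no single record grows with the size of the tree, and the parent/child links can be rebuilt by grouping on root_id.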

As another example, the data structure used for computing PageRank in MapReduce is (url, currentPageRank, [link_url1, link_url2, ...]).
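A record of that shape could be serialized along the following lines. This is only a sketch: PageRankRecord and its helpers are hypothetical names, and the class mirrors the write/readFields signatures of Hadoop's Writable without implementing the interface, so that it runs with the JDK alone. In a real job it would presumably declare "implements Writable".

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical record type for illustration: (url, currentPageRank, [links]).
final class PageRankRecord {
    String url = "";
    double currentPageRank;
    final List<String> links = new ArrayList<>();

    // Same signature as Writable.write: fields first, then a length-prefixed list.
    public void write(DataOutput out) throws IOException {
        out.writeUTF(url);
        out.writeDouble(currentPageRank);
        out.writeInt(links.size());
        for (String link : links) {
            out.writeUTF(link);
        }
    }

    // Same signature as Writable.readFields: reads fields in the order write emitted them.
    public void readFields(DataInput in) throws IOException {
        url = in.readUTF();
        currentPageRank = in.readDouble();
        int size = in.readInt();
        links.clear(); // the instance may be reused for several records
        for (int i = 0; i < size; i++) {
            links.add(in.readUTF());
        }
    }

    // Round-trip helpers for trying the format outside Hadoop.
    static byte[] toBytes(PageRankRecord record) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            record.write(new DataOutputStream(buffer));
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static PageRankRecord fromBytes(byte[] bytes) {
        try {
            PageRankRecord record = new PageRankRecord();
            record.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
            return record;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The list is flat (one level of outgoing links per record), so the graph is spread across many small records instead of being nested inside one large one.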