How to serialize an object in Hadoop (to HDFS)

Date: 2016-05-31 12:20:06

Tags: java hadoop serialization mapreduce bigdata

I have a HashMap<String, ArrayList<Integer>>. I want to serialize my HashMap object (hmap) to an HDFS location, and then deserialize it in my Mapper and Reducers in order to use it.

To serialize my HashMap object to HDFS I used plain Java object serialization, as shown below, but it fails with a permission-denied error:


try {
    FileOutputStream fileOut = new FileOutputStream("hashmap.ser");
    ObjectOutputStream out = new ObjectOutputStream(fileOut);
    out.writeObject(hm);
    out.close();
} catch (Exception e) {
    e.printStackTrace();
}

The FileOutputStream constructor fails with a java.io.FileNotFoundException: hashmap.ser (Permission denied).

Can someone suggest, or share sample code for, how to serialize an object to HDFS in Hadoop?

1 Answer:

Answer 0 (score: 2)

Try SerializationUtils from Apache Commons Lang.

It provides the following methods:

static Object clone(Serializable object)                              // Deep clone an Object using serialization.
static Object deserialize(byte[] objectData)                          // Deserializes a single Object from an array of bytes.
static Object deserialize(InputStream inputStream)                    // Deserializes an Object from the specified stream.
static byte[] serialize(Serializable obj)                             // Serializes an Object to a byte array for storage/serialization.
static void   serialize(Serializable obj, OutputStream outputStream)  // Serializes an Object to the specified stream.

When storing to HDFS, you can store the byte[] returned by serialize(). When reading the object back, deserialize it and type-cast the result to the corresponding type (for example, your HashMap) to get it back.
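For example, a minimal sketch of this round trip using the Hadoop FileSystem API might look like the following (the class name, the HDFS path /user/hadoop/hashmap.ser, and the map contents are hypothetical, and fs.defaultFS is assumed to point at your cluster):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;

import org.apache.commons.lang.SerializationUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsMapSerDe {
    public static void main(String[] args) throws Exception {
        HashMap<String, ArrayList<Integer>> hmap = new HashMap<String, ArrayList<Integer>>();
        hmap.put("key", new ArrayList<Integer>(Arrays.asList(1, 2, 3)));

        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path path = new Path("/user/hadoop/hashmap.ser"); // hypothetical HDFS path

        // Serialize the map to a byte[] and write the bytes to HDFS.
        byte[] bytes = SerializationUtils.serialize(hmap);
        FSDataOutputStream out = fs.create(path);
        out.write(bytes);
        out.close();

        // Read the bytes back from HDFS, deserialize, and cast to the original type.
        FSDataInputStream in = fs.open(path);
        @SuppressWarnings("unchecked")
        HashMap<String, ArrayList<Integer>> restored =
                (HashMap<String, ArrayList<Integer>>) SerializationUtils.deserialize(in);
        in.close();

        System.out.println(restored);
    }
}

In a Mapper or Reducer you would typically run only the read half, e.g. in setup(), so that each task gets its own copy of the map.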

In my case, I stored a HashMap in an HBase column; I retrieved it back as a HashMap in my mapper method, and it worked successfully.
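A rough sketch of that HBase variant (the table, row, family, and qualifier names below are made up, and the HBase 1.x client API is assumed): store the serialized byte[] in a cell, then read and deserialize it.

import java.util.ArrayList;
import java.util.HashMap;

import org.apache.commons.lang.SerializationUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseMapSerDe {
    public static void main(String[] args) throws Exception {
        HashMap<String, ArrayList<Integer>> hmap = new HashMap<String, ArrayList<Integer>>();

        // Hypothetical names: table "lookup", row "r1", family "d", qualifier "hmap".
        Configuration conf = HBaseConfiguration.create();
        Connection conn = ConnectionFactory.createConnection(conf);
        Table table = conn.getTable(TableName.valueOf("lookup"));

        // Write: put the serialized map into a single cell.
        Put put = new Put(Bytes.toBytes("r1"));
        put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("hmap"), SerializationUtils.serialize(hmap));
        table.put(put);

        // Read (e.g., from a mapper's setup()): fetch the cell and deserialize.
        Result result = table.get(new Get(Bytes.toBytes("r1")));
        byte[] cell = result.getValue(Bytes.toBytes("d"), Bytes.toBytes("hmap"));
        @SuppressWarnings("unchecked")
        HashMap<String, ArrayList<Integer>> restored =
                (HashMap<String, ArrayList<Integer>>) SerializationUtils.deserialize(cell);

        table.close();
        conn.close();
    }
}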

Of course, you can do the same thing in your case too...

Another option is Apache Commons IO (refer to org.apache.commons.io.FileUtils); however, you will later need to copy the resulting file to HDFS, since you want HDFS as your data store.

FileUtils.writeByteArrayToFile(new File("pathname"), myByteArray);
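Afterwards you would push that local file into HDFS, for example with FileSystem.copyFromLocalFile (both paths below are placeholders, and the same Hadoop Configuration/FileSystem/Path imports as in the sketch above apply):

// Copy the locally written serialized file into HDFS (placeholder paths).
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
fs.copyFromLocalFile(new Path("pathname"), new Path("/user/hadoop/hashmap.ser"));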

Note: both JARs, Apache Commons IO and Apache Commons Lang, are always available on a Hadoop cluster.