Out-of-memory failure in the JSON-to-SSTable tool

Time: 2014-03-11 02:43:51

Tags: java json cassandra

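The json2sstable tool shipped with Cassandra 1.2.15 fails with an out-of-memory error. A similar problem was reported as a bug and fixed back in 2011: https://issues.apache.org/jira/browse/CASSANDRA-2189

Either I am missing a step in configuring or using the tool, or the bug has resurfaced. Please point out what I am missing.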

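Repro steps:

1) Cassandra 1.2.15, one table with a varchar key and one varchar column filled with random UUIDs, 6x10^6 records.

2) A JSON file (~1 GB) is generated with the sstable2json tool.

3) Cassandra is restarted with a new configuration (new data/cache/commit log directories, new partitioner).

4) The keyspace is re-created.

5) json2sstable fails after a few minutes of processing: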

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOfRange(Arrays.java:2694)
    at java.lang.String.<init>(String.java:203)
    at org.codehaus.jackson.util.TextBuffer.contentsAsString(TextBuffer.java:350)
    at org.codehaus.jackson.impl.Utf8StreamParser.getText(Utf8StreamParser.java:278)
    at org.codehaus.jackson.map.deser.std.UntypedObjectDeserializer.deserialize(UntypedObjectDeserializer.java:59)
    at org.codehaus.jackson.map.deser.std.UntypedObjectDeserializer.mapArray(UntypedObjectDeserializer.java:165)
    at org.codehaus.jackson.map.deser.std.UntypedObjectDeserializer.deserialize(UntypedObjectDeserializer.java:51)
    at org.codehaus.jackson.map.deser.std.UntypedObjectDeserializer.mapArray(UntypedObjectDeserializer.java:165)
    at org.codehaus.jackson.map.deser.std.UntypedObjectDeserializer.deserialize(UntypedObjectDeserializer.java:51)
    at org.codehaus.jackson.map.deser.std.UntypedObjectDeserializer.mapObject(UntypedObjectDeserializer.java:204)
    at org.codehaus.jackson.map.deser.std.UntypedObjectDeserializer.deserialize(UntypedObjectDeserializer.java:47)
    at org.codehaus.jackson.map.deser.std.ObjectArrayDeserializer.deserialize(ObjectArrayDeserializer.java:104)
    at org.codehaus.jackson.map.deser.std.ObjectArrayDeserializer.deserialize(ObjectArrayDeserializer.java:18)
    at org.codehaus.jackson.map.ObjectMapper._readValue(ObjectMapper.java:2695)
    at org.codehaus.jackson.map.ObjectMapper.readValue(ObjectMapper.java:1294)
    at org.codehaus.jackson.JsonParser.readValueAs(JsonParser.java:1368)
    at org.apache.cassandra.tools.SSTableImport.importUnsorted(SSTableImport.java:344)
    at org.apache.cassandra.tools.SSTableImport.importJson(SSTableImport.java:328)
    at org.apache.cassandra.tools.SSTableImport.main(SSTableImport.java:547)

1 answer:

Answer 0: (score: 0)

Judging from the json2sstable source code, the tool loads every record in the JSON file into memory and then sorts the records by key:

        private int importUnsorted(String jsonFile, ColumnFamily columnFamily, String ssTablePath, IPartitioner<?> partitioner) throws IOException
        {
            int importedKeys = 0;
            long start = System.currentTimeMillis();

            JsonParser parser = getParser(jsonFile);

            Object[] data = parser.readValueAs(new TypeReference<Object[]>(){});

            keyCountToImport = (keyCountToImport == null) ? data.length : keyCountToImport;
            SSTableWriter writer = new SSTableWriter(ssTablePath, keyCountToImport);

            System.out.printf("Importing %s keys...%n", keyCountToImport);

            // sort by dk representation, but hold onto the hex version
            SortedMap<DecoratedKey,Map<?, ?>> decoratedKeys = new TreeMap<DecoratedKey,Map<?, ?>>();

            for (Object row : data)
            {
                Map<?,?> rowAsMap = (Map<?, ?>)row;
                decoratedKeys.put(partitioner.decorateKey( hexToBytes((String)rowAsMap.get("key"))), rowAsMap);
....

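According to Jonathan Ellis's comment on the CASSANDRA-2322 issue, this behavior is by design.

So json2sstable is not well suited to importing production-scale data into Cassandra. Because the whole input has to fit in the JVM heap before anything is written, the tool can run out of memory on large datasets.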
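
To make the memory pattern concrete, here is a minimal sketch that uses only the JDK. It is not Cassandra code: `ImportMemorySketch`, `token`, `importUnsorted` and `importPresorted` are hypothetical names, `token` is a stand-in for `partitioner.decorateKey`, and a `List` stands in for the SSTable writer. It only illustrates that sorting by decorated key forces every row to be buffered before the first one can be written, whereas input that already arrives in token order could in principle be written row by row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.TreeSet;

// Illustrative only: none of these names come from Cassandra.
class ImportMemorySketch {

    // Hypothetical stand-in for partitioner.decorateKey(): derives a sortable token from a row key.
    static int token(String key) {
        return key.hashCode();
    }

    // Orders keys by token, falling back to the key itself so token collisions do not drop rows.
    static final Comparator<String> TOKEN_ORDER = new Comparator<String>() {
        @Override
        public int compare(String a, String b) {
            int byToken = Integer.compare(token(a), token(b));
            return byToken != 0 ? byToken : a.compareTo(b);
        }
    };

    // Unsorted input: the smallest token may be the last key read, so nothing can be
    // written until the input is exhausted. Returns the peak number of buffered rows.
    static int importUnsorted(Iterator<String> keys, List<String> written) {
        TreeSet<String> buffered = new TreeSet<String>(TOKEN_ORDER);
        while (keys.hasNext()) {
            buffered.add(keys.next());
        }
        int peak = buffered.size();
        written.addAll(buffered); // rows reach the "writer" only after the full sort
        return peak;
    }

    // Input assumed to be in token order already: each row is handed to the "writer"
    // as soon as it is read, so the importer itself holds at most one row.
    static int importPresorted(Iterator<String> keys, List<String> written) {
        int peak = 0;
        while (keys.hasNext()) {
            written.add(keys.next());
            peak = 1;
        }
        return peak;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("c", "a", "b");

        List<String> sortedOut = new ArrayList<String>();
        int unsortedPeak = importUnsorted(keys.iterator(), sortedOut);
        System.out.println("unsorted import buffered " + unsortedPeak + " rows, wrote " + sortedOut);

        List<String> streamedOut = new ArrayList<String>();
        int presortedPeak = importPresorted(keys.iterator(), streamedOut);
        System.out.println("presorted import buffered " + presortedPeak + " row(s), wrote " + streamedOut);
    }
}
```

In the sketch the buffered rows are short strings; in the real tool each buffered row is a full deserialized JSON object, so the peak grows with the size of the entire file rather than with the size of a single row.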