Question

I want to build a list/array of dictionaries. Each dictionary maps an integer key to a (possibly very long) array of ints. I implemented this in Python with numpy as follows:

import numpy as np

def get_dicts(dict_names):
    # Build one dictionary per input file.
    dictionaries = [None]*len(dict_names)
    k = 0
    my_dict = {}
    for i in dict_names:
        local_dict = my_dict.copy()
        with open(i, 'rt') as f:
            for line in f:
                # First int on the line is the key, the rest are the values.
                v = np.fromstring(line, dtype=int, sep=' ')
                local_dict[v[0]] = v[1:]

        dictionaries[k] = local_dict
        k += 1
        print("Dictionary %s extracted" % i)
    return dictionaries

def main():
    dict_names = [str(i) + "_tweet_mapping" for i in range(1, 45)]
    dictionaries = get_dicts(dict_names)

The runtime is fine: 90 seconds. However, Python turned out to be too slow later in my problem, so I ported everything to Java. In Java, building these dictionaries as an ArrayList of hash maps takes a lot of memory and even runs into heap problems. The runtime is also much slower. My Java implementation is as follows:

private ArrayList<Hashtable<Integer, Integer[]>> get_dicts(String [] dictionary_files) {
    ArrayList<Hashtable<Integer, Integer []>> my_dictionaries = new ArrayList<Hashtable<Integer,Integer []>>(dictionary_files.length);
    for (int i=0; i<dictionary_files.length; i++) {
        my_dictionaries.add(get_one_dict(dictionary_files[i]));
    }
    return my_dictionaries;
}

private Hashtable<Integer, Integer []> get_one_dict(String dictionary_file){
    Hashtable<Integer, Integer []> my_dictionary = new Hashtable<Integer, Integer[]>();
    try{
        BufferedReader br = new BufferedReader(new FileReader(dictionary_file));
        try{
            String s;
            while((s = br.readLine()) != null){
                String [] words = s.split(" ");
                int n_tweets = words.length-1;
                Integer [] int_line = new Integer[n_tweets];
                int key_word = Integer.parseInt(words[0]);
                for (int j=0; j<n_tweets; j++){
                    int_line[j] = Integer.parseInt(words[j+1]);
                }
                my_dictionary.put(key_word, int_line);
            }
        }finally{
            br.close();
        }
    }
    catch(IOException e){
        e.printStackTrace();
    }catch(OutOfMemoryError e){
        e.printStackTrace();
    }catch(Exception e){
        e.printStackTrace();
    }
    System.out.println("Dictionary " + dictionary_file +" extracted");
    return my_dictionary;
}

Why is there such a huge performance difference, in both time and memory? What can I do to reduce memory consumption in Java?

Answer 1

You are using the wrapper type Integer instead of int. For the map keys you have no choice, but for the arrays you do.

Using Map<Integer, int[]> reduces the memory consumption per array element from 4 + 16 bytes to 4 bytes. (*)

You should also forget Hashtable and use HashMap instead. The former is synchronized, which you don't need. But this shouldn't be a big issue.
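A minimal sketch of what the two suggestions above could look like together, assuming the same file format as in the question (one key followed by space-separated ints per line). The class name PrimitiveDictLoader and the helpers addLine and readDict are made up for this illustration and are not from the question's code.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class: stores values as int[] in a HashMap
// instead of Integer[] in a Hashtable.
class PrimitiveDictLoader {

    // Parses one line "key v1 v2 ..." and stores it in the given map.
    // Blank lines are skipped; a line holding only a key maps to an empty array.
    static void addLine(Map<Integer, int[]> dict, String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return;
        }
        String[] words = trimmed.split("\\s+");
        int[] values = new int[words.length - 1];
        for (int j = 0; j < values.length; j++) {
            values[j] = Integer.parseInt(words[j + 1]);
        }
        dict.put(Integer.parseInt(words[0]), values);
    }

    // Reads a whole dictionary file line by line.
    static Map<Integer, int[]> readDict(String path) throws IOException {
        Map<Integer, int[]> dict = new HashMap<>();
        try (BufferedReader br = new BufferedReader(new FileReader(path))) {
            String line;
            while ((line = br.readLine()) != null) {
                addLine(dict, line);
            }
        }
        return dict;
    }

    public static void main(String[] args) {
        // In-memory sample lines, so the sketch runs without any input files.
        Map<Integer, int[]> dict = new HashMap<>();
        addLine(dict, "7 10 20 30");
        addLine(dict, "8 40");
        System.out.println(dict.get(7).length); // prints 3
    }
}
```

Only the keys are still boxed here; each value array holds primitive ints in one contiguous block, so no separate object is allocated per element.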

I think the slowdown comes mainly from the needless memory allocation.

(*) 4 bytes (8 on a 64-bit JVM without compressed OOPs) for the reference, 16 for the object (which is the minimum object size).
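To make the footnote's arithmetic concrete, here is a rough back-of-the-envelope sketch. The element count of one million is a hypothetical figure, not taken from the question, and the estimate ignores array headers and the JVM's sharing of cached Integer objects for small values, so real numbers will differ somewhat.

```java
// Hypothetical estimator for the per-element figures quoted above.
class IntArrayMemoryEstimate {

    // Integer[]: each element costs one reference plus one Integer object.
    static long boxedBytes(long elements, int referenceBytes, int objectBytes) {
        return elements * (referenceBytes + objectBytes);
    }

    // int[]: each element costs 4 bytes.
    static long primitiveBytes(long elements) {
        return elements * Integer.BYTES;
    }

    public static void main(String[] args) {
        long elements = 1_000_000L; // assumed size, for illustration only
        System.out.println(boxedBytes(elements, 4, 16)); // prints 20000000
        System.out.println(boxedBytes(elements, 8, 16)); // prints 24000000
        System.out.println(primitiveBytes(elements));    // prints 4000000
    }
}
```

Under these assumptions the boxed representation needs about five to six times the memory of the primitive array.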

