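I'm writing a Random Indexing implementation in Java. It needs to process a large corpus and somehow store the context and index vectors of individual tokens. A HashMap seemed like the natural choice (String -> Token object), but when I run with Xprof, a disproportionately large share of the processing seems to go to adding tokens to the HashMap.
Am I reading the output correctly? Why would this be the case, and is there any way to speed it up?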
Flat profile of 16.18 secs (606 total ticks): main
Interpreted + native Method
6.9% 0 + 42 java.io.FileInputStream.readBytes
5.0% 0 + 30 java.lang.Object.getClass
1.8% 11 + 0 java.lang.String.toLowerCase
1.5% 9 + 0 java.util.HashMap.resize
1.3% 8 + 0 opennlp.tools.tokenize.AbstractTokenizer.tokenize
1.3% 0 + 8 java.util.zip.ZipFile.read
1.2% 0 + 7 java.util.zip.ZipFile.open
0.8% 5 + 0 java.util.Arrays.copyOfRange
0.5% 0 + 3 java.io.FileInputStream.available
0.3% 2 + 0 java.util.HashMap.put
0.3% 0 + 2 sun.misc.Unsafe.compareAndSwapLong
0.3% 2 + 0 java.lang.CharacterDataLatin1.toLowerCase
0.3% 2 + 0 java.util.ArrayList.grow
0.3% 2 + 0 semanticspace.SparseVector.get
0.3% 2 + 0 java.lang.CharacterData.of
0.2% 1 + 0 java.util.HashMap.createEntry
0.2% 1 + 0 java.util.Arrays.copyOf
0.2% 1 + 0 java.lang.Integer.valueOf
0.2% 1 + 0 java.lang.Integer.toString
0.2% 1 + 0 sun.misc.JarIndex.addToList
0.2% 1 + 0 java.util.ArrayList.toArray
0.2% 1 + 0 java.net.URL.toString
0.2% 1 + 0 semanticspace.SparseVector.add
0.2% 1 + 0 sun.reflect.NativeMethodAccessorImpl.invoke0
0.2% 1 + 0 java.io.BufferedInputStream.read1
26.2% 65 + 94 Total interpreted (including elided)
Compiled + native Method
36.5% 217 + 4 java.util.HashMap.put
24.3% 133 + 14 semanticspace.SparseVector.add
2.6% 15 + 1 semanticspace.RandomIndexing.getToken
1.3% 8 + 0 java.lang.String.toLowerCase
1.3% 8 + 0 semanticspace.RandomIndexing.read
0.5% 0 + 3 java.util.HashMap.newKeyIterator
0.2% 0 + 1 semanticspace.SparseVector.get
0.2% 1 + 0 java.util.HashMap.containsKey
66.8% 382 + 23 Total compiled
Stub + native Method
6.9% 0 + 42 java.lang.System.arraycopy
6.9% 0 + 42 Total stub
Flat profile of 0.00 secs (1 total ticks): DestroyJavaVM
Thread-local ticks:
100.0% 1 Blocked (of total)
Flat profile of 16.17 secs (608 total ticks): Monitor Ctrl-Break
Interpreted + native Method
98.2% 0 + 597 java.net.PlainSocketImpl.socketAccept
1.0% 0 + 6 java.net.PlainSocketImpl.initProto
0.7% 0 + 4 java.net.NetworkInterface.getAll
0.2% 0 + 1 java.lang.ClassLoader$NativeLibrary.load
100.0% 0 + 608 Total interpreted
Global summary of 16.33 seconds:
100.0% 1326 Received ticks
53.2% 706 Received GC ticks
6.8% 90 Compilation
0.1% 1 Other VM operations
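Answer 0: (score: 3)
I don't know Xprof, but you could try profiling with VisualVM and check how many times HashMap.put is called, and how long it takes per call and in total. It may also give you a different picture of where the bottleneck is.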
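If a profiler is not at hand, the same two numbers (call count and total time for put) can be roughly approximated by hand. The following is only a minimal sketch: the class PutTimer and its methods are hypothetical names, not part of the question's code, and the System.nanoTime calls add overhead of their own, so the per-call figure should be treated as an upper bound rather than an exact measurement.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: wraps a HashMap and records how often put is
// called and how much wall-clock time is spent inside it.
public class PutTimer {
    private final Map<String, Integer> map = new HashMap<>();
    private long calls;  // number of put calls observed
    private long nanos;  // accumulated time spent in put, in nanoseconds

    public void timedPut(String key, Integer value) {
        long start = System.nanoTime();
        map.put(key, value);
        nanos += System.nanoTime() - start;
        calls++;
    }

    public long calls() { return calls; }
    public long totalNanos() { return nanos; }
    public int size() { return map.size(); }

    public static void main(String[] args) {
        PutTimer timer = new PutTimer();
        // Synthetic workload (an assumption): one million puts over 50,000 distinct keys.
        for (int i = 0; i < 1_000_000; i++) {
            timer.timedPut("token" + (i % 50_000), i);
        }
        System.out.printf("puts=%d total=%.1f ms avg=%.1f ns%n",
                timer.calls(),
                timer.totalNanos() / 1e6,
                (double) timer.totalNanos() / timer.calls());
    }
}
```

Comparing the total against the overall run time gives a quick sanity check on whether put really accounts for the share the flat profile suggests.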
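Answer 1: (score: 2)
There isn't enough information in your question for a proper analysis, but from what I can see, you are simply reading a file and putting the words into a HashMap.
Given that this is the only thing the program does, it shouldn't be surprising that it spends most of its time reading the file and updating the HashMap.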
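For illustration, the read-and-store loop described above might look roughly like the sketch below. Everything here is an assumption, since the question does not show its code: TokenIndex and the nested Token class are hypothetical stand-ins, and Map.computeIfAbsent requires Java 8 or later. The sketch pre-sizes the map (the profile shows some time in HashMap.resize) and does one lookup per word instead of a containsKey followed by a put. Whether this helps noticeably would have to be measured.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a String -> Token index; not the asker's actual code.
public class TokenIndex {

    /** Stand-in for the question's Token class; here it only counts occurrences. */
    public static final class Token {
        private final String text;
        private int count;

        Token(String text) { this.text = text; }

        public String text() { return text; }
        public int count() { return count; }
    }

    private final Map<String, Token> tokens;

    public TokenIndex(int expectedVocabulary) {
        // Choose an initial capacity so that expectedVocabulary entries fit
        // under the default load factor of 0.75 without triggering a resize.
        this.tokens = new HashMap<>((int) (expectedVocabulary / 0.75f) + 1);
    }

    /** Returns the Token for a word, creating it on first sight (single map lookup). */
    public Token getToken(String word) {
        return tokens.computeIfAbsent(word.toLowerCase(), Token::new);
    }

    /** Records one occurrence of each word in an already tokenized line. */
    public void addAll(String[] words) {
        for (String w : words) {
            getToken(w).count++;
        }
    }

    public int size() { return tokens.size(); }

    public static void main(String[] args) {
        TokenIndex index = new TokenIndex(100_000);
        index.addAll("The quick brown fox jumps over the lazy dog".split(" "));
        System.out.println(index.size() + " distinct tokens");
    }
}
```

The expectedVocabulary argument is a guess the caller has to supply; if it is too low the map still works and simply resizes as usual.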