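What is the best way to push 20 million entities into a Java Map object?
I believe the two tasks run on two different cores. Question: when I create a single task that pushes 10 million entries, it takes about 9 seconds. When I run 2 tasks, each pushing 10 million entries, why does it take about 26 seconds? Am I doing something wrong?
Is there a different solution for inserting 20 M entries in less than 10 seconds?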
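Answer 0 (score: 0)
Without seeing your code, the most likely cause of these poor performance results is garbage collection activity. To demonstrate it, I wrote the following program: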
    import java.lang.management.ManagementFactory;
    import java.util.*;
    import java.util.concurrent.*;

    public class TestMap {
      // we assume NB_ENTITIES is divisible by NB_TASKS
      static final int NB_ENTITIES = 20_000_000, NB_TASKS = 2;
      static Map<String, String> map = new ConcurrentHashMap<>();

      public static void main(String[] args) {
        try {
          System.out.printf("running with nb entities = %,d, nb tasks = %,d, VM args = %s%n", NB_ENTITIES, NB_TASKS, ManagementFactory.getRuntimeMXBean().getInputArguments());
          ExecutorService executor = Executors.newFixedThreadPool(NB_TASKS);
          int entitiesPerTask = NB_ENTITIES / NB_TASKS;
          List<Future<?>> futures = new ArrayList<>(NB_TASKS);
          long startTime = System.nanoTime();
          for (int i=0; i<NB_TASKS; i++) {
            MyTask task = new MyTask(i * entitiesPerTask, (i + 1) * entitiesPerTask - 1);
            futures.add(executor.submit(task));
          }
          for (Future<?> f: futures) {
            f.get();
          }
          long elapsed = System.nanoTime() - startTime;
          executor.shutdownNow();
          System.gc();
          Runtime rt = Runtime.getRuntime();
          long usedMemory = rt.maxMemory() - rt.freeMemory();
          System.out.printf("processing completed in %,d ms, usedMemory after GC = %,d bytes%n", elapsed/1_000_000L, usedMemory);
        } catch (Exception e) {
          e.printStackTrace();
        }
      }

      static class MyTask implements Runnable {
        private final int startIdx, endIdx;

        public MyTask(final int startIdx, final int endIdx) {
          this.startIdx = startIdx;
          this.endIdx = endIdx;
        }

        @Override
        public void run() {
          long startTime = System.nanoTime();
          for (int i=startIdx; i<=endIdx; i++) {
            map.put("sambit:rout:" + i, "C:\\Images\\Provision_Images");
          }
          long elapsed = System.nanoTime() - startTime;
          System.out.printf("task[%,d - %,d], completed in %,d ms%n", startIdx, endIdx, elapsed/1_000_000L);
        }
      }
    }
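At the end of processing, this code computes an approximation of the used memory by evaluating Runtime.maxMemory() - Runtime.freeMemory() immediately after a call to System.gc(). This shows that the map with 20 million entries takes roughly 2.2 GiB (about 2.38 billion bytes in the output below), which is considerable. I ran it with 1 and 2 threads, for various values of the -Xmx and -Xms JVM arguments. Here are the resulting outputs (just to be clear: 2560m = 2.5g):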
running with nb entities = 20,000,000, nb tasks = 1, VM args = [-Xms2560m, -Xmx2560m]
task[0 - 19,999,999], completed in 11,781 ms
processing completed in 11,782 ms, usedMemory after GC = 2,379,068,760 bytes
running with nb entities = 20,000,000, nb tasks = 2, VM args = [-Xms2560m, -Xmx2560m]
task[0 - 9,999,999], completed in 8,269 ms
task[10,000,000 - 19,999,999], completed in 12,385 ms
processing completed in 12,386 ms, usedMemory after GC = 2,379,069,480 bytes
running with nb entities = 20,000,000, nb tasks = 1, VM args = [-Xms3g, -Xmx3g]
task[0 - 19,999,999], completed in 12,525 ms
processing completed in 12,527 ms, usedMemory after GC = 2,398,339,944 bytes
running with nb entities = 20,000,000, nb tasks = 2, VM args = [-Xms3g, -Xmx3g]
task[0 - 9,999,999], completed in 12,220 ms
task[10,000,000 - 19,999,999], completed in 12,264 ms
processing completed in 12,265 ms, usedMemory after GC = 2,382,777,776 bytes
running with nb entities = 20,000,000, nb tasks = 1, VM args = [-Xms4g, -Xmx4g]
task[0 - 19,999,999], completed in 7,363 ms
processing completed in 7,364 ms, usedMemory after GC = 2,402,467,040 bytes
running with nb entities = 20,000,000, nb tasks = 2, VM args = [-Xms4g, -Xmx4g]
task[0 - 9,999,999], completed in 5,466 ms
task[10,000,000 - 19,999,999], completed in 5,511 ms
processing completed in 5,512 ms, usedMemory after GC = 2,381,821,576 bytes
running with nb entities = 20,000,000, nb tasks = 1, VM args = [-Xms8g, -Xmx8g]
task[0 - 19,999,999], completed in 7,778 ms
processing completed in 7,779 ms, usedMemory after GC = 2,438,159,312 bytes
running with nb entities = 20,000,000, nb tasks = 2, VM args = [-Xms8g, -Xmx8g]
task[0 - 9,999,999], completed in 5,739 ms
task[10,000,000 - 19,999,999], completed in 5,784 ms
processing completed in 5,785 ms, usedMemory after GC = 2,396,478,680 bytes
These results can be summarized in the following table:
--------------------------------------
 heap      | exec time (ms) for:
 size (GB) | 1 thread  | 2 threads
--------------------------------------
 2.5       | 11782     | 12386
 3.0       | 12527     | 12265
 4.0       | 7364      | 5512
 8.0       | 7779      | 5785
--------------------------------------
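I also observed that, for the 2.5g and 3g heap sizes, CPU activity was high during the whole processing time, with peaks at 100%, due to GC activity. For 4g and 8g, that activity was only observed at the end, due to the System.gc() call.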
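One way to check the GC explanation without watching a CPU monitor is to read the JVM's own collector counters before and after the insert loop. The following is a minimal sketch, not part of the original test: the class name GcProbe, its helper methods, and the 1,000,000-entry workload are illustrative assumptions. It also estimates used heap as totalMemory() - freeMemory(), which should agree with the maxMemory()-based figure above only when the whole heap is already committed.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class, for illustration only.
public class GcProbe {

    // Accumulated collection time in ms across all collectors.
    // getCollectionTime() returns -1 when a collector does not report it, so negatives are skipped.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // Used heap = currently committed heap minus the free part of it.
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        int n = 1_000_000; // assumed workload size, smaller than the 20M of the original test
        long gcBefore = totalGcMillis();
        long start = System.nanoTime();

        Map<String, String> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put("sambit:rout:" + i, "C:\\Images\\Provision_Images");
        }

        long elapsedMs = (System.nanoTime() - start) / 1_000_000L;
        long gcMs = totalGcMillis() - gcBefore;
        System.out.printf("entries = %,d, elapsed = %,d ms, of which GC = %,d ms, used heap = %,d bytes%n",
            map.size(), elapsedMs, gcMs, usedHeapBytes());
    }
}
```

If the GC share of the elapsed time shrinks sharply as -Xmx grows, that is consistent with the table above.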
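To conclude:
If your heap is not sized appropriately, garbage collection will cancel out any performance gain you hope to obtain. You should make it large enough to avoid the side effects of long GC pauses.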
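You must also be aware that using a concurrent collection such as ConcurrentHashMap has a significant performance overhead. To illustrate this, I slightly modified the code so that each task uses its own HashMap, and at the end all the maps are aggregated (with Map.putAll()) into the map of the first task. The processing time fell to around 3200 ms.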
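The modified code is not shown in this answer, so here is a hedged sketch of what the per-task HashMap approach could look like. The class name LocalMapsThenMerge and the methods fill and buildInParallel are assumptions made for illustration. The key and value strings are taken from the program above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical reconstruction: each task fills a private HashMap, then the maps are merged.
public class LocalMapsThenMerge {

    // Fills a thread-private map with the keys in [startIdx, endIdx]; no synchronization is needed here.
    static Map<String, String> fill(int startIdx, int endIdx) {
        Map<String, String> local = new HashMap<>();
        for (int i = startIdx; i <= endIdx; i++) {
            local.put("sambit:rout:" + i, "C:\\Images\\Provision_Images");
        }
        return local;
    }

    // Assumes nbEntities is divisible by nbTasks, as in the original program.
    static Map<String, String> buildInParallel(int nbEntities, int nbTasks) {
        ExecutorService executor = Executors.newFixedThreadPool(nbTasks);
        try {
            int perTask = nbEntities / nbTasks;
            List<Future<Map<String, String>>> futures = new ArrayList<>(nbTasks);
            for (int t = 0; t < nbTasks; t++) {
                final int start = t * perTask;
                final int end = (t + 1) * perTask - 1;
                Callable<Map<String, String>> job = () -> fill(start, end);
                futures.add(executor.submit(job));
            }
            // Aggregate every other map into the map of the first task.
            Map<String, String> result = futures.get(0).get();
            for (int t = 1; t < nbTasks; t++) {
                result.putAll(futures.get(t).get());
            }
            return result;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("a fill task failed", e);
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        Map<String, String> map = buildInParallel(2_000_000, 2); // assumed smaller workload
        long elapsedMs = (System.nanoTime() - start) / 1_000_000L;
        System.out.printf("merged map size = %,d, completed in %,d ms%n", map.size(), elapsedMs);
    }
}
```

The merge step runs on a single thread, so this only pays off when filling the private maps dominates the cost of putAll(). The resulting map is a plain HashMap and is not safe for concurrent writes afterwards.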
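Answer 1 (score: 0)
An addition probably takes about one CPU cycle, so if your CPU runs at 3 GHz, that is about 0.3 nanoseconds. Do it 20M times and that becomes 6,000,000 nanoseconds, or 6 milliseconds. Your measurement is therefore affected more by the overhead of starting threads, thread switching, JIT compilation and so on than by the operation you are trying to measure.
Garbage collection may also play a role, since it can slow you down.
I suggest you use a specialized library for micro-benchmarking, such as jmh.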
Thanks to the post on that framework (see example), which helped me write this reply.
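To illustrate why start-up and JIT effects distort a single timed run, here is a rough plain-Java sketch that discards a few warm-up rounds before timing. It is not a substitute for jmh, which handles these concerns far more rigorously. The class name WarmupTiming, the method timeFill, and the round counts are assumptions chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical hand-rolled timing loop, shown only to make the warm-up effect visible.
public class WarmupTiming {

    // Fills a fresh HashMap with n entries and returns the elapsed time in nanoseconds.
    static long timeFill(int n) {
        long start = System.nanoTime();
        Map<String, String> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put("key:" + i, "value");
        }
        long elapsed = System.nanoTime() - start;
        // Reading the size uses the result, so the fill loop cannot be discarded as dead code.
        if (map.size() != n) {
            throw new IllegalStateException("unexpected map size");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        int n = 1_000_000;      // assumed workload size
        int warmupRounds = 5;   // rounds whose timings are only printed for comparison
        int measuredRounds = 5; // rounds that count toward the result

        for (int r = 0; r < warmupRounds; r++) {
            System.out.printf("warm-up round %d: %,d ms%n", r, timeFill(n) / 1_000_000L);
        }
        long best = Long.MAX_VALUE;
        for (int r = 0; r < measuredRounds; r++) {
            best = Math.min(best, timeFill(n));
        }
        System.out.printf("best measured round: %,d ms%n", best / 1_000_000L);
    }
}
```

The early rounds are typically slower than the later ones, which is the overhead this answer describes.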
Answer 2 (score: 0)
Although I did not try multiple threads, I did try all 7 of the suitable Map types among the 10 provided with Java 11.
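My results were all substantially faster than the 25 to 40 seconds you report. For any of the 7 map classes, my results for 20,000,000 entries of <String, UUID> were more like 3-9 seconds.
I am using Java 13 on the following machine: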
Model Name: Mac mini
Model Identifier: Macmini8,1
Processor Name: Intel Core i5
Processor Speed: 3 GHz
Number of Processors: 1
Total Number of Cores: 6
L2 Cache (per Core): 256 KB
L3 Cache: 9 MB
Memory: 32 GB
Preparing.
size of instants: 20000000
size of uuids: 20000000
Running test.
java.util.HashMap took: PT3.645250368S
java.util.WeakHashMap took: PT3.199812894S
java.util.TreeMap took: PT8.97788412S
java.util.concurrent.ConcurrentSkipListMap took: PT7.347253106S
java.util.concurrent.ConcurrentHashMap took: PT4.494560252S
java.util.LinkedHashMap took: PT2.78054883S
java.util.IdentityHashMap took: PT5.608737472S
My code:
    import java.time.Duration;
    import java.time.Instant;
    import java.time.temporal.ChronoUnit;
    import java.util.*;
    import java.util.concurrent.*;

    // Wrapper class added so the snippet compiles on its own; the name is arbitrary.
    public class MapInsertTest
    {
        public static void main ( String[] args )
        {
            System.out.println( "Preparing." );
            int limit = 20_000_000; // 20_000_000
            Set < String > instantsSet = new TreeSet <>();  // Use `Set` to forbid duplicates.
            List < UUID > uuids = new ArrayList <>( limit );
            while ( instantsSet.size() < limit )
            {
                instantsSet.add( Instant.now().toString() );
            }
            List < String > instants = new ArrayList <>( instantsSet );
            for ( int i = 0 ; i < limit ; i++ )
            {
                uuids.add( UUID.randomUUID() );
            }
            System.out.println( "size of instants: " + instants.size() );
            System.out.println( "size of uuids: " + uuids.size() );

            System.out.println( "Running test." );
            // Using 7 of the 10 `Map` implementations bundled with Java 11.
            // Omitting `EnumMap`, as it requires enums for the key.
            // Omitting `Map.of` because it is for literals.
            // Omitting `Hashtable` because it is outmoded, replaced by `ConcurrentHashMap`.
            List < Map < String, UUID > > maps = List.of(
                    new HashMap <>( limit ) ,
                    new WeakHashMap <>( limit ) ,
                    new TreeMap <>() ,
                    new ConcurrentSkipListMap <>() ,
                    new ConcurrentHashMap <>( limit ) ,
                    new LinkedHashMap <>( limit ) ,
                    new IdentityHashMap <>( limit )
            );
            for ( Map < String, UUID > map : maps )
            {
                long start = System.nanoTime();
                for ( int i = 0 ; i < instants.size() ; i++ )
                {
                    map.put( instants.get( i ) , uuids.get( i ) );
                }
                long stop = System.nanoTime();
                Duration d = Duration.of( stop - start , ChronoUnit.NANOS );
                System.out.println( map.getClass().getName() + " took: " + d );

                // Free up memory. Clearing the map releases its entries; setting the loop
                // variable to null would not, because the `maps` list still references the map.
                map.clear();
                System.gc(); // Request garbage collector do its thing. No guarantee!
                try
                {
                    Thread.sleep( TimeUnit.SECONDS.toMillis( 4 ) ); // Wait for garbage collector to hopefully finish. No guarantee!
                }
                catch ( InterruptedException e )
                {
                    e.printStackTrace();
                }
            }
            System.out.println( "Done running test." );
        }
    }
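A note on the constructor arguments used in the test code: for HashMap, the argument is an initial capacity rather than an expected number of entries, and the table is resized once the entry count exceeds capacity times the load factor (0.75 by default). The sketch below shows one way to derive a capacity from an expected entry count. The class name PresizedMap and its two methods are assumptions for illustration, not something the test code uses.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, for illustration only.
public class PresizedMap {

    // Smallest capacity that holds expectedEntries without a resize at the default 0.75 load factor.
    static int capacityFor(int expectedEntries) {
        long needed = (long) Math.ceil(expectedEntries / 0.75d);
        return (int) Math.min(Integer.MAX_VALUE, needed);
    }

    // Creates a HashMap sized up front for the expected number of entries.
    static <K, V> Map<K, V> newPresizedHashMap(int expectedEntries) {
        return new HashMap<>(capacityFor(expectedEntries));
    }

    public static void main(String[] args) {
        int expected = 1_000_000; // assumed entry count for the demo
        Map<String, String> map = newPresizedHashMap(expected);
        for (int i = 0; i < expected; i++) {
            map.put("key:" + i, "value");
        }
        System.out.printf("capacity requested = %,d, entries = %,d%n", capacityFor(expected), map.size());
    }
}
```

Sizing the map up front trades a larger initial allocation for fewer rehash passes during the bulk insert.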
Here is a table I made comparing the various Map implementations.