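Since early 2016 I have been working on a Pandas / R DataFrame implementation for Go: https://github.com/kniren/gota.
Recently I have been focusing on improving the library's performance to try to match Pandas / Dplyr. You can follow the progress here: https://github.com/kniren/gota/issues/16
Since one of the most frequently used operations is DataFrame subsetting, I thought it might be a good idea to introduce concurrency to try to improve performance.
Before: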
    columns := make([]series.Series, df.ncols)
    for i, column := range df.columns {
        s := column.Subset(indexes)
        columns[i] = s
    }
After:
    columns := make([]series.Series, df.ncols)
    var wg sync.WaitGroup
    wg.Add(df.ncols)
    for i := range df.columns {
        go func(i int) {
            columns[i] = df.columns[i].Subset(indexes)
            wg.Done()
        }(i)
    }
    wg.Wait()
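As far as I know, creating one goroutine per DataFrame column should not introduce much overhead, so I expected at least a 2x speedup over the serial version (at least for large datasets). However, when I benchmarked this change with datasets and indexes of different sizes, the results were very disappointing (benchmark names follow the pattern NROWSxNCOLS_INDEXSIZE-CPUCORES):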
benchmark old ns/op new ns/op delta
BenchmarkDataFrame_Subset/1000000x20_100 55230 109349 +97.99%
BenchmarkDataFrame_Subset/1000000x20_100-2 51457 67714 +31.59%
BenchmarkDataFrame_Subset/1000000x20_100-4 49845 70141 +40.72%
BenchmarkDataFrame_Subset/1000000x20_1000 518506 518085 -0.08%
BenchmarkDataFrame_Subset/1000000x20_1000-2 476661 311379 -34.67%
BenchmarkDataFrame_Subset/1000000x20_1000-4 505023 316583 -37.31%
BenchmarkDataFrame_Subset/1000000x20_10000 6621116 6314112 -4.64%
BenchmarkDataFrame_Subset/1000000x20_10000-2 7316062 4509601 -38.36%
BenchmarkDataFrame_Subset/1000000x20_10000-4 6483812 8394113 +29.46%
BenchmarkDataFrame_Subset/1000000x20_100000 105341711 106427967 +1.03%
BenchmarkDataFrame_Subset/1000000x20_100000-2 94567729 56778647 -39.96%
BenchmarkDataFrame_Subset/1000000x20_100000-4 91896690 60971444 -33.65%
BenchmarkDataFrame_Subset/1000000x20_1000000 1538680081 1632044752 +6.07%
BenchmarkDataFrame_Subset/1000000x20_1000000-2 1292113119 1100075806 -14.86%
BenchmarkDataFrame_Subset/1000000x20_1000000-4 1282367864 949615298 -25.95%
BenchmarkDataFrame_Subset/100000x20_100 50286 106850 +112.48%
BenchmarkDataFrame_Subset/100000x20_100-2 54537 70492 +29.26%
BenchmarkDataFrame_Subset/100000x20_100-4 58024 76617 +32.04%
BenchmarkDataFrame_Subset/100000x20_1000 541600 625967 +15.58%
BenchmarkDataFrame_Subset/100000x20_1000-2 493894 362894 -26.52%
BenchmarkDataFrame_Subset/100000x20_1000-4 535373 349211 -34.77%
BenchmarkDataFrame_Subset/100000x20_10000 6298063 7678499 +21.92%
BenchmarkDataFrame_Subset/100000x20_10000-2 5827185 4832560 -17.07%
BenchmarkDataFrame_Subset/100000x20_10000-4 8195048 3660077 -55.34%
BenchmarkDataFrame_Subset/100000x20_100000 105108807 82976477 -21.06%
BenchmarkDataFrame_Subset/100000x20_100000-2 92112736 58317114 -36.69%
BenchmarkDataFrame_Subset/100000x20_100000-4 92044966 63469935 -31.04%
BenchmarkDataFrame_Subset/1000x20_10 9741 53365 +447.84%
BenchmarkDataFrame_Subset/1000x20_10-2 9366 36457 +289.25%
BenchmarkDataFrame_Subset/1000x20_10-4 9463 46682 +393.31%
BenchmarkDataFrame_Subset/1000x20_100 50841 103523 +103.62%
BenchmarkDataFrame_Subset/1000x20_100-2 49972 62344 +24.76%
BenchmarkDataFrame_Subset/1000x20_100-4 72014 81808 +13.60%
BenchmarkDataFrame_Subset/1000x20_1000 457799 571292 +24.79%
BenchmarkDataFrame_Subset/1000x20_1000-2 460551 405116 -12.04%
BenchmarkDataFrame_Subset/1000x20_1000-4 462928 416522 -10.02%
BenchmarkDataFrame_Subset/1000x200_10 90125 688443 +663.88%
BenchmarkDataFrame_Subset/1000x200_10-2 85259 392705 +360.60%
BenchmarkDataFrame_Subset/1000x200_10-4 87412 387509 +343.31%
BenchmarkDataFrame_Subset/1000x200_100 486600 1082901 +122.54%
BenchmarkDataFrame_Subset/1000x200_100-2 471154 732304 +55.43%
BenchmarkDataFrame_Subset/1000x200_100-4 542846 659571 +21.50%
BenchmarkDataFrame_Subset/1000x200_1000 5926086 6686480 +12.83%
BenchmarkDataFrame_Subset/1000x200_1000-2 5364091 3986970 -25.67%
BenchmarkDataFrame_Subset/1000x200_1000-4 5904977 4504084 -23.72%
BenchmarkDataFrame_Subset/1000x2000_10 1187297 7800052 +556.96%
BenchmarkDataFrame_Subset/1000x2000_10-2 1217022 3930742 +222.98%
BenchmarkDataFrame_Subset/1000x2000_10-4 1301666 3617871 +177.94%
BenchmarkDataFrame_Subset/1000x2000_100 6942015 10790196 +55.43%
BenchmarkDataFrame_Subset/1000x2000_100-2 6588351 7592847 +15.25%
BenchmarkDataFrame_Subset/1000x2000_100-4 7067226 14391327 +103.63%
BenchmarkDataFrame_Subset/1000x2000_1000 62392457 69560711 +11.49%
BenchmarkDataFrame_Subset/1000x2000_1000-2 57793006 37416703 -35.26%
BenchmarkDataFrame_Subset/1000x2000_1000-4 59572261 58398203 -1.97%
benchmark old allocs new allocs delta
BenchmarkDataFrame_Subset/1000000x20_100 41 42 +2.44%
BenchmarkDataFrame_Subset/1000000x20_100-2 41 42 +2.44%
BenchmarkDataFrame_Subset/1000000x20_100-4 41 42 +2.44%
BenchmarkDataFrame_Subset/1000000x20_1000 41 42 +2.44%
BenchmarkDataFrame_Subset/1000000x20_1000-2 41 42 +2.44%
BenchmarkDataFrame_Subset/1000000x20_1000-4 41 42 +2.44%
BenchmarkDataFrame_Subset/1000000x20_10000 41 42 +2.44%
BenchmarkDataFrame_Subset/1000000x20_10000-2 41 42 +2.44%
BenchmarkDataFrame_Subset/1000000x20_10000-4 41 42 +2.44%
BenchmarkDataFrame_Subset/1000000x20_100000 41 42 +2.44%
BenchmarkDataFrame_Subset/1000000x20_100000-2 41 42 +2.44%
BenchmarkDataFrame_Subset/1000000x20_100000-4 41 42 +2.44%
BenchmarkDataFrame_Subset/1000000x20_1000000 41 42 +2.44%
BenchmarkDataFrame_Subset/1000000x20_1000000-2 41 43 +4.88%
BenchmarkDataFrame_Subset/1000000x20_1000000-4 41 46 +12.20%
BenchmarkDataFrame_Subset/100000x20_100 41 42 +2.44%
BenchmarkDataFrame_Subset/100000x20_100-2 41 42 +2.44%
BenchmarkDataFrame_Subset/100000x20_100-4 41 42 +2.44%
BenchmarkDataFrame_Subset/100000x20_1000 41 42 +2.44%
BenchmarkDataFrame_Subset/100000x20_1000-2 41 42 +2.44%
BenchmarkDataFrame_Subset/100000x20_1000-4 41 42 +2.44%
BenchmarkDataFrame_Subset/100000x20_10000 41 42 +2.44%
BenchmarkDataFrame_Subset/100000x20_10000-2 41 42 +2.44%
BenchmarkDataFrame_Subset/100000x20_10000-4 41 42 +2.44%
BenchmarkDataFrame_Subset/100000x20_100000 41 42 +2.44%
BenchmarkDataFrame_Subset/100000x20_100000-2 41 42 +2.44%
BenchmarkDataFrame_Subset/100000x20_100000-4 41 42 +2.44%
BenchmarkDataFrame_Subset/1000x20_10 41 42 +2.44%
BenchmarkDataFrame_Subset/1000x20_10-2 41 42 +2.44%
BenchmarkDataFrame_Subset/1000x20_10-4 41 42 +2.44%
BenchmarkDataFrame_Subset/1000x20_100 41 42 +2.44%
BenchmarkDataFrame_Subset/1000x20_100-2 41 42 +2.44%
BenchmarkDataFrame_Subset/1000x20_100-4 41 42 +2.44%
BenchmarkDataFrame_Subset/1000x20_1000 41 42 +2.44%
BenchmarkDataFrame_Subset/1000x20_1000-2 41 42 +2.44%
BenchmarkDataFrame_Subset/1000x20_1000-4 41 42 +2.44%
BenchmarkDataFrame_Subset/1000x200_10 401 402 +0.25%
BenchmarkDataFrame_Subset/1000x200_10-2 401 402 +0.25%
BenchmarkDataFrame_Subset/1000x200_10-4 401 402 +0.25%
BenchmarkDataFrame_Subset/1000x200_100 401 402 +0.25%
BenchmarkDataFrame_Subset/1000x200_100-2 401 402 +0.25%
BenchmarkDataFrame_Subset/1000x200_100-4 401 402 +0.25%
BenchmarkDataFrame_Subset/1000x200_1000 401 402 +0.25%
BenchmarkDataFrame_Subset/1000x200_1000-2 401 402 +0.25%
BenchmarkDataFrame_Subset/1000x200_1000-4 401 402 +0.25%
BenchmarkDataFrame_Subset/1000x2000_10 4001 4002 +0.02%
BenchmarkDataFrame_Subset/1000x2000_10-2 4001 4002 +0.02%
BenchmarkDataFrame_Subset/1000x2000_10-4 4001 4002 +0.02%
BenchmarkDataFrame_Subset/1000x2000_100 4001 4002 +0.02%
BenchmarkDataFrame_Subset/1000x2000_100-2 4001 4002 +0.02%
BenchmarkDataFrame_Subset/1000x2000_100-4 4001 4002 +0.02%
BenchmarkDataFrame_Subset/1000x2000_1000 4001 4002 +0.02%
BenchmarkDataFrame_Subset/1000x2000_1000-2 4001 4010 +0.22%
BenchmarkDataFrame_Subset/1000x2000_1000-4 4001 4003 +0.05%
benchmark old bytes new bytes delta
BenchmarkDataFrame_Subset/1000000x20_100 32400 32416 +0.05%
BenchmarkDataFrame_Subset/1000000x20_100-2 32400 32416 +0.05%
BenchmarkDataFrame_Subset/1000000x20_100-4 32400 32416 +0.05%
BenchmarkDataFrame_Subset/1000000x20_1000 298880 298896 +0.01%
BenchmarkDataFrame_Subset/1000000x20_1000-2 298880 298896 +0.01%
BenchmarkDataFrame_Subset/1000000x20_1000-4 298880 298896 +0.01%
BenchmarkDataFrame_Subset/1000000x20_10000 2971520 2971536 +0.00%
BenchmarkDataFrame_Subset/1000000x20_10000-2 2971520 2971536 +0.00%
BenchmarkDataFrame_Subset/1000000x20_10000-4 2971520 2971536 +0.00%
BenchmarkDataFrame_Subset/1000000x20_100000 29083520 29083536 +0.00%
BenchmarkDataFrame_Subset/1000000x20_100000-2 29083520 29083547 +0.00%
BenchmarkDataFrame_Subset/1000000x20_100000-4 29083542 29083563 +0.00%
BenchmarkDataFrame_Subset/1000000x20_1000000 290121600 290121616 +0.00%
BenchmarkDataFrame_Subset/1000000x20_1000000-2 290121600 290121696 +0.00%
BenchmarkDataFrame_Subset/1000000x20_1000000-4 290121600 290121840 +0.00%
BenchmarkDataFrame_Subset/100000x20_100 32400 32416 +0.05%
BenchmarkDataFrame_Subset/100000x20_100-2 32400 32416 +0.05%
BenchmarkDataFrame_Subset/100000x20_100-4 32400 32416 +0.05%
BenchmarkDataFrame_Subset/100000x20_1000 298880 298896 +0.01%
BenchmarkDataFrame_Subset/100000x20_1000-2 298880 298896 +0.01%
BenchmarkDataFrame_Subset/100000x20_1000-4 298880 298896 +0.01%
BenchmarkDataFrame_Subset/100000x20_10000 2971520 2971536 +0.00%
BenchmarkDataFrame_Subset/100000x20_10000-2 2971520 2971536 +0.00%
BenchmarkDataFrame_Subset/100000x20_10000-4 2971520 2971536 +0.00%
BenchmarkDataFrame_Subset/100000x20_100000 29083520 29083536 +0.00%
BenchmarkDataFrame_Subset/100000x20_100000-2 29083520 29083536 +0.00%
BenchmarkDataFrame_Subset/100000x20_100000-4 29083542 29083553 +0.00%
BenchmarkDataFrame_Subset/1000x20_10 4880 4896 +0.33%
BenchmarkDataFrame_Subset/1000x20_10-2 4880 4896 +0.33%
BenchmarkDataFrame_Subset/1000x20_10-4 4880 4896 +0.33%
BenchmarkDataFrame_Subset/1000x20_100 32400 32416 +0.05%
BenchmarkDataFrame_Subset/1000x20_100-2 32400 32416 +0.05%
BenchmarkDataFrame_Subset/1000x20_100-4 32400 32416 +0.05%
BenchmarkDataFrame_Subset/1000x20_1000 298880 298896 +0.01%
BenchmarkDataFrame_Subset/1000x20_1000-2 298880 298896 +0.01%
BenchmarkDataFrame_Subset/1000x20_1000-4 298880 298896 +0.01%
BenchmarkDataFrame_Subset/1000x200_10 49568 49584 +0.03%
BenchmarkDataFrame_Subset/1000x200_10-2 49568 49584 +0.03%
BenchmarkDataFrame_Subset/1000x200_10-4 49568 49585 +0.03%
BenchmarkDataFrame_Subset/1000x200_100 324768 324784 +0.00%
BenchmarkDataFrame_Subset/1000x200_100-2 324768 324784 +0.00%
BenchmarkDataFrame_Subset/1000x200_100-4 324768 324784 +0.00%
BenchmarkDataFrame_Subset/1000x200_1000 2989568 2989584 +0.00%
BenchmarkDataFrame_Subset/1000x200_1000-2 2989568 2989584 +0.00%
BenchmarkDataFrame_Subset/1000x200_1000-4 2989569 2989588 +0.00%
BenchmarkDataFrame_Subset/1000x2000_10 491072 491088 +0.00%
BenchmarkDataFrame_Subset/1000x2000_10-2 491072 491133 +0.01%
BenchmarkDataFrame_Subset/1000x2000_10-4 491072 491088 +0.00%
BenchmarkDataFrame_Subset/1000x2000_100 3243072 3243088 +0.00%
BenchmarkDataFrame_Subset/1000x2000_100-2 3243074 3243102 +0.00%
BenchmarkDataFrame_Subset/1000x2000_100-4 3243076 3243100 +0.00%
BenchmarkDataFrame_Subset/1000x2000_1000 29891072 29891088 +0.00%
BenchmarkDataFrame_Subset/1000x2000_1000-2 29891086 29891797 +0.00%
BenchmarkDataFrame_Subset/1000x2000_1000-4 29891115 29891167 +0.00%
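Running a profiler (cpu / mem) on these benchmarks does not seem to show anything significant. The concurrent version seems to spend some time in runtime.mach_semaphore_signal, but I guess that is to be expected while waiting for the goroutines to finish.
I have tried limiting the number of goroutines launched to the maximum number of cores reported by runtime.GOMAXPROCS(0), but the results were, if anything, even worse. Am I doing something terribly wrong here, or is the overhead of goroutines so large that it has such a significant impact on performance?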
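Answer 0 (score: 0)
Goroutines are cheap, but they are not free.
I did not read your code, but if you spawn NCOLS_INDEXSIZE goroutines for each row, that is a very bad practice.
This shows in your benchmarks: where you have 2k columns and only 1k rows, you get big improvements. But in all the other cases, when the number of columns << the number of rows, goroutine spawning becomes the bottleneck.
Instead, you should spawn a pool of goroutines (close to the number of your CPUs) and distribute the work between them via channels; that is the canonical way. You might want to read https://blog.golang.org/pipelines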
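To make the pool suggestion concrete, here is a minimal sketch of a fixed set of workers pulling column indexes from a channel. It does not use gota's types: `Column`, its `Subset` method, and `subsetColumns` are hypothetical stand-ins for `series.Series` and the DataFrame subsetting loop, and nothing here has been measured against the benchmarks in the question, so it may or may not be faster than the serial loop.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// Column is a hypothetical stand-in for series.Series.
type Column []float64

// Subset returns the elements of c at the given positions.
func (c Column) Subset(indexes []int) Column {
	out := make(Column, len(indexes))
	for i, idx := range indexes {
		out[i] = c[idx]
	}
	return out
}

// subsetColumns subsets every column using a fixed pool of workers.
// Each worker receives column indexes over the jobs channel and writes
// its result into a distinct slot of out, so no mutex is needed.
func subsetColumns(cols []Column, indexes []int, workers int) []Column {
	if workers < 1 {
		workers = 1
	}
	out := make([]Column, len(cols))
	jobs := make(chan int)

	var wg sync.WaitGroup
	wg.Add(workers)
	for w := 0; w < workers; w++ {
		go func() {
			defer wg.Done()
			for i := range jobs {
				out[i] = cols[i].Subset(indexes)
			}
		}()
	}

	// Hand out one job per column, then signal that no more are coming.
	for i := range cols {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return out
}

func main() {
	cols := []Column{{1, 2, 3}, {4, 5, 6}, {7, 8, 9}}
	got := subsetColumns(cols, []int{0, 2}, runtime.GOMAXPROCS(0))
	fmt.Println(got) // [[1 3] [4 6] [7 9]]
}
```

The number of goroutines here stays constant regardless of how many columns there are. If each column is cheap to subset, the per-job channel send can itself dominate; sending ranges of columns per job instead of single indexes is one way to reduce that, at the cost of slightly more bookkeeping.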