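I'm trying to speed up a process that is slowing down my main thread by spreading it across at least two different cores.
The reason I think I can do this is that each individual operation is independent, requiring only two points and a float.
However, on my first attempt the code ran noticeably slower with queue.async than with queue.sync, and I don't know why!
Here is the code that runs synchronously: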
var block = UnsafeMutablePointer<Datas>.allocate(capacity: 0)
var outblock = UnsafeMutablePointer<Decimal>.allocate(capacity: 0)

func initialise()
{
    outblock = UnsafeMutablePointer<Decimal>.allocate(capacity: testWith * 4 * 2)
    block = UnsafeMutablePointer<Datas>.allocate(capacity: particles.count)
}

func update()
{
    var i = 0
    for part in particles
    {
        part.update()
        let x1 = part.data.p1.x; let y1 = part.data.p1.y
        let x2 = part.data.p2.x; let y2 = part.data.p2.y
        let w = part.data.size * rectScale
        let w2 = part.data.size * rectScale
        let dy = y2 - y1; let dx = x2 - x1
        let length = sqrt(dy * dy + dx * dx)
        let calcx = (-(y2 - y1) / length)
        let calcy = ((x2 - x1) / length)
        let calcx1 = calcx * w
        let calcy1 = calcy * w
        let calcx2 = calcx * w2
        let calcy2 = calcy * w2
        outblock[i] = x1 + calcx1
        outblock[i+1] = y1 + calcy1
        outblock[i+2] = x1 - calcx1
        outblock[i+3] = y1 - calcy1
        outblock[i+4] = x2 + calcx2
        outblock[i+5] = y2 + calcy2
        outblock[i+6] = x2 - calcx2
        outblock[i+7] = y2 - calcy2
        i += 8
    }
}
Here is my attempt to distribute the workload across multiple cores:
let queue = DispatchQueue(label: "construction_worker_1", attributes: .concurrent)
let blocky = block
let oblocky = outblock
for i in 0..<particles.count
{
    particles[i].update()
    block[i] = particles[i].data // Copy the raw data into a thread-safe format
    queue.async {
        let x1 = blocky[i].p1.x; let y1 = blocky[i].p1.y
        let x2 = blocky[i].p2.x; let y2 = blocky[i].p2.y
        let w = blocky[i].size * rectScale
        let w2 = blocky[i].size * rectScale
        let dy = y2 - y1; let dx = x2 - x1
        let length = sqrt(dy * dy + dx * dx)
        let calcx = (-(y2 - y1) / length)
        let calcy = ((x2 - x1) / length)
        let calcx1 = calcx * w
        let calcy1 = calcy * w
        let calcx2 = calcx * w2
        let calcy2 = calcy * w2
        let writeIndex = i * 8
        oblocky[writeIndex] = x1 + calcx1
        oblocky[writeIndex+1] = y1 + calcy1
        oblocky[writeIndex+2] = x1 - calcx1
        oblocky[writeIndex+3] = y1 - calcy1
        oblocky[writeIndex+4] = x2 + calcx2
        oblocky[writeIndex+5] = y2 + calcy2
        oblocky[writeIndex+6] = x2 - calcx2
        oblocky[writeIndex+7] = y2 - calcy2
    }
}
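I really don't understand why this slowdown happens! I'm using UnsafeMutablePointer so the data is thread safe, and I make sure that no variable is ever read or written by more than one thread at the same time.
What is going on here?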
Answer 0: (score: 3)
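As described in Performing Loop Iterations Concurrently, there is overhead with each block that is dispatched to a background queue. So you will want to "stride" through your array, letting each iteration process multiple data points rather than just one.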
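To illustrate what striding means for the index arithmetic, here is a minimal sketch. `chunkRanges` is a hypothetical helper introduced only for this example (it is not part of Dispatch); it splits `0..<count` into chunks of at most `stride` elements and rounds the chunk count up so that a shorter final chunk is not lost.

```swift
/// Splits the indices 0..<count into consecutive chunks of at most `stride` elements.
/// Hypothetical helper for illustration; not part of Dispatch.
func chunkRanges(count: Int, stride: Int) -> [Range<Int>] {
    precondition(stride > 0, "stride must be positive")
    // Round up so that a final, shorter chunk is still included.
    let chunkCount = (count + stride - 1) / stride
    return (0..<chunkCount).map { (chunk: Int) -> Range<Int> in
        let start = chunk * stride
        let end = min(start + stride, count)
        return start..<end
    }
}

let ranges = chunkRanges(count: 250, stride: 100)
// ranges == [0..<100, 100..<200, 200..<250]
```

Each of these ranges would then be handled by one dispatched block, so 250 data points cost three dispatches instead of 250.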
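In addition, dispatch_apply, which is called concurrentPerform in Swift 3 and later, is designed to perform loops in parallel and is optimized for the number of cores on the particular device. Combined with striding, you should see some performance benefit: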
DispatchQueue.global(qos: .userInitiated).async {
    let stride = 100
    // Round up so that the final, partial chunk is processed too.
    let chunks = (particles.count + stride - 1) / stride
    DispatchQueue.concurrentPerform(iterations: chunks) { iteration in
        let start = iteration * stride
        let end = min(start + stride, particles.count)
        for i in start ..< end {
            particles[i].update()
            block[i] = particles[i].data // Copy the raw data into a thread-safe format
            // The work runs inline here: concurrentPerform already executes the
            // chunks in parallel, so no further per-item dispatch is needed.
            let x1 = blocky[i].p1.x; let y1 = blocky[i].p1.y
            let x2 = blocky[i].p2.x; let y2 = blocky[i].p2.y
            let w = blocky[i].size * rectScale
            let w2 = blocky[i].size * rectScale
            let dy = y2 - y1; let dx = x2 - x1
            let length = hypot(dy, dx)
            let calcx = -dy / length
            let calcy = dx / length
            let calcx1 = calcx * w
            let calcy1 = calcy * w
            let calcx2 = calcx * w2
            let calcy2 = calcy * w2
            let writeIndex = i * 8
            oblocky[writeIndex] = x1 + calcx1
            oblocky[writeIndex+1] = y1 + calcy1
            oblocky[writeIndex+2] = x1 - calcx1
            oblocky[writeIndex+3] = y1 - calcy1
            oblocky[writeIndex+4] = x2 + calcx2
            oblocky[writeIndex+5] = y2 + calcy2
            oblocky[writeIndex+6] = x2 - calcx2
            oblocky[writeIndex+7] = y2 - calcy2
        }
    }
}
You should experiment with different stride values and see how the performance changes.
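I can't run this code (I don't have sample data, I don't have the definition of Datas, etc.), so I apologize if I introduced any issues. But don't focus on the code here; focus instead on the broader issues of using concurrentPerform to perform concurrent loops, and of striding to ensure that each thread has enough work so that the threading overhead doesn't outweigh the benefits of running in parallel.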
See https://stackoverflow.com/a/22850936/1271826 for a broader discussion of the issues involved here.
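For readers who want something they can run and time, the following is a self-contained sketch of the same striding idea. It is not the asker's code: `Segment` is a hypothetical stand-in for the undefined `Datas` type, `writeOutline` and `outlines(for:stride:)` are names invented for this example, and `Double` is assumed in place of the question's `Decimal`. Each chunk writes to its own disjoint slice of the output buffer, which is what makes the unsynchronized writes safe here.

```swift
import Foundation
import Dispatch

// Hypothetical stand-in for the question's `Datas` type.
struct Segment {
    var x1, y1, x2, y2: Double
    var size: Double
}

/// Writes the four outline corners (8 values) of segment number `i` into `out`.
func writeOutline(of s: Segment, at i: Int, into out: UnsafeMutablePointer<Double>) {
    let dx = s.x2 - s.x1
    let dy = s.y2 - s.y1
    let length = hypot(dx, dy)
    let nx = -dy / length * s.size   // perpendicular offset, scaled by the width
    let ny = dx / length * s.size
    let base = i * 8
    out[base]     = s.x1 + nx; out[base + 1] = s.y1 + ny
    out[base + 2] = s.x1 - nx; out[base + 3] = s.y1 - ny
    out[base + 4] = s.x2 + nx; out[base + 5] = s.y2 + ny
    out[base + 6] = s.x2 - nx; out[base + 7] = s.y2 - ny
}

/// Computes all outlines, handing `stride` segments to each parallel iteration.
func outlines(for segments: [Segment], stride: Int) -> [Double] {
    precondition(stride > 0, "stride must be positive")
    let valueCount = segments.count * 8
    let out = UnsafeMutablePointer<Double>.allocate(capacity: valueCount)
    out.initialize(repeating: 0, count: valueCount)
    defer { out.deallocate() }

    // Round up so a final, shorter chunk is still processed.
    let chunks = (segments.count + stride - 1) / stride
    // concurrentPerform returns only after every iteration has finished.
    DispatchQueue.concurrentPerform(iterations: chunks) { chunk in
        let start = chunk * stride
        let end = min(start + stride, segments.count)
        for i in start..<end {
            writeOutline(of: segments[i], at: i, into: out)
        }
    }
    return Array(UnsafeBufferPointer(start: out, count: valueCount))
}

// Sample data: 1,000 segments of length 5 (a 3-4-5 triangle) and width 1.
let segments = (0..<1_000).map { i in
    Segment(x1: Double(i), y1: 0, x2: Double(i) + 3, y2: 4, size: 1)
}
let oneAtATime = outlines(for: segments, stride: 1)
let strided = outlines(for: segments, stride: 100)
```

Both calls produce the same values; only the number of parallel iterations differs (1,000 versus 10). Wrapping each call in a timer is one way to compare stride values on a given device.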
Answer 1: (score: 2)
Your expectations may be wrong. Your goal was to free up the main thread, and you did that. That is what is faster now: the main thread!
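But async on a background thread means "please do this whenever you get around to it, and feel free to pause it so that other code can run in the middle"; it does not mean "do it fast", not at all. And I don't see any qos specification in your code, so you are not asking for any special priority either.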
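As a minimal sketch of what a qos specification can look like, the queue from the question could be created with an explicit quality of service. The choice of `.userInitiated` is an assumption made for this example, and a higher QoS only asks the system for more scheduling priority; it does not remove the per-block dispatch overhead.

```swift
import Dispatch

// A concurrent queue that explicitly requests user-initiated priority.
// The label comes from the question; the QoS value is an assumption.
let queue = DispatchQueue(label: "construction_worker_1",
                          qos: .userInitiated,
                          attributes: .concurrent)

// A group lets the caller wait until the submitted work has finished.
let group = DispatchGroup()
queue.async(group: group) {
    // CPU-bound work would go here.
}
group.wait()
```

A QoS can also be requested for a single block with `queue.async(qos: .userInitiated) { ... }`.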