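I am trying to speed up the Python code I currently use to compress social graphs. The code sample works as expected on my data; the main problem is the time it takes. Without any parallelization, the code needs roughly 4000 s on Slashdot0902 (https://snap.stanford.edu/data/soc-Slashdot0902.html) and 3800 s on soc-Epinions1 (https://snap.stanford.edu/data/soc-Epinions1.html) to produce the compressed representation, which is much longer than I expected.
I profiled the code, and most of the time is spent in the main compression body. I am looking for a way to parallelize it, i.e. to run the compression algorithm for every element at the same time (the compression of one element is completely independent of the others).
Using multiprocessing for this did not give the speedup I expected. I also tried, without success, to parallelize it with numba. I know that passing a matrix (a 2D array) as input the way I do is not how it is supposed to be done, but this is where I am stuck.
Here is the code sample. (Full code here: https://pastebin.com/8yihNs2Y)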
# Import everything
# Load adjacency matrix into a variable called matrix
# Creates a list to store the final output values before writing into a file
list1 = []
# Finds the parent of the current node
@cuda.jit(device=True)
def parent(index):
    return int((index-1) / 2)
# Finds the left or right sibling of the current node, based on its index
@cuda.jit(device=True)
def sibling(index):
    if(index % 2 == 1):
        return index+1
    else:
        return index-1
len_row = matrix.shape[0]
# Find the height of the binary tree and use it to find n, i.e. the number of elements in the array
height = int(math.log2(len_row)) + 1
n = (2 ** height) - 1
start_index = n-len_row
temp_array = np.full(n, -1, dtype=np.int8)
@vectorize(['int32(int32,int32,int32,int32,int32)'], target='cuda')
def compress(input_array, temp_array, n, start_index, len_row):
    for i in range(len_row):
        if(input_array[i] == 1):
            current_index = start_index+i
            dcn_reached = False
            while dcn_reached == False:
                temp_array[current_index] = 1
                if(temp_array[parent(current_index)] != 1):
                    temp_array[parent(current_index)] = 1
                    temp_array[sibling(current_index)] = 0
                    current_index = parent(current_index)
                else:
                    dcn_reached = True
    return temp_array
list1.append(compress(matrix, temp_array, n, start_index, len_row))
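If this worked, I would expect to end up with a list of temp_array values. I would then still have to post-process each temp_array to get the final array, because np.where(temp_array != -1) does not work on the GPU.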
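The post-processing step described above can be done on the host once the results are back in ordinary NumPy memory. The following is only a minimal sketch: it assumes the kernel output has already been copied back (for a Numba device array, for example with copy_to_host()) into a 2D array with one temp_array per row. The `result` array below is hypothetical sample data, not output from the real graphs.

```python
import numpy as np

# Hypothetical kernel output for two rows of a 4-node graph;
# -1 marks tree nodes that were never touched.
result = np.array([[1, 1, 0, 0, 1, -1, -1],
                   [1, 0, 1, -1, -1, 1, 0]], dtype=np.int8)

# Boolean masking on the host does the same job as
# temp_array[np.where(temp_array != -1)], one row at a time.
list1 = [row[row != -1] for row in result]

for output in list1:
    print(output)
```

Rows generally keep a different number of entries, so the result stays a list of 1D arrays rather than a single 2D array.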
Here is the main part of the non-parallelized code (full code here: https://pastebin.com/BNvVUzV3). Note that to get the code to run you may need to change sep=' ' in pd.read_csv depending on the file.
# Load edgelist into a 2d list called matrix
list1 = []
# Two functions parent and sibling to return the index of parent and sibling node in a binary tree
start = time.time()
len_row = matrix.shape[0]
# Find the height of the binary tree and use it to find n, i.e. the number of elements in the array
height = int(math.log2(len_row)) + 1
n = (2 ** height) - 1
# Loop over all rows of the matrix to generate the compressed format of each row
for i in range(len_row):
    print("Element "+str(i))
    # input_array = i'th row of the matrix
    input_array = matrix[i]
    # Initialize temp array with -1 values
    temp_array = np.full(n, -1, dtype=np.int8)
    start_index = n-len(input_array)
    # j walks over the entries of row i (renamed so it no longer shadows i)
    for j in range(len_row):
        if(input_array[j] == 1):
            current_index = start_index+j
            dcn_reached = False
            while dcn_reached == False:
                temp_array[current_index] = 1
                if(temp_array[parent(current_index)] != 1):
                    temp_array[parent(current_index)] = 1
                    temp_array[sibling(current_index)] = 0
                    current_index = parent(current_index)
                else:
                    dcn_reached = True
    # Keep only the tree nodes that were actually touched
    idx = np.where(temp_array != -1)
    output = temp_array[idx]
    list1.append(output)
# Pass this list through bz2 compression and pickle dump it to a bin file
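The final step named in the last comment is not shown in the excerpt. A minimal sketch of one way to do it with the standard library follows. The output path and the two sample arrays are hypothetical, and the full code linked above may do this differently.

```python
import bz2
import os
import pickle
import tempfile

import numpy as np

# Hypothetical stand-in for the list1 built by the loop above.
list1 = [np.array([1, 1, 0, 0, 1], dtype=np.int8),
         np.array([1, 0, 1, 1, 0], dtype=np.int8)]

# Hypothetical output location; any writable path works.
out_path = os.path.join(tempfile.gettempdir(), "compressed_graph.bin")

# bz2.open returns a file object, so pickle can stream straight into it.
with bz2.open(out_path, "wb") as fh:
    pickle.dump(list1, fh, protocol=pickle.HIGHEST_PROTOCOL)

# Reading it back reverses both steps.
with bz2.open(out_path, "rb") as fh:
    restored = pickle.load(fh)
```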
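I would like to achieve at least one of the following:
- Get the numba code working
- Suggestions for other ways to parallelize the code above
- Suggestions for further optimization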
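Because the rows are independent, one possible direction is a row-parallel loop on CPU threads with Numba's `prange`, rather than the CUDA `@vectorize` attempt above. The sketch below is not a tested solution for the real datasets. The function name `compress_rows`, the preallocated `out` array and the tiny 4-node matrix are all assumptions made for illustration, and the `except ImportError` branch only exists so the snippet still runs, slowly, where Numba is not installed.

```python
import math

import numpy as np

try:
    from numba import njit, prange
except ImportError:
    # Assumption: plain-Python fallback so the sketch runs without Numba.
    prange = range

    def njit(*args, **kwargs):
        def decorator(func):
            return func
        return decorator


@njit(parallel=True)
def compress_rows(matrix, out, start_index):
    """Write the tree encoding of matrix[r] into out[r]; rows are independent."""
    n_rows, n_cols = matrix.shape
    for r in prange(n_rows):  # rows are distributed across threads
        for j in range(n_cols):
            if matrix[r, j] == 1:
                cur = start_index + j
                out[r, cur] = 1
                # Climb until the root or an already-marked parent is reached.
                while cur > 0 and out[r, (cur - 1) // 2] != 1:
                    par = (cur - 1) // 2
                    out[r, par] = 1
                    if cur % 2 == 1:
                        out[r, cur + 1] = 0  # right sibling
                    else:
                        out[r, cur - 1] = 0  # left sibling
                    cur = par


# Tiny hypothetical adjacency matrix, not the real data.
matrix = np.array([[0, 1, 0, 0],
                   [0, 0, 1, 0],
                   [1, 1, 0, 0],
                   [0, 0, 0, 0]], dtype=np.int8)

len_row = matrix.shape[0]
height = int(math.log2(len_row)) + 1
n = (2 ** height) - 1
start_index = n - len_row

# One temp_array per row, allocated up front so threads never share a row.
out = np.full((len_row, n), -1, dtype=np.int8)
compress_rows(matrix, out, start_index)
list1 = [row[row != -1] for row in out]
```

The loop stops at the root with an explicit `cur > 0` check instead of relying on `int((index-1) / 2)` evaluating to 0 for index 0. For graphs the size of the ones mentioned above, a full `out` array for all rows at once may not fit in memory, so the rows would probably have to be processed in batches.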