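How can I convert a DistributedMatrix back into a NumPy array or a SciPy sparse array?
Obviously this is not something I would do on a large array, but it helps with debugging and testing code before actually running it on big data.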
Answer 0: (score: 0)
Here is a naive conversion from an IndexedRowMatrix to a SciPy sparse matrix:

    from scipy.sparse import lil_matrix

    def indexedrowmatrix_to_array(x):
        # collect() pulls every row to the driver, so use this on small matrices only
        output = lil_matrix((x.numRows(), x.numCols()))
        for indexed_row in x.rows.collect():
            output[indexed_row.index, :] = indexed_row.vector.toArray()
        return output

And for a CoordinateMatrix:
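    from scipy.sparse import coo_matrix

    def coordinatematrix_to_array(x):
        # coo_matrix does not support item assignment, so build it from (value, (row, col)) triplets
        entries = x.entries.collect()
        rows = [entry.i for entry in entries]
        cols = [entry.j for entry in entries]
        values = [entry.value for entry in entries]
        return coo_matrix((values, (rows, cols)), shape=(x.numRows(), x.numCols()))

You could do something similar for a BlockMatrix by iterating over its blocks attribute and using the rowsPerBlock and colsPerBlock attributes to assign block by block.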
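The block-by-block idea for a BlockMatrix could look roughly like the sketch below. It is not tested against a real Spark cluster. It assumes each collected element of `blocks` is a `((block_row, block_col), sub_matrix)` pair whose sub-matrix has a `toArray()` method, and it builds a dense NumPy array rather than a sparse one. The names `blocks_to_array` and `blockmatrix_to_array` are made up for illustration.

```python
import numpy as np

def blocks_to_array(blocks, rows_per_block, cols_per_block, num_rows, num_cols):
    """Assemble collected ((block_row, block_col), sub_matrix) pairs into one dense array."""
    output = np.zeros((num_rows, num_cols))
    for (block_row, block_col), sub_matrix in blocks:
        sub = np.asarray(sub_matrix.toArray())
        # Top-left corner of this block inside the full matrix.
        r0 = block_row * rows_per_block
        c0 = block_col * cols_per_block
        # Blocks on the bottom or right edge may be smaller than the nominal size.
        output[r0:r0 + sub.shape[0], c0:c0 + sub.shape[1]] = sub
    return output

def blockmatrix_to_array(x):
    # Hypothetical wrapper: collect() brings every block to the driver,
    # so this is only meant for small matrices used in debugging and tests.
    return blocks_to_array(
        x.blocks.collect(), x.rowsPerBlock, x.colsPerBlock, x.numRows(), x.numCols()
    )
```

Keeping the assembly in `blocks_to_array`, separate from the Spark call, means the indexing logic can be checked with plain stand-in objects before it is pointed at a real BlockMatrix. Blocks that are missing from the collection are simply left as zeros.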