Converting a list of Delayed objects to a Dask array

Time: 2019-09-27 18:57:07

Tags: dask

I have a list of Delayed objects that I obtained when using dask.persist:

[Delayed('get_atomic_fingerprint-aca9b774-cfcc-4160-86ae-a24765df24ad')
 Delayed('get_atomic_fingerprint-c8eaf312-ff5c-4582-83b9-eb2b00e715b2')
 Delayed('get_atomic_fingerprint-839365ce-8568-44bb-9b3c-ecb017811686')
 Delayed('get_atomic_fingerprint-5fec2286-939a-43d3-81d6-8e9c6932e405')
 Delayed('get_atomic_fingerprint-c424fd13-68f8-4325-a899-eb3d0ab9a212')
 Delayed('get_atomic_fingerprint-540561e6-c8af-47a2-a92d-1465d661ba62')
 Delayed('get_atomic_fingerprint-5e2f0aa0-fe5c-4c66-87d6-d9ab584de6a6')
 Delayed('get_atomic_fingerprint-8303c6ab-6a28-4122-b9c2-ef6d3c845375')
 Delayed('get_atomic_fingerprint-a326b564-e5c3-4198-bc4f-ebe3c3a929f6')
 Delayed('get_atomic_fingerprint-d75d354a-73ab-4537-a520-11ed127c84db')
 Delayed('get_atomic_fingerprint-29653b4d-c0c1-4ddb-a198-8557d2f666d1')
 Delayed('get_atomic_fingerprint-3499f5b5-1921-46cb-8953-14afe25970d4')
 Delayed('get_atomic_fingerprint-88c6d705-6c65-4038-a6dd-3589efa0dc7f')
 Delayed('get_atomic_fingerprint-00783026-547d-4fc6-a60b-706149ab9242')
 Delayed('get_atomic_fingerprint-f470fd08-267a-4abf-8314-09bcc0b8848b')
 Delayed('get_atomic_fingerprint-e5117f9a-a17c-40dc-b56f-a54cf090dfa8')
 Delayed('get_atomic_fingerprint-cb885b3f-525f-4be4-9f03-dd6726677b27')
 Delayed('get_atomic_fingerprint-887a3558-96cc-4930-a9cd-d5d05bf7e5e8')
 Delayed('get_atomic_fingerprint-ca4b9c13-b24a-41b2-a36b-01d05f52c5ba')
 Delayed('get_atomic_fingerprint-d02bbc0c-e846-4704-b4ca-2d0b31fa9c7a')
 Delayed('get_atomic_fingerprint-6b835c18-f478-4f1a-9798-9373a3540f09')
 Delayed('get_atomic_fingerprint-cdd3262f-1abb-44dc-bcec-d59420bc96f7')
 Delayed('get_atomic_fingerprint-a0d7a43e-b06b-4634-8178-2c4cb05ee7d9')
 Delayed('get_atomic_fingerprint-9c85533e-6d91-4083-9fee-86ffcc633191')
 Delayed('get_atomic_fingerprint-7cc7bca9-78a9-4ffa-93b8-9ec830ba504e')
 Delayed('get_atomic_fingerprint-1e83637f-a8d9-4c64-a78a-ac8e03291d16')
 Delayed('get_atomic_fingerprint-8fd20c75-c6c3-4813-b9a9-46aa712e38d1')
 Delayed('get_atomic_fingerprint-53a6e24d-2a3f-4805-bb0c-be8469301a9b')
 Delayed('get_atomic_fingerprint-f13abba5-64e1-4106-a8b6-99c6b6afcbea')
 Delayed('get_atomic_fingerprint-c62c15aa-e4e2-42e4-ad33-f5d68f111493')
 Delayed('get_atomic_fingerprint-8dfba7d8-9661-47a1-9ecc-462abb760aad')
 Delayed('get_atomic_fingerprint-c0813c2b-2bac-4d86-b9d7-6d995e5a59b4')
 Delayed('get_atomic_fingerprint-110272df-f7f6-444d-8cf3-5c7a57417809')
 Delayed('get_atomic_fingerprint-db28f110-2e56-44da-aff0-436b20481bc5')
 Delayed('get_atomic_fingerprint-27b7fd56-57e2-44f4-a8b6-1509b26d058b')
 Delayed('get_atomic_fingerprint-3132ca26-572f-4c92-97e9-3ac81ae77c76')
 Delayed('get_atomic_fingerprint-667aee38-96a4-4446-88fb-de418ee5d7cc')
 Delayed('get_atomic_fingerprint-81cfccf7-106c-4215-ac11-b0d3d2dcbfe1')
 Delayed('get_atomic_fingerprint-fbd25c74-3ead-4d55-b488-6a4e0a58c33b')
 Delayed('get_atomic_fingerprint-b5d9b18e-998c-4753-8edd-ed8b23ee4635')]

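Each of these results is a list of floats, and what I want to do is create a dask array from them. How should I go about it? I know that from_delayed exists, but should I iterate over all the Delayed objects in the list and concatenate the arrays? Any help would be appreciated.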

1 Answer:

Answer 0 (score: 2)

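After reading the documentation again, specifically this section, https://docs.dask.org/en/latest/array-creation.html#using-dask-delayed, I found out how to do it.

I just modified my delayed function to return an np.array instead of a list, and was then able to do what the documentation shows.

My code for converting the Delayed objects to a Dask array is as follows: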

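import dask.array

# stacked_features starts out as the list of Delayed objects from the question.
# Compute one result up front to learn the shape of a single fingerprint.
sample = stacked_features[0].compute()
dim = (len(stacked_features), len(sample))
# Wrap each Delayed as a dask array; shape and dtype must be given explicitly.
stacked_features = [
    dask.array.from_delayed(lazy, dtype=float, shape=sample.shape)
    for lazy in stacked_features
]
# Stack along a new first axis, then merge everything into a single chunk.
stacked_features = (
    dask.array.stack(stacked_features, axis=0).reshape(dim).rechunk(dim)
)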

More information can be seen in this commit
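
For readers who want to try the approach end to end, here is a minimal, self-contained sketch of the same idea. The function `get_fingerprint` and the constant `N_FEATURES` are hypothetical stand-ins for the delayed function in the question, and the sketch assumes the length of each result is known in advance, so no sample has to be computed first.

```python
import dask
import dask.array as da
import numpy as np

N_FEATURES = 4  # assumed fixed length of each fingerprint (hypothetical)


@dask.delayed
def get_fingerprint(seed):
    # Hypothetical stand-in for the delayed function in the question.
    # It returns a 1-D np.ndarray rather than a list.
    rng = np.random.default_rng(seed)
    return rng.random(N_FEATURES)


# A list of Delayed objects, like the one shown in the question.
lazy_results = [get_fingerprint(i) for i in range(5)]

# Wrap each Delayed as a 1-D dask array. Shape and dtype are passed
# explicitly because dask cannot infer them without computing.
rows = [
    da.from_delayed(lazy, shape=(N_FEATURES,), dtype=float)
    for lazy in lazy_results
]

# Stack along a new first axis: one row per Delayed object.
features = da.stack(rows, axis=0)

# Nothing is computed until this point.
result = features.compute()
```

Here `features` is a lazy two-dimensional array with one row per Delayed object. Calling `rechunk` afterwards, as in the answer's code, is optional and only changes how the rows are grouped into chunks.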