I am trying to use dask with an apply whose function returns 5 floats per row. I have simplified it to two outputs in the example here.
import numpy as np
import pandas as pd
import dask.dataframe as dd
# `get` below is a Dask scheduler function; its import is not shown in the question
def func1(row, param):
    return float(row.Val1) * param, float(row.Val1) * np.power(param, 2)
data = pd.DataFrame(np.array([["A01", 12], ["A02", 24], ["A03", 13]]), columns=["ID", "Val1"])
data2 = dd.from_pandas(data, npartitions=2).map_partitions(lambda df: df.apply(lambda row: func1(row, 2), axis=1, result_type="expand"), meta=pd.DataFrame()).compute(scheduler=get)
If I don't pass meta, I get the following error message:
ValueError: Metadata inference failed in `lambda`.
You have supplied a custom function and Dask is unable to
determine the type of output that that function returns.
To resolve this please provide a meta= keyword.
The docstring of the Dask function you ran should have more information.
Original error is below:
------------------------
ValueError("could not convert string to float: 'foo'", 'occurred at index 0')
If I pass a meta (though probably not an appropriate one), I get:
ValueError: The columns in the computed data do not match the columns in the provided metadata
Can anyone help? :)
Answer 0 (score: 1)
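The empty DataFrame you passed as meta does not have the correct column names. You supplied no columns in the metadata, but your computed output does have columns. That mismatch is the source of the error.
The meta value should match the column names and dtypes of your expected output.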