I am trying to convert a pandas DataFrame to a Dask DataFrame. Here is what my DataFrame looks like; it contains only file names and vectors:
                                          file_names  \
0                 C:\Users\pilot_project\pilot_2/...
1                 C:\Users\pilot_project\pilot_2/...
2                 C:\Users\pilot_project\pilot_2/...
3                 C:\Users\pilot_project\pilot_2/...
4  C:\Users\Yilmaz\Desktop\pilot_project\pilot_2/...

                                             vectors
0  [0.011174, 0.011548, 0.011642, 0.000159, 2.3e-...
1  [0.003017, 0.003247, 0.003309, 9e-06, 6e-06, 8...
2  [0.008307, 0.008461, 0.008461, 0.0, 0.0, 2.8e-...
3  [0.007146, 0.007241, 0.007261, 0.000392, 2.4e-...
4  [0.007226, 0.007281, 0.007336, 9.9e-05, 1.9e-0...
Here is the simple code:
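import dask.dataframe as dd
import numpy as np  # needed for np.array below
import pandas as pd

# Load the pickled pandas DataFrame (columns: file_names, vectors)
df1 = pd.read_pickle('output.p')
df1['vectors'] = df1['vectors'].apply(lambda x: np.array(x))  # This line didn't solve my problem
# Convert to a Dask DataFrame split into 8 partitions
df = dd.from_pandas(df1, npartitions=8)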
I get this error:
ValueError: setting an array element with a sequence.
Do you have any ideas? Thank you very much in advance.
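The error message itself comes from NumPy, which raises it when a whole sequence is assigned to a single numeric slot. One possible workaround, offered as an assumption and not confirmed for this exact setup, is to spread each vector across its own float columns so that no column holds array objects. The sketch below uses only pandas and NumPy; the helper `expand_vectors`, the column names `v0`, `v1`, ... and the sample data are made up for illustration and stand in for the contents of `output.p`.

```python
import numpy as np
import pandas as pd


def expand_vectors(df: pd.DataFrame, column: str = "vectors") -> pd.DataFrame:
    """Replace one column of sequences with one numeric column per element."""
    # One row per vector; pandas pads shorter rows with NaN.
    rows = [list(v) for v in df[column]]
    wide = pd.DataFrame(rows, index=df.index)
    # Hypothetical column names v0, v1, ...; rename as needed.
    wide.columns = [f"v{i}" for i in range(wide.shape[1])]
    # Keep every other column and append the expanded ones.
    return pd.concat([df.drop(columns=[column]), wide], axis=1)


# Made-up stand-in for the pickled DataFrame in the question
sample = pd.DataFrame({
    "file_names": ["file_0", "file_1"],
    "vectors": [np.array([0.1, 0.2, 0.3]), np.array([0.4, 0.5, 0.6])],
})
flat = expand_vectors(sample)
print(flat.dtypes)
```

The flattened frame could then be passed to `dd.from_pandas(flat, npartitions=8)` as in the code above; whether that avoids the error here has not been tested. The trade-off is that the vectors no longer travel as single objects and would have to be reassembled from the `v*` columns when needed.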