ValueError when trying to convert a pandas DataFrame to a dask DataFrame

Asked: 2017-11-15 12:53:45

Tags: python pandas dask

I am trying to convert a pandas DataFrame to a dask DataFrame. This is what my DataFrame looks like; it contains only file names and vectors:

    file_names  \
    0  C:\Users\pilot_project\pilot_2/...   
    1  C:\Users\pilot_project\pilot_2/...   
    2  C:\Users\pilot_project\pilot_2/...   
    3  C:\Users\pilot_project\pilot_2/...   
    4  C:\Users\Yilmaz\Desktop\pilot_project\pilot_2/... 
      vectors  
    0  [0.011174, 0.011548, 0.011642, 0.000159, 2.3e-...  
    1  [0.003017, 0.003247, 0.003309, 9e-06, 6e-06, 8...  
    2  [0.008307, 0.008461, 0.008461, 0.0, 0.0, 2.8e-...  
    3  [0.007146, 0.007241, 0.007261, 0.000392, 2.4e-...  
    4  [0.007226, 0.007281, 0.007336, 9.9e-05, 1.9e-0...  

Here is the simple code:

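    import dask.dataframe as dd
    import numpy as np
    import pandas as pd

    # Load the pickled pandas DataFrame
    df1 = pd.read_pickle('output.p')
    df1['vectors'] = df1['vectors'].apply(lambda x: np.array(x))  # This line didn't solve my problem
    # Convert to a dask DataFrame; this is the call that raises the error
    df = dd.from_pandas(df1, npartitions=8)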

I get:

ValueError: setting an array element with a sequence.
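For context, NumPy raises this message when it is asked to pack sequences of unequal length into a single numeric array. The sketch below reproduces the message without dask and shows one way to check whether a `vectors` column is ragged. It does not establish that this is the cause here: the `demo` frame, its file names, and its values are hypothetical stand-ins for the pickled data, and the ragged row is an assumption made only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the pickled frame; the last row holds a shorter vector.
demo = pd.DataFrame({
    "file_names": ["file_0", "file_1", "file_2"],
    "vectors": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8]],
})

# Length of each row's vector; more than one distinct length means the column is ragged.
lengths = demo["vectors"].apply(len)
is_ragged = lengths.nunique() > 1

# Forcing ragged sequences into one float array triggers the ValueError.
try:
    np.array(demo["vectors"].tolist(), dtype=float)
    message = None
except ValueError as exc:
    message = str(exc)

print(is_ragged)
print(message)
```

Running the same length check on `df1['vectors']` would show whether the real data has rows of differing length.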

Do you have any ideas? Thank you very much in advance.

0 answers:

No answers yet.