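Conclusion
Adding the dtype parameter to read_csv() worked well: the Sales column is no longer converted to float64.
...but the double quotes are still missing.
A page that was helpful: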
https://www.kaggle.com/szelee/how-to-import-a-csv-file-of-55-million-rows#369081
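On the missing double quotes: pandas writes CSV with minimal quoting by default, so quotes that are not needed to parse a field are dropped. quotechar='"' and doublequote=True are already the defaults and do not force quoting. The sketch below uses only the standard csv module to show the difference between QUOTE_MINIMAL and QUOTE_ALL. The sample rows are made up, and it assumes the original file quoted every field, which has not been verified here.

```python
import csv
import io

# Hypothetical sample rows; the real Sales_Data_1.csv may look different.
rows = [
    ["Region", "Product", "Date", "Sales"],
    ["North", "Widget, large", "2018-01-01", "100.5"],
]


def render(rows, quoting):
    """Write rows to an in-memory CSV using the given quoting mode."""
    buf = io.StringIO()
    csv.writer(buf, quoting=quoting, lineterminator="\n").writerows(rows)
    return buf.getvalue()


# QUOTE_MINIMAL quotes only fields that need it (here: the one containing a comma).
minimal = render(rows, csv.QUOTE_MINIMAL)
# QUOTE_ALL wraps every field in double quotes.
quoted = render(rows, csv.QUOTE_ALL)

print(minimal)
print(quoted)
```

pandas' to_csv() accepts the same constants through its quoting parameter, for example quoting=csv.QUOTE_ALL. Whether that reproduces the original file byte for byte depends on how the original was quoted.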
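I read a CSV file with Dask and write it back out to a CSV file without doing anything to the data in between.
However, the content of the CSV file changes.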
import os
import dask.dataframe as dd
user_name = os.environ['USERPROFILE'].replace('\\', '/')
dir = user_name + '/Desktop/'
types_dict = {
'Region': 'object',
'Product': 'object',
'Date': 'object',
'Sales': 'object'
}
# Originally: df = dd.read_csv(dir + 'Sales_Data_1.csv')
# Changed to pass dtype so that every column is read as a string:
df = dd.read_csv(dir + 'Sales_Data_1.csv', dtype=types_dict)
# Without the dtype parameter on read_csv(), the inferred dtypes were:
# print(df.dtypes)
# Region object
# Product object
# Date object
# Sales float64
# dtype: object
# Calling Dask's to_csv() with the input file path raises an error:
# FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\username\\Desktop\\Sales_Data_1.csv\\1.part'
#df.to_csv(dir + 'Sales_Data_1.csv')
df.compute().to_csv(dir + 'Sales_Data_1_dask.csv', index=False, quotechar = '"', doublequote = True)
I downloaded the CSV file from this site:
https://www.masterdataanalysis.com/ms-excel/analyzing-50-million-records-excel/
Difference between Sales_Data_1.csv and Sales_Data_1_dask.csv
Diff by WinMerge
Answer 0 (score: 0)
# FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\username\\Desktop\\Sales_Data_1.csv\\1.part'
#df.to_csv(dir + 'Sales_Data_1.csv')
df.compute().to_csv(dir + 'Sales_Data_1_dask.csv', index=False, quotechar = '"', doublequote = True)
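Dask dataframes do not write to a single file (that is hard to do in parallel). Instead, give to_csv() a directory and it will write many files into that directory. I encourage you to read the docstring of that function.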
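To make the directory output concrete: the error message above shows Dask trying to create Sales_Data_1.csv\1.part, that is, it treats the path as a directory and writes one N.part file per partition. The call presumably fails because a regular file with that name already exists. If a single CSV is still needed afterwards, one option is to merge the part files. Below is a minimal sketch using only the standard library, assuming every part file starts with its own header row and ends with a newline. merge_parts and the sample rows are hypothetical, and the part files are simulated rather than written by Dask.

```python
import glob
import os
import shutil
import tempfile


def merge_parts(part_dir, out_path):
    """Concatenate N.part files in numeric order, keeping only the first header."""
    paths = sorted(
        glob.glob(os.path.join(part_dir, "*.part")),
        # Sort by partition number so that 10.part comes after 9.part.
        key=lambda p: int(os.path.basename(p).split(".")[0]),
    )
    with open(out_path, "w", newline="") as out:
        for i, path in enumerate(paths):
            with open(path, newline="") as part:
                header = part.readline()
                if i == 0:
                    out.write(header)  # keep the header from the first part only
                shutil.copyfileobj(part, out)  # copy the data lines unchanged
    return len(paths)


# Simulate two part files (assumption: this mimics the per-partition output).
with tempfile.TemporaryDirectory() as tmp:
    part_dir = os.path.join(tmp, "Sales_Data_1_out")
    os.makedirs(part_dir)
    bodies = ["North,A,2018-01-01,1\n", "South,B,2018-01-02,2\n"]
    for n, body in enumerate(bodies):
        with open(os.path.join(part_dir, f"{n}.part"), "w", newline="") as f:
            f.write("Region,Product,Date,Sales\n" + body)

    merged_path = os.path.join(tmp, "merged.csv")
    part_count = merge_parts(part_dir, merged_path)
    with open(merged_path, newline="") as f:
        merged = f.read()

print(part_count)
print(merged)
```

Because the lines are copied as raw text instead of being parsed and rewritten, any quoting in the part files is left untouched.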