Reading a specific CSV file with cuDF fails

Time: 2019-07-20 09:55:14

Tags: python csv rapids cudf

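I am trying to process a specific CSV file with cuDF from RAPIDS. The file is available at this link: http://open-data-assurance-maladie.ameli.fr/depenses/download.php?Dir_Rep=Open_DAMIR&Annee=2018

I tried the file A2018_01.csv (enter "données", then click validate to download). As I understand it, the cuDF API is meant to be used like pandas, so I first tried reading the CSV with pandas: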

import os
import pandas as pd
PATH='data/'
df = pd.read_csv(f'{PATH}A2018_01.csv', sep=";")

This takes about 2 minutes on my machine.

df.describe()

FLX_ANN_MOI     ORG_CLE_REG     AGE_BEN_SNDS    BEN_RES_REG     BEN_CMU_TOP     BEN_QLT_COD     BEN_SEX_COD     DDP_SPE_COD     ETE_CAT_SNDS    ETE_REG_COD     ...     PSE_ACT_CAT     PSE_SPE_SNDS    PSE_STJ_SNDS    PRE_INS_REG     PSP_ACT_SNDS    PSP_ACT_CAT     PSP_SPE_SNDS    PSP_STJ_SNDS    TOP_PS5_TRG     Unnamed: 55
count   34003028.0  3.400303e+07    3.400303e+07    3.400303e+07    3.400303e+07    3.400303e+07    3.400303e+07    3.400303e+07    3.400303e+07    3.400303e+07    ...     3.400303e+07    3.400303e+07    3.400303e+07    3.400303e+07    3.400303e+07    3.400303e+07    3.400303e+07    3.400303e+07    3.400303e+07    0.0
mean    201801.0    5.006560e+01    4.662571e+01    5.283103e+01    3.734231e+00    1.277093e+00    1.561921e+00    6.041314e+01    8.242066e+03    8.909063e+01    ...     3.681443e+00    6.550268e+00    2.915378e+00    7.256596e+01    1.230369e+00    9.164879e+00    1.814533e+01    3.557239e+00    4.214655e+00    NaN
std     0.0     3.207707e+01    2.420884e+01    2.963844e+01    4.360434e+00    6.316855e-01    4.961724e-01    6.001130e+01    3.497950e+03    2.353433e+01    ...     1.166599e+01    2.003136e+01    3.269808e+00    3.205104e+01    7.872794e+00    2.747816e+01    3.260440e+01    3.433198e+00    3.958894e+00    NaN
min     201801.0    5.000000e+00    0.000000e+00    5.000000e+00    0.000000e+00    0.000000e+00    0.000000e+00    0.000000e+00    1.101000e+03    5.000000e+00    ...     0.000000e+00    0.000000e+00    1.000000e+00    5.000000e+00    0.000000e+00    0.000000e+00    0.000000e+00    1.000000e+00    0.000000e+00    NaN
25%     201801.0    2.400000e+01    3.000000e+01    2.700000e+01    0.000000e+00    1.000000e+00    1.000000e+00    0.000000e+00    9.999000e+03    9.900000e+01    ...     1.000000e+00    0.000000e+00    1.000000e+00    4.400000e+01    0.000000e+00    0.000000e+00    1.000000e+00    1.000000e+00    1.000000e+00    NaN
50%     201801.0    4.400000e+01    5.000000e+01    5.200000e+01    0.000000e+00    1.000000e+00    2.000000e+00    4.300000e+01    9.999000e+03    9.900000e+01    ...     2.000000e+00    0.000000e+00    1.000000e+00    9.300000e+01    0.000000e+00    1.000000e+00    1.000000e+00    2.000000e+00    1.000000e+00    NaN
75%     201801.0    7.600000e+01    7.000000e+01    7.600000e+01    9.000000e+00    1.000000e+00    2.000000e+00    1.210000e+02    9.999000e+03    9.900000e+01    ...     3.000000e+00    1.000000e+00    2.000000e+00    9.900000e+01    0.000000e+00    1.000000e+00    1.400000e+01    9.000000e+00    9.000000e+00    NaN
max     201801.0    9.900000e+01    9.900000e+01    9.900000e+01    9.000000e+00    9.000000e+00    2.000000e+00    1.210000e+02    9.999000e+03    9.900000e+01    ...     9.900000e+01    9.900000e+01    9.000000e+00    9.900000e+01    9.900000e+01    9.900000e+01    9.900000e+01    9.000000e+00    9.000000e+00    NaN
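
The all-NaN `Unnamed: 55` column with a count of 0 is what pandas typically produces when every line, including the header, ends with a trailing separator. Whether A2018_01.csv really ends its lines with `;` is an assumption here. The sketch below uses a made-up three-column sample (the names `COL_A`, `COL_B`, `COL_C` are hypothetical, not taken from the file) and the standard `csv` module to show the effect:

```python
import csv
import io

# Hypothetical two-line sample; the real header and values are not reproduced.
# Assumption: each line of the real file ends with a trailing ";".
sample = "COL_A;COL_B;COL_C;\n1;2;3;\n"

rows = list(csv.reader(io.StringIO(sample), delimiter=";"))
header, first = rows

# The trailing ";" yields one extra, empty field on every line, which a
# DataFrame reader then has to treat as an additional unnamed column.
trailing_empty = header[-1] == "" and first[-1] == ""
print(len(header), trailing_empty)
```

If that is what is happening, the empty column carries no data and could be dropped in pandas with `df.dropna(axis=1, how="all")`.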

Then I tried cuDF:

import cudf; print('cuDF Version'+ cudf.__version__)
gdf = cudf.read_csv(f'{PATH}A2018_01.csv', sep=";")

cuDF Version0.8.0+0.g8fa7bd3.dirty

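It starts loading but never finishes; the Jupyter notebook cell just keeps showing an asterisk (*) next to it.

Any idea what I should do to make this work? By the way, I am using Ubuntu 18.04.2 LTS with a 'GeForce RTX 2080 Ti' GPU, which has worked fine so far, e.g. PyTorch runs without problems.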
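
One thing that may be worth checking, as a possible cause rather than a confirmed diagnosis, is whether the table can fit in GPU memory at all. The back-of-the-envelope sketch below takes the row count from the `describe()` output above and assumes 56 columns (positions 0 to 55), 8 bytes per value (int64/float64 everywhere), and 11 GiB of memory on the card. All three are assumptions, and the estimate ignores any extra working memory the CSV parser itself needs.

```python
# Rough size estimate for the fully loaded table.
ROWS = 34_003_028                 # "count" row of df.describe()
COLUMNS = 56                      # assumption: columns 0..55, last is "Unnamed: 55"
BYTES_PER_VALUE = 8               # assumption: every column is int64 or float64
GPU_MEMORY_BYTES = 11 * 1024**3   # assumption: 11 GiB on a GeForce RTX 2080 Ti


def estimate_table_bytes(rows, columns, bytes_per_value=BYTES_PER_VALUE):
    """Return the size in bytes of a dense rows x columns table."""
    return rows * columns * bytes_per_value


table_bytes = estimate_table_bytes(ROWS, COLUMNS)
fits = table_bytes < GPU_MEMORY_BYTES

print(f"estimated table size: {table_bytes / 1024**3:.1f} GiB")
print(f"fits in GPU memory: {fits}")
```

Under these assumptions the estimate exceeds the card's memory, so trying a subset of the rows or columns first would show whether size is the issue.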

0 answers:

No answers