Pandas does not correctly parse a comma-separated file

Time: 2019-08-22 08:49:15

Tags: python-3.x pandas csv

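I am trying to parse a CSV file, but pandas somehow fails to recognize the separator/delimiter. I have looked at answers to similar questions, but I still cannot parse my file correctly: only the header is parsed into columns, and each data row ends up whole in the first column.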

Each line of the file looks like this: https://drive.google.com/a/company.com/uc?export=download&id=10p-c0i2xtWBSvJ3OJV5pgEUarE1X,-1,"{""type"":""F03""}",0,0,"{}","{}"

The code I tried is as follows:

In  [0]: import pandas as pd

In  [1]: data = pd.read_csv('file.csv', sep=',')
         data.head()
Out [1]: 

    filename          file_size   file_attributes    region_count    region_id   region_shape_attributes  region_attributes
0   https://drive...        NaN               NaN             NaN          NaN                       NaN                NaN
1   https://drive...        NaN               NaN             NaN          NaN                       NaN                NaN
2   https://drive...        NaN               NaN             NaN          NaN                       NaN                NaN
3   https://drive...        NaN               NaN             NaN          NaN                       NaN                NaN
4   https://drive...        NaN               NaN             NaN          NaN                       NaN                NaN

In  [2]: data['filename'][0]
Out [2]: 

'https://drive.google.com/a/company.com/uc?export=download&id=10p-c0i2xtWBSvJ3OJV5pgEUarE1X,-1,"{""type"":""F03""}",0,0,"{}","{}"'

1 answer:

Answer 0: (score: 1)

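Sorry, I cannot reproduce your problem. However, you can use the following code to split the unparsed filename column of the data DataFrame into separate columns.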

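cols_to_extract = [
    'filename', 'file_size', 'file_attributes', 'region_count',
    'region_id', 'region_shape_attributes', 'region_attributes']
# Split each unparsed line on commas into one column per field.
# This assumes that no field contains an embedded comma.
df = data['filename'].str.split(',', expand=True)
# Building a new frame avoids assigning into a slice of data.
df.columns = cols_to_extract
df.head()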

The output should look like this:

    filename            file_size   file_attributes       region_count  region_id   region_shape_attributes  region_attributes
0   https://drive...          -1    "{""type"":""F03""}"             0          0   "{}"                     "{}"
1   https://drive...          -1    "{""type"":""F03""}"             0          0   "{}"                     "{}"
2   https://drive...          -1    "{""type"":""F03""}"             0          0   "{}"                     "{}"
3   https://drive...          -1    "{""type"":""F03""}"             0          0   "{}"                     "{}"
4   https://drive...          -1    "{""type"":""F03""}"             0          0   "{}"                     "{}"
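
Note that a plain split on commas keeps the CSV quoting in the values, as the output above shows, and would break a field that contains a comma. As a hedged alternative, the sketch below passes the unparsed strings through the standard-library csv.reader, which applies the usual CSV quoting rules. The one-element Series named raw is a stand-in for data['filename'] built from the sample line in the question, and decoding file_attributes with json.loads is an optional extra step that assumes the field always holds valid JSON.

```python
import csv
import json

import pandas as pd

cols_to_extract = [
    'filename', 'file_size', 'file_attributes', 'region_count',
    'region_id', 'region_shape_attributes', 'region_attributes']

# Stand-in for data['filename']: one unparsed line copied from the question.
raw = pd.Series([
    'https://drive.google.com/a/company.com/uc?export=download'
    '&id=10p-c0i2xtWBSvJ3OJV5pgEUarE1X,-1,"{""type"":""F03""}",0,0,"{}","{}"',
])

# csv.reader accepts any iterable of strings and removes the outer quotes,
# turning each doubled quote ("") back into a single quote character.
rows = list(csv.reader(raw))
parsed = pd.DataFrame(rows, columns=cols_to_extract)

# Optional: decode the JSON text into Python objects (assumes valid JSON).
parsed['file_attributes'] = parsed['file_attributes'].map(json.loads)

print(parsed.loc[0, 'file_attributes'])
```

All other columns remain strings after this step, so numeric fields such as file_size would still need a conversion, for example with pd.to_numeric.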

I hope this helps.
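
On the reproduction point, here is a minimal check you could run, assuming the file's header line is simply the seven column names joined by commas (the question shows the column names but not the raw header line, so that part is a guess). It feeds the header and the sample line from the question to pd.read_csv through io.StringIO. If this parses into seven columns on your machine too, the lines in the real file probably differ from the sample posted, and it may be worth inspecting the first few lines of the file with repr().

```python
import io

import pandas as pd

# Assumed header line: the column names from the question joined by commas.
header = ('filename,file_size,file_attributes,region_count,'
          'region_id,region_shape_attributes,region_attributes')

# Data line copied from the question.
row = ('https://drive.google.com/a/company.com/uc?export=download'
       '&id=10p-c0i2xtWBSvJ3OJV5pgEUarE1X,-1,"{""type"":""F03""}",0,0,"{}","{}"')

# Parse the two lines as an in-memory CSV file.
sample = pd.read_csv(io.StringIO(header + '\n' + row + '\n'), sep=',')

print(sample.shape)
print(sample.loc[0, 'file_attributes'])
```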