pandas fails to split the columns of a CSV file

Time: 2019-05-04 15:26:33

Tags: python pandas csv

I am currently trying to extract data from a .csv file with the pandas read_csv function. My .csv file has the following format:

[Link to the first image, because I am not allowed to add images] [1]

The format looks reasonable to me. Only the # in the header row bothers me a little, but it does not affect the problem I am facing.

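When I read the file with pandas.read_csv(csv_path), it assigns the header row correctly, but it puts each entire row into the first column of the resulting DataFrame and fills all other columns with NaN. It looks to me as if the delimiter is not detected, so each whole line is treated as one big entry, which produces the following DataFrame: [Link to the second image, because I am not allowed to add images] [2]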

My current code:

import pandas

csv_path = 'sample.csv'
data_frame = pandas.read_csv(csv_path)

2 answers:

Answer 0: (score: 0)

It works correctly for me:

import pandas as pd
from io import StringIO
print(pd.__version__)

s = '''#filename,file_size,file_attributes,region_count,region_id,region_shape_attributes,region_attributes
video_0029-frame_00000.jpeg,1092976,"{}",22,0,"{""name"":""rect"",""x"":68,""y"":283,""width"":58,""height"":20}","{""class"":""Car""}"
video_0029-frame_00000.jpeg,1092976,"{}",22,1,"{""name"":""rect"",""x"":676,""y"":297,""width"":52,""height"":19}","{""class"":""Car""}"
video_0029-frame_00000.jpeg,1092976,"{}",22,2,"{""name"":""rect"",""x"":708,""y"":254,""width"":55,""height"":20}","{""class"":""Car""}"'''

# Parse the in-memory sample the same way read_csv would parse a file
df = pd.read_csv(StringIO(s))
print(df)
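Since these rows split into seven columns when read from a string, the difference presumably lies in the actual bytes of the file on disk. Below is a minimal sketch for inspecting them, assuming the file is UTF-8 text. `peek_lines` is a hypothetical helper, not part of pandas, and the temporary file whose data rows are each wrapped in a single pair of quotes is an invented example of one way the described symptom could arise, not a confirmed diagnosis of the file in the question.

```python
import os
import tempfile

import pandas as pd


def peek_lines(path, n=3):
    """Return repr() of the first n lines so quotes, tabs and BOMs become visible."""
    with open(path, encoding="utf-8", newline="") as fh:
        return [repr(line) for _, line in zip(range(n), fh)]


# Hypothetical file (an assumption for illustration): the header is plain,
# but the data row is wrapped as a whole in one pair of quotes.
broken = (
    '#filename,file_size,region_count\n'
    '"video_0029-frame_00000.jpeg,1092976,22"\n'
)
with tempfile.NamedTemporaryFile(
    "w", suffix=".csv", delete=False, encoding="utf-8"
) as tmp:
    tmp.write(broken)
    path = tmp.name

# Look at the raw text before blaming the parser.
for line in peek_lines(path):
    print(line)

# The quoted row counts as a single field, so it lands in the first column
# and the remaining columns are padded with NaN.
df = pd.read_csv(path)
print(df.shape)
print(df.isna().sum().tolist())

os.remove(path)
```

If the raw lines of the real file show a different delimiter (for example `;` or a tab) instead, passing that character as `sep` would be the thing to try.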

Answer 1: (score: 0)

You could also try passing the quotechar and sep parameters to pandas.read_csv():

data_frame = pd.read_csv(csv_path, sep=',', quotechar='"')

Running that and then calling data_frame.head() gives the following output:

                      filename  file_size file_attributes  region_count  \
0  video_0029-frame_00000.jpeg    1092976              {}            22   
1  video_0029-frame_00000.jpeg    1092976              {}            22   
2  video_0029-frame_00000.jpeg    1092976              {}            22   

   region_id                            region_shape_attributes  \
0          0  {"name":"rect","x":68,"y":283,"width":58,"heig...   
1          1  {"name":"rect","x":676,"y":297,"width":52,"hei...   
2          2             {"name":"rect","x":708,"y":254,"width"   

  region_attributes  
0   {"class":"Car"}  
1   {"class":"Car"}  
2               NaN
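Note that `sep=','` and `quotechar='"'` are already the defaults of `read_csv`, so spelling them out should not change the result for a well-formed file. A small sketch that checks this on two of the sample rows from the other answer (the variable names `sample`, `default` and `explicit` are chosen here for illustration):

```python
from io import StringIO

import pandas as pd

# Two rows copied from the sample data; doubled quotes ("") are escaped quotes.
sample = '''#filename,file_size,file_attributes,region_count,region_id,region_shape_attributes,region_attributes
video_0029-frame_00000.jpeg,1092976,"{}",22,0,"{""name"":""rect"",""x"":68,""y"":283,""width"":58,""height"":20}","{""class"":""Car""}"
video_0029-frame_00000.jpeg,1092976,"{}",22,1,"{""name"":""rect"",""x"":676,""y"":297,""width"":52,""height"":19}","{""class"":""Car""}"'''

# Parse once with the defaults and once with the parameters spelled out.
default = pd.read_csv(StringIO(sample))
explicit = pd.read_csv(StringIO(sample), sep=",", quotechar='"')

# Both calls should yield the same seven-column frame.
print(explicit.equals(default))
print(explicit.shape)
```

If the explicit call behaves differently from the plain call on the real file, that would point to something other than these two settings, such as the file's encoding or how it was saved.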