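I am currently trying to extract data from a .csv file using pandas' read_csv function. My .csv file has the following format:
[Link to the first image, since I am not allowed to add images] [1]
The format looks reasonable to me. Only the # in the header row bothers me a little, but it does not affect the problem I am facing.
When I read the file with pandas.read_csv(csv_path), the header row is assigned correctly, but every data row ends up in the first column of the resulting DataFrame and all other columns are filled with NaN.
It looks as though the delimiter is not being detected, so each whole line is treated as one big entry, which produces the following DataFrame:
[Link to the second image, since I am not allowed to add images] [2]
My current code: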
import pandas
csv_path = 'sample.csv'
data_frame = pandas.read_csv(csv_path)
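One way to test the "delimiter is not detected" suspicion is to look at the raw text before pandas parses it. The sketch below is only illustrative: `peek_lines` is a hypothetical helper, and the temporary file with three short rows is a stand-in for the real sample.csv, whose exact bytes are not shown in the question.

```python
import csv
import os
import tempfile
from itertools import islice


def peek_lines(path, n=3):
    """Return the first n raw lines of a text file without parsing them."""
    with open(path, newline='', encoding='utf-8') as f:
        return list(islice(f, n))


# Hypothetical stand-in for sample.csv (shortened, made-up rows).
sample_text = (
    "#filename,file_size,region_count\n"
    "frame_0.jpeg,1092976,22\n"
    "frame_1.jpeg,1093000,21\n"
)
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False,
                                 encoding='utf-8', newline='') as tmp:
    tmp.write(sample_text)
    csv_path = tmp.name

lines = peek_lines(csv_path)
for line in lines:
    # repr() exposes stray quotes, tabs, or unusual line endings.
    print(repr(line))

# Let the standard library guess the delimiter from the raw sample.
dialect = csv.Sniffer().sniff(''.join(lines), delimiters=',;\t')
print('guessed delimiter:', repr(dialect.delimiter))

os.remove(csv_path)  # clean up the stand-in file
```

If the printed lines start and end with a quote character, or use a separator other than a comma, that would explain why everything lands in one column.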
Answer 0 (score: 0)
It works correctly for me with the following test:
import pandas as pd
from io import StringIO
print(pd.__version__)
s = '''#filename,file_size,file_attributes,region_count,region_id,region_shape_attributes,region_attributes
video_0029-frame_00000.jpeg,1092976,"{}",22,0,"{""name"":""rect"",""x"":68,""y"":283,""width"":58,""height"":20}","{""class"":""Car""}"
video_0029-frame_00000.jpeg,1092976,"{}",22,1,"{""name"":""rect"",""x"":676,""y"":297,""width"":52,""height"":19}","{""class"":""Car""}"
video_0029-frame_00000.jpeg,1092976,"{}",22,2,"{""name"":""rect"",""x"":708,""y"":254,""width"":55,""height"":20}","{""class"":""Car""}"'''
# Parse the in-memory sample; print so the result also shows when run as a script
print(pd.read_csv(StringIO(s)))
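Since the pasted sample parses fine, the file on disk probably differs from it in some way. One possible cause, and this is an assumption because the raw file is not shown, is that each data row is wrapped in an extra pair of quotes. The shortened, made-up rows below sketch how that would produce exactly the described symptom: a correct header, the whole row in the first column, and NaN elsewhere.

```python
import pandas as pd
from io import StringIO

header = "#filename,file_size,region_count"
good_row = 'frame_0.jpeg,1092976,22'
# Hypothetical broken variant: the entire row is one quoted field.
bad_row = '"frame_0.jpeg,1092976,22"'

good = pd.read_csv(StringIO(header + "\n" + good_row))
bad = pd.read_csv(StringIO(header + "\n" + bad_row))

# Three separate values in the well-formed case.
print(good.iloc[0].tolist())
# One long string followed by NaN values in the quoted-row case.
print(bad.iloc[0].tolist())
```

If the real file looks like the second variant, the extra quoting is likely introduced by whatever tool exported or re-saved the file.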
Answer 1 (score: 0)
You can also try passing the quotechar and sep parameters to pandas.read_csv():
data_frame = pd.read_csv(csv_path, sep=',', quotechar='"')
Running that command, I get the following output when calling data_frame.head():
filename file_size file_attributes region_count \
0 video_0029-frame_00000.jpeg 1092976 {} 22
1 video_0029-frame_00000.jpeg 1092976 {} 22
2 video_0029-frame_00000.jpeg 1092976 {} 22
region_id region_shape_attributes \
0 0 {"name":"rect","x":68,"y":283,"width":58,"heig...
1 1 {"name":"rect","x":676,"y":297,"width":52,"hei...
2 2 {"name":"rect","x":708,"y":254,"width"
region_attributes
0 {"class":"Car"}
1 {"class":"Car"}
2 NaN
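Two follow-up points may be worth checking once the file parses. First, sep=',' and quotechar='"' are already the defaults of read_csv, so passing them explicitly should give the same result as the plain call. Second, the region_shape_attributes column holds JSON text that can be decoded into real columns. The sketch below uses a trimmed version of the sample (fewer columns, two rows) and also strips the leading # from the header; the variable names are illustrative only.

```python
import json
import pandas as pd
from io import StringIO

# Trimmed stand-in for the sample data: three columns, two rows.
s = '''#filename,region_id,region_shape_attributes
video_0029-frame_00000.jpeg,0,"{""name"":""rect"",""x"":68,""y"":283,""width"":58,""height"":20}"
video_0029-frame_00000.jpeg,1,"{""name"":""rect"",""x"":676,""y"":297,""width"":52,""height"":19}"'''

explicit = pd.read_csv(StringIO(s), sep=',', quotechar='"')
default = pd.read_csv(StringIO(s))
# Explicit arguments match the defaults, so both frames are identical.
print(explicit.equals(default))

# Drop the leading '#' from the first header name.
df = explicit.rename(columns=lambda c: c.lstrip('#'))

# Decode each JSON string into a dict, then expand the dicts into columns.
shapes = pd.json_normalize(df['region_shape_attributes'].map(json.loads).tolist())
print(shapes)
```

The resulting shapes frame has one column per JSON key (name, x, y, width, height) and can be joined back to df on the index if needed.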