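I'm trying to parse a CSV file, but pandas somehow doesn't recognize the separator/delimiter. I've looked at answers to similar questions, but I still can't get my file to parse correctly (only the header is parsed correctly).
Each line of the file looks like this: https://drive.google.com/a/company.com/uc?export=download&id=10p-c0i2xtWBSvJ3OJV5pgEUarE1X,-1,"{""type"":""F03""}",0,0,"{}","{}"
The code I tried is as follows: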
In [0]: import pandas as pd
In [1]: data = pd.read_csv('file.csv', sep=',')
data.head()
Out [1]:
filename file_size file_attributes region_count region_id region_shape_attributes region_attributes
0 https://drive... NaN NaN NaN NaN NaN NaN
1 https://drive... NaN NaN NaN NaN NaN NaN
2 https://drive... NaN NaN NaN NaN NaN NaN
3 https://drive... NaN NaN NaN NaN NaN NaN
4 https://drive... NaN NaN NaN NaN NaN NaN
In [2]: data['filename'][0]
Out [2]:
'https://drive.google.com/a/company.com/uc?export=download&id=10p-c0i2xtWBSvJ3OJV5pgEUarE1X,-1,"{""type"":""F03""}",0,0,"{}","{}"'
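Answer 0 (score: 1)
Sorry, I can't reproduce your problem. However, you can split the single filename column of the data DataFrame into the expected columns with the following code.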
cols_to_extract = [
    'filename', 'file_size', 'file_attributes', 'region_count',
    'region_id', 'region_shape_attributes', 'region_attributes']
# Split each raw line on commas; expand=True returns one column per piece.
# This assumes no field contains a comma of its own (true for the sample row).
df = data['filename'].str.split(',', expand=True)
df.columns = cols_to_extract
df.head()
The output should look like this:
filename file_size file_attributes region_count region_id region_shape_attributes region_attributes
0 https://drive... -1 "{""type"":""F03""}" 0 0 "{}" "{}"
1 https://drive... -1 "{""type"":""F03""}" 0 0 "{}" "{}"
2 https://drive... -1 "{""type"":""F03""}" 0 0 "{}" "{}"
3 https://drive... -1 "{""type"":""F03""}" 0 0 "{}" "{}"
4 https://drive... -1 "{""type"":""F03""}" 0 0 "{}" "{}"
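Note that the plain comma split above keeps the CSV quoting inside the values (for example "{""type"":""F03""}") and would produce extra columns if a JSON field ever contained a comma. As a hedged alternative, the sketch below parses each cell a second time with Python's csv module, which understands quoted fields and doubled quotes. It assumes every cell holds one complete CSV record with exactly seven fields, as in Out [2] of the question; the helper name reparse_rows and the one-row sample are hypothetical and only for illustration.

```python
import csv

import pandas as pd

# Column names taken from the header shown in the question.
COLUMNS = [
    'filename', 'file_size', 'file_attributes', 'region_count',
    'region_id', 'region_shape_attributes', 'region_attributes']


def reparse_rows(raw_rows):
    """Parse each string as one CSV record, honouring quotes and "" escapes.

    Hypothetical helper: assumes every record has exactly len(COLUMNS) fields.
    """
    # csv.reader accepts any iterable of strings, one record per item.
    records = list(csv.reader(list(raw_rows)))
    return pd.DataFrame(records, columns=COLUMNS)


# Illustrative sample: the cell value shown in Out [2] of the question.
sample = pd.Series([
    'https://drive.google.com/a/company.com/uc?export=download'
    '&id=10p-c0i2xtWBSvJ3OJV5pgEUarE1X,-1,"{""type"":""F03""}",0,0,"{}","{}"'
])
fixed = reparse_rows(sample)
print(fixed.loc[0, 'file_attributes'])  # prints {"type":"F03"}
```

On the DataFrame from the question this would be called as reparse_rows(data['filename']). All resulting values are strings, so numeric columns such as file_size would still need a conversion, for example with pd.to_numeric.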
I hope this helps.