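Pandas read_csv does not parse rows correctly

Time: 2018-05-09 17:03:26

Tags: python pandas csv

I am having trouble reading a CSV file with pandas on Python 2.7 and getting it parsed correctly.

Some of the rows that fail: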

  

Europa,2018-04-20,26948,15,Destino, - ,CRU-159617-JUN-2018,Origen,Productos incluidos,https://s3.amazonaws.com/cruceros-host/home/host-Cruceros.jpg,Crucero,06,https://www.host.com/cruceros/listado?regionId=7&startDate=2018-06-01&endDate=2018-07-01&adults=2&children=0&childrenAges=,23433,"Espana,Francia,Italia,Malta"

     

Australasia,2018-05-01,39155,15,Destino, - ,CRU-180907-JAN-2019,Origen,Productos incluidos,https://s3.amazonaws.com/cruceros-host/home/host-Cruceros.jpg,Crucero,01,"https://www.host.com/cruceros/listado?regionId=14&startDate=2019-01-01&endDate=2019-02-01&adults=2&children=0&childrenAges=&startPort=Sydney,Australia",34048, "Nueva Zelanda"

Code:

frame = pd.read_csv(filepath_or_buffer=raw_file)

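- EDIT:

The problem is that I get back a single column containing the whole row.

- EDIT2:

The problem was that Excel had accidentally edited some rows and added a " at the end of some lines. It works correctly now.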
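For anyone hitting the same symptom, a stray quote like this can be hunted down before the file ever reaches pandas. The following is only a rough sketch: the helper name find_unbalanced_quote_lines and the sample lines are made up for illustration, and the check is a heuristic (a quoted field that legitimately spans several lines would also be flagged).

```python
def find_unbalanced_quote_lines(lines, quotechar='"'):
    """Return 1-based numbers of lines with an odd count of quote characters."""
    suspicious = []
    for number, line in enumerate(lines, start=1):
        # A well-formed single-line record opens and closes every quoted
        # field, so its quote count is even.
        if line.count(quotechar) % 2 == 1:
            suspicious.append(number)
    return suspicious


# Hypothetical sample: the second line carries a stray trailing quote.
sample_lines = [
    'Europa,23433,"Espana,Francia,Italia,Malta"',
    'Asia,12345,Japon"',
    'Australasia,34048,"Nueva Zelanda"',
]

bad_lines = find_unbalanced_quote_lines(sample_lines)
print(bad_lines)
```

With a real file, the same function could be fed the open file object, since iterating over it yields one line at a time.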

2 answers:

Answer 0: (score: 0)

Use pandas.Series.str and apply functions:

import pandas as pd

df = pd.DataFrame({'a':[r'Europa,2018-04-20,26948,15,Destino,-,CRU-159617-JUN-2018,Origen,Productos incluidos,https://s3.amazonaws.com/cruceros-host/home/host-Cruceros.jpg,Crucero,06,https://www.host.com/cruceros/listado?regionId=7&startDate=2018-06-01&endDate=2018-07-01&adults=2&children=0&childrenAges=,23433,"Espana, Francia, Italia, Malta"',
                        r'Australasia,2018-05-01,39155,15,Destino,-,CRU-180907-JAN-2019,Origen,Productos incluidos,https://s3.amazonaws.com/cruceros-host/home/host-Cruceros.jpg,Crucero,01,"https://www.host.com/cruceros/listado?regionId=14&startDate=2019-01-01&endDate=2019-02-01&adults=2&children=0&childrenAges=&startPort=Sydney, Australia",34048,"Nueva Zelanda "']})

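# Split every raw line on commas; each cell of column 'a' becomes a list of strings
df.a = df.a.str.split(',')
# Copy the first nine pieces into their own columns
for i in range(9):
    df['Col {0}'.format(i)]=df.a.apply(lambda x: x[i])

# Join everything from index 9 onward back into a single column
df['Col 10'] = df.a.apply(lambda x: ','.join(x[9:]))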

Output:

[image: Output]
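One caveat with splitting on every comma is that quoted fields containing commas get cut apart as well. As a hedged alternative sketch (not part of the answer above), the standard library's csv module honours the quotes. The shortened sample line here is an assumption made for readability, not a row from the real file.

```python
import csv

# Shortened, hypothetical row with the same shape as the failing ones:
# the last field is quoted and contains commas.
line = 'Europa,2018-04-20,23433,"Espana, Francia, Italia, Malta"'

# Naive split: the quoted field is broken at every inner comma.
naive_fields = line.split(',')

# csv.reader accepts any iterable of lines and respects the quote character.
quoted_fields = next(csv.reader([line]))

print(len(naive_fields))
print(len(quoted_fields))
print(quoted_fields[-1])
```

Here the naive split yields seven pieces, while csv.reader returns four fields with the country list intact.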

Answer 1: (score: 0)

It looks like you are not specifying a separator. Try this:

pd.read_csv(filepath_or_buffer=raw_file, sep=r',')
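To try the call without the original file, here is a small self-contained sketch. The two shortened rows and the use of io.StringIO in place of raw_file are assumptions for illustration. It spells out sep and quotechar explicitly, although ',' and '"' are already the read_csv defaults, and passes header=None because the sample has no header row. It is written for Python 3.

```python
import io

import pandas as pd

# Hypothetical, shortened rows; the real file has many more columns.
raw = (
    'Europa,23433,"Espana,Francia,Italia,Malta"\n'
    'Australasia,34048,"Nueva Zelanda"\n'
)

# sep and quotechar are spelled out for clarity; both match the defaults.
frame = pd.read_csv(io.StringIO(raw), sep=",", quotechar='"', header=None)

print(frame.shape)
print(frame.iloc[0, 2])
```

Each row should come back as three columns, with the quoted country list kept together in the last one.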