I'm having trouble reading a CSV file and parsing it correctly with pandas on Python 2.7.
Some of the rows that fail:
Europa,2018-04-20,26948,15,Destino, - ,CRU-159617-JUN-2018,Origen,Productos incluidos,https://s3.amazonaws.com/cruceros-host/home/host-Cruceros.jpg,Crucero,06,https://www.host.com/cruceros/listado?regionId=7&startDate=2018-06-01&endDate=2018-07-01&adults=2&children=0&childrenAges=,23433,“Espana,Francia,Italia,Malta”
Australasia,2018-05-01,39155,15,Destino, - ,CRU-180907-JAN-2019,Origen,Productos incluidos,https://s3.amazonaws.com/cruceros-host/home/host-Cruceros.jpg,Crucero,01,“https://www.host.com/cruceros/listado?regionId=14&startDate=2019-01-01&endDate=2019-02-01&adults=2&children=0&childrenAges=&startPort=Sydney,Australia”,34048, “Nueva Zelanda”
Code:
frame = pd.read_csv(filepath_or_buffer=raw_file)
- EDIT:
The problem is that I get back a single column that contains the whole row.
- EDIT2:
The problem was that Excel had accidentally edited some rows and added “ characters at the end of some of them. Now it works correctly.
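The cause described in EDIT2 can be worked around in code instead of by hand. A minimal sketch, assuming Python 3 (the question targets 2.7, where string handling differs): the hypothetical helper `normalize_quotes` turns the typographic quotes “ ” back into the ASCII double quote that `pd.read_csv` treats as its default `quotechar`, and `skipinitialspace=True` lets the field that begins with a space (`, “Nueva Zelanda”`) still be recognised as quoted. The helper name and the in-memory sample are illustrative only.

```python
import io

import pandas as pd

# Two of the failing rows from the question, as they looked after Excel
# touched them: some fields are wrapped in typographic quotes (U+201C/U+201D)
# instead of the ASCII double quote the CSV parser understands.
RAW = (
    "Europa,2018-04-20,26948,15,Destino, - ,CRU-159617-JUN-2018,Origen,"
    "Productos incluidos,"
    "https://s3.amazonaws.com/cruceros-host/home/host-Cruceros.jpg,Crucero,06,"
    "https://www.host.com/cruceros/listado?regionId=7&startDate=2018-06-01"
    "&endDate=2018-07-01&adults=2&children=0&childrenAges=,23433,"
    "\u201cEspana,Francia,Italia,Malta\u201d\n"
    "Australasia,2018-05-01,39155,15,Destino, - ,CRU-180907-JAN-2019,Origen,"
    "Productos incluidos,"
    "https://s3.amazonaws.com/cruceros-host/home/host-Cruceros.jpg,Crucero,01,"
    "\u201chttps://www.host.com/cruceros/listado?regionId=14&startDate=2019-01-01"
    "&endDate=2019-02-01&adults=2&children=0&childrenAges=&startPort=Sydney,Australia\u201d,"
    "34048, \u201cNueva Zelanda\u201d\n"
)


def normalize_quotes(text):
    """Hypothetical helper: replace typographic double quotes with ASCII ones."""
    return text.replace("\u201c", '"').replace("\u201d", '"')


df = pd.read_csv(
    io.StringIO(normalize_quotes(RAW)),
    header=None,            # the sample rows have no header line
    dtype=str,              # keep codes such as "06" as text
    skipinitialspace=True,  # lets ', "Nueva Zelanda"' start a quoted field
)
```

With the quotes normalised, each row parses into 15 fields, and the commas inside the quoted country list and URL stay inside their fields. For a real file, the same helper could be applied to the file's text before wrapping it in `io.StringIO`, assuming the file fits in memory.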
Answer 0 (score: 0)
import pandas as pd

# Each row is held as one string; the trailing fields contain commas inside quotes.
df = pd.DataFrame({'a':[r'Europa,2018-04-20,26948,15,Destino,-,CRU-159617-JUN-2018,Origen,Productos incluidos,https://s3.amazonaws.com/cruceros-host/home/host-Cruceros.jpg,Crucero,06,https://www.host.com/cruceros/listado?regionId=7&startDate=2018-06-01&endDate=2018-07-01&adults=2&children=0&childrenAges=,23433,"Espana, Francia, Italia, Malta"',
r'Australasia,2018-05-01,39155,15,Destino,-,CRU-180907-JAN-2019,Origen,Productos incluidos,https://s3.amazonaws.com/cruceros-host/home/host-Cruceros.jpg,Crucero,01,"https://www.host.com/cruceros/listado?regionId=14&startDate=2019-01-01&endDate=2019-02-01&adults=2&children=0&childrenAges=&startPort=Sydney, Australia",34048,"Nueva Zelanda "']})
# Split every row on commas (this also splits inside the quoted fields).
df.a = df.a.str.split(',')
# The first nine fields contain no embedded commas, so copy them one by one.
for i in range(9):
    df['Col {0}'.format(i)] = df.a.apply(lambda x: x[i])
# Re-join everything from the tenth field onward into a single column.
df['Col 9'] = df.a.apply(lambda x: ','.join(x[9:]))
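The same split-then-rejoin idea can be written without the loop. A sketch, assuming a pandas version whose `Series.str.split` accepts the `n` and `expand` keyword arguments: splitting at most nine times leaves everything after the ninth comma in the last column, which is exactly what `','.join(x[9:])` reconstructs. Like the loop, this is a workaround that ignores quoting; the quoted fields simply end up inside the last column.

```python
import pandas as pd

# The two sample rows from the answer, one string per row.
rows = [
    "Europa,2018-04-20,26948,15,Destino,-,CRU-159617-JUN-2018,Origen,"
    "Productos incluidos,"
    "https://s3.amazonaws.com/cruceros-host/home/host-Cruceros.jpg,Crucero,06,"
    "https://www.host.com/cruceros/listado?regionId=7&startDate=2018-06-01"
    "&endDate=2018-07-01&adults=2&children=0&childrenAges=,23433,"
    '"Espana, Francia, Italia, Malta"',
    "Australasia,2018-05-01,39155,15,Destino,-,CRU-180907-JAN-2019,Origen,"
    "Productos incluidos,"
    "https://s3.amazonaws.com/cruceros-host/home/host-Cruceros.jpg,Crucero,01,"
    '"https://www.host.com/cruceros/listado?regionId=14&startDate=2019-01-01'
    "&endDate=2019-02-01&adults=2&children=0&childrenAges=&startPort=Sydney, Australia"
    '",34048,"Nueva Zelanda "',
]

df = pd.DataFrame({"a": rows})

# Split on at most nine commas: columns 0-8 hold the first nine fields and
# column 9 keeps the rest of the row, commas included.
parts = df["a"].str.split(",", n=9, expand=True)
parts.columns = ["Col {0}".format(i) for i in range(parts.shape[1])]
```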
Answer 1 (score: 0)
It looks like you are not specifying a separator. Try this:
pd.read_csv(filepath_or_buffer=raw_file, sep=r',')
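One way to check this suggestion: `sep=','` is already the default for `pd.read_csv`, so passing it explicitly should yield the same frame. The small in-memory sample below is hypothetical and only serves the comparison.

```python
import io

import pandas as pd

# Hypothetical three-column sample, comma-separated.
SAMPLE = "region,date,price\nEuropa,2018-04-20,26948\nAustralasia,2018-05-01,39155\n"

default_sep = pd.read_csv(io.StringIO(SAMPLE))
explicit_sep = pd.read_csv(io.StringIO(SAMPLE), sep=",")

# Both calls use a comma as the delimiter, so the parsed frames match.
print(default_sep.equals(explicit_sep))
```

If the whole row still lands in a single column with either call, the quoting (as in EDIT2) is a more likely culprit than the separator.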