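I need to create a pandas.DataFrame from a csv file. For that I am using the method pandas.read_csv(...). The problem with this file is that one or more columns contain commas in their values (I don't control the file format).
I have been trying to implement the solution from this question, but I get the following error:
pandas.errors.EmptyDataError: No columns to parse from file
For some reason, after implementing this solution, the csv file I tried to fix is blank.
Here is the code I am using: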
# fix csv file
with open("/Users/username/works/test.csv", 'rb') as f, \
        open("/Users/username/works/test.csv", 'wb') as g:
    writer = csv.writer(g, delimiter=',')
    for line in f:
        row = line.split(',', 4)
        writer.writerow(row)
# Manipulate csv file
data = pd.read_csv(os.path.expanduser("/Users/username/works/test.csv"),
                   error_bad_lines=False)
Any ideas?
Data overview:
Id0 Id 1 Id 2 Country Company Title Email
23 123 456 AR name cargador email@email.com
24 123 456 AR name Executive assistant email@email.com
25 123 456 AR name Asistente Administrativo email@email.com
26 123 456 AR name Atención al cliente vía telefónica vía online email@email.com
39 123 456 AR name Asesor de ventas email@email.com
40 123 456 AR name inc. International company representative email@email.com
41 123 456 AR name Vendedor de campo email@email.com
42 123 456 AR name PUBLICIDAD ATENCIÓN AL CLIENTE email@email.com
43 123 456 AR name Asistente de Marketing email@email.com
44 123 456 AR name SOLDADOR email@email.com
217 123 456 AR name Se requiere vendedores Loja Quevedo Guayas) email@email.com
218 123 456 AR name Ing. Civil recién graduado Yaruquí email@email.com
219 123 456 AR name ayudantes enfermeria email@email.com
220 123 456 AR name Trip Leader for International Youth Exchange email@email.com
221 123 456 AR name COUNTRY MANAGER / DIRECTOR COMERCIAL email@email.com
250 123 456 AR name Ayudante de Pasteleria email@email.com Asesor email@email.com email@email.com
Pre-parsed CSV:
#,Id 1,Id 2,Country,Company,Title,Email,,,,
23,123,456,AR,name,cargador,email@email.com,,,,
24,123,456,AR,name,Executive assistant,email@email.com,,,,
25,123,456,AR,name,Asistente Administrativo,email@email.com,,,,
26,123,456,AR,name,Atención al cliente vía telefónica , vía online,email@email.com,,,
39,123,456,AR,name,Asesor de ventas,email@email.com,,,,
40,123,456,AR,name, inc.,International company representative,email@email.com,,,
41,123,456,AR,name,Vendedor de campo,email@email.com,,,,
42,123,456,AR,name,PUBLICIDAD, ATENCIÓN AL CLIENTE,email@email.com,,,
43,123,456,AR,name,Asistente de Marketing,email@email.com,,,,
44,123,456,AR,name,SOLDADOR,email@email.com,,,,
217,123,456,AR,name,Se requiere vendedores,, Loja , Quevedo, Guayas),email@email.com
218,123,456,AR,name,Ing. Civil recién graduado, Yaruquí,email@email.com,,,
219,123,456,AR,name,ayudantes enfermeria,email@email.com,,,,
220,123,456,AR,name,Trip Leader for International Youth Exchange,email@email.com,,,,
221,123,456,AR,name,COUNTRY MANAGER / DIRECTOR COMERCIAL,email@email.com,,,,
250,123,456,AR,name,Ayudante de Pasteleria,email@email.com, Asesor,email@email.com,email@email.com,
251,123,456,AR,name,Ejecutiva de Ventas,email@email.com,,,,
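Answer 0 (score: 2)
If you can assume that, in the Company column, any comma is followed by a space, and that all remaining erroneous commas are in the column just before the e-mail address, then a small parser can be written to handle it.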
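To make the problem concrete, here is a minimal sketch of what the standard csv.reader produces for two of the affected rows. The sample is trimmed from the data in the question (the trailing empty fields are left out for brevity), and the variable names are only illustrative.

```python
import csv
import io

# Two affected rows from the question, without the trailing empty fields
sample = (
    "#,Id 1,Id 2,Country,Company,Title,Email\n"
    "40,123,456,AR,name, inc.,International company representative,email@email.com\n"
    "42,123,456,AR,name,PUBLICIDAD, ATENCIÓN AL CLIENTE,email@email.com\n"
)

rows = list(csv.reader(io.StringIO(sample)))
header, data = rows[0], rows[1:]

# Each unquoted comma inside a value yields one extra field
for row in data:
    print(len(header), len(row), row)
```

In the first row the stray fragment (' inc.') belongs to Company and starts with a space; in the second it belongs to Title and sits just before the e-mail address. Those are the two cases the assumptions above are meant to cover.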
Code:
import csv
import re

VALID_EMAIL = re.compile(r'[^@]+@[^@]+\.[^@]+')


def read_my_csv(file_handle):
    # build csv reader
    reader = csv.reader(file_handle)

    # get the header, and find the e-mail and title columns
    header = next(reader)
    email_column = header.index('Email')
    title_column = header.index('Title')

    # yield the header up to the e-mail column
    yield header[:email_column+1]

    # for each row, go through rebuild columns
    for row in reader:

        # for each row, put the Company column back together
        while row[title_column].startswith(' '):
            row[title_column-1] += ',' + row[title_column]
            del row[title_column]

        # for each row, put the Title column back together
        while not VALID_EMAIL.match(row[email_column]):
            row[email_column-1] += ',' + row[email_column]
            del row[email_column]

        yield row[:email_column+1]
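Test code:
import pandas as pd

# newline='' is what the csv module expects; the old 'rU' mode is rejected by Python 3.11+
with open("test.csv", newline='') as f:
    generator = read_my_csv(f)
    columns = next(generator)
    df = pd.DataFrame(generator, columns=columns)
print(df)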
Results:
# Id 1 Id 2 Country Company \
0 23 123 456 AR name
1 24 123 456 AR name
2 25 123 456 AR name
3 26 123 456 AR name
4 39 123 456 AR name
5 40 123 456 AR name, inc.
6 41 123 456 AR name
7 42 123 456 AR name
8 43 123 456 AR name
9 44 123 456 AR name
10 217 123 456 AR name
11 218 123 456 AR name
12 219 123 456 AR name
13 220 123 456 AR name
14 221 123 456 AR name
15 250 123 456 AR name
16 251 123 456 AR name
Title Email
0 cargador email@email.com
1 Executive assistant email@email.com
2 Asistente Administrativo email@email.com
3 Atención al cliente vía telefónica , vía online email@email.com
4 Asesor de ventas email@email.com
5 International company representative email@email.com
6 Vendedor de campo email@email.com
7 PUBLICIDAD, ATENCIÓN AL CLIENTE email@email.com
8 Asistente de Marketing email@email.com
9 SOLDADOR email@email.com
10 Se requiere vendedores,, Loja , Quevedo, Guayas) email@email.com
11 Ing. Civil recién graduado, Yaruquí email@email.com
12 ayudantes enfermeria email@email.com
13 Trip Leader for International Youth Exchange email@email.com
14 COUNTRY MANAGER / DIRECTOR COMERCIAL email@email.com
15 Ayudante de Pasteleria email@email.com
16 Ejecutiva de Ventas email@email.com
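As for the blank file: the snippet in the question opens the same path for reading and for writing at once, and opening a file in write mode truncates it, so by the time the loop starts there is nothing left to read. That would explain the EmptyDataError. If the repaired rows need to be saved back to disk, one option is to write them to a temporary file and swap it in afterwards. The sketch below assumes UTF-8 input; rewrite_csv and merge_title are hypothetical helpers invented for this illustration, and merge_title is a simplified fixer that assumes the last field is the e-mail address.

```python
import csv
import os
import tempfile


def rewrite_csv(path, fix_row, encoding="utf-8"):
    """Apply fix_row to every parsed row of the CSV at `path`.

    Output goes to a temporary file in the same directory, which replaces
    the original only after every row has been written.
    """
    directory = os.path.dirname(os.path.abspath(path))
    with open(path, newline="", encoding=encoding) as src, \
            tempfile.NamedTemporaryFile("w", newline="", encoding=encoding,
                                        dir=directory, delete=False) as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            writer.writerow(fix_row(row))
    # Both files are closed at this point, so the swap is safe
    os.replace(dst.name, path)


def merge_title(row):
    """Hypothetical fixer: keep the 5 leading fields and the last field,
    and join everything in between into a single Title value."""
    return row[:5] + [",".join(row[5:-1]), row[-1]]


# Small demonstration on a throwaway file
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "demo.csv")
with open(demo_path, "w", newline="", encoding="utf-8") as f:
    f.write("Id0,Id 1,Id 2,Country,Company,Title,Email\n")
    f.write("42,123,456,AR,name,PUBLICIDAD, ATENCIÓN AL CLIENTE,email@email.com\n")

rewrite_csv(demo_path, merge_title)

with open(demo_path, newline="", encoding="utf-8") as f:
    rows = list(csv.reader(f))
print(rows[1])
```

Because csv.writer quotes any field that contains a comma, the rewritten file can then be loaded with a plain pandas.read_csv call.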