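I am trying to parse my CSV file with Python. Each line has four elements separated by commas. Each element is a string, but it may itself contain commas. If an element contains a comma, that element is enclosed in double quotes. The following example shows both cases, with and without quotes: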
http://data.europa.eu/esco/skill/CTC_43028,"use data extraction, transformation and loading tools","ETL|extract, transform, load","<div>Integrate information from multiple applications, created and maintained by various organisations, into one consistent and transparent data structure.</div>"
http://data.europa.eu/esco/skill/SCG.TS.1.4.m.2,support company plan,follow industry guidelines|follow organisation's vision|monitor policy implementation|support company mission,<div>Act within one's work role to advance the goals and vision of the organisation.</div>
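What I want is to split each line into its four elements. I tried Python's split function, but it did not work. I think I have to use a regular expression, but I am not familiar with them. Can you help? Thanks a lot.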
Answer 0 (score: 2)
The csv module is what you want:
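import csv

# newline='' lets the csv module handle line endings itself (Python 3)
with open('file.csv', newline='') as f:
    r = csv.reader(f)
    for row in r:
        print(row)  # each row is a list of four strings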
['http...', 'transformation ...', 'ETL|ext ...', '<div>Integrate ...']
['http:...', 'support ...', 'follow ...', '<div>Act ...']
',' is the default delimiter and '"' is the default quotechar.
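To see why a plain split falls short here, the sketch below runs both approaches on the two sample lines from the question. It assumes Python 3 and reads the lines from an in-memory `io.StringIO` instead of a file; the names `sample`, `naive` and `rows` are only for illustration.

```python
import csv
import io

# The two sample lines from the question, held in memory instead of file.csv.
sample = (
    'http://data.europa.eu/esco/skill/CTC_43028,"use data extraction, transformation and loading tools","ETL|extract, transform, load","<div>Integrate information from multiple applications, created and maintained by various organisations, into one consistent and transparent data structure.</div>"\n'
    "http://data.europa.eu/esco/skill/SCG.TS.1.4.m.2,support company plan,follow industry guidelines|follow organisation's vision|monitor policy implementation|support company mission,<div>Act within one's work role to advance the goals and vision of the organisation.</div>\n"
)

# Naive approach: str.split cuts at every comma, including those inside quotes.
naive = sample.splitlines()[0].split(',')
print(len(naive))  # more than four pieces for the quoted line

# csv.reader treats commas inside double quotes as part of the field.
# delimiter and quotechar are spelled out here, but these are the defaults.
rows = list(csv.reader(io.StringIO(sample), delimiter=',', quotechar='"'))
for row in rows:
    print(len(row), row[1])
```

Both lines come back as four fields, and the surrounding double quotes are removed from the quoted ones, so no regular expression should be needed for input shaped like the sample.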