Newlines inside double quotes when reading a CSV in Python

Date: 2016-06-13 15:58:04

Tags: python csv pandas

I have a CSV file in the following format:

"4931286","Lotion","New York","Bright color, yellow with 5" long
20% nylon"
"931286","Shampoo","New York","Dark, yellow with 10" long
20% nylon"
"3931286","Conditioner","LA","Bright color, yellow with 5" long
50% nylon"

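The data above should be read as 3 rows of 4 columns: ID, product name, location, and description. As you can see, the description in every row contains a newline.

I have looked through other related Stack Overflow questions, but none of the solutions seems to handle this case.

Here is my attempt: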

import csv
from io import StringIO

file = StringIO("""4931286","Lotion","New York","Bright color, yellow\n   with 5" long 20% nylon""")

for row in csv.reader(file, quotechar='"', delimiter=',', quoting=csv.QUOTE_ALL, skipinitialspace=True):
    print(row)

The result is:

['4931286"', 'Lotion', 'New York', 'Bright color, yellow with 5 long']
['20% nylon']

However, what I want is:

['4931286"', 'Lotion', 'New York', 'Bright color, yellow with 5 long 20% nylon']

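How can I achieve this? Surely there is a way to do it in Python?

2 answers:

Answer 0 (score: 4)

The data is not valid CSV.

A " inside a quoted CSV field has to be escaped, for example with a backslash: "Bright color, yellow\n with 5\" long 20% nylon".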
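For comparison, the escaping that Python's own csv.writer applies by default is to double the quote character rather than prefix it with a backslash. The following is a minimal sketch of what a well-formed version of the first record could look like. It assumes the inch mark and the line break are both meant to be part of the description, and the names row, encoded and decoded are only illustrative.

```python
import csv
from io import StringIO

# First record from the question, with the inch mark and line break kept in the text.
row = ["4931286", "Lotion", "New York", 'Bright color, yellow with 5" long\n20% nylon']

# Write the row with every field quoted; the embedded " is written as "".
buf = StringIO()
csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n").writerow(row)
encoded = buf.getvalue()
print(encoded)

# Reading it back gives the original four fields, embedded newline included.
decoded = next(csv.reader(StringIO(encoded)))
print(decoded == row)
```

A file written this way can be read back by csv.reader with its default settings, because the quote inside the description no longer looks like the end of the field.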

If " is only ever used for inches (that is, it is preceded by a digit), try:

import re
data = re.sub(r'([0-9])"(?![,\n])', r'\1\\"', data)

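This regex replaces " with \" wherever it is preceded by a digit and not followed by a comma or a newline.

Then parse the data with csv.reader, passing escapechar='\\' so that the reader treats \" as a literal quote.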
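Put together, the approach might look like the sketch below. It rests on the same assumption as the regex (a " that follows a digit is an inch mark unless a comma or newline comes next). The escapechar argument and the final step that collapses the line break into a space are additions made here so that the output is one row per record; they are not part of the original answer.

```python
import csv
import re
from io import StringIO

data = '''"4931286","Lotion","New York","Bright color, yellow with 5" long
20% nylon"
"931286","Shampoo","New York","Dark, yellow with 10" long
20% nylon"
"3931286","Conditioner","LA","Bright color, yellow with 5" long
50% nylon"
'''

# Escape inch marks: a " right after a digit, unless a comma or newline follows
# (in that case it is a real closing quote, e.g. after the numeric ID).
escaped = re.sub(r'([0-9])"(?![,\n])', r'\1\\"', data)

# escapechar makes the reader treat \" as a literal quote inside the field.
reader = csv.reader(StringIO(escaped), quotechar='"', escapechar='\\')

rows = []
for row in reader:
    # Assumption: the line break inside the description should become one space.
    row[-1] = ' '.join(row[-1].split())
    rows.append(row)
    print(row)
```

Unlike joining rows after the fact, this keeps the inch mark in the description, but it will mis-handle any description where a digit is directly followed by a genuine closing quote and then something other than a comma or newline.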

Edit: changed the regex following MaxU's suggestion.

Answer 1 (score: 1)

How about iterating over the parsed rows two at a time?

import csv
from io import StringIO

def pairwise(iterable):
    "s -> (s0, s1), (s2, s3), (s4, s5), ..."
    # zip pulls from the same iterator twice, so items come out in pairs
    a = iter(iterable)
    return zip(a, a)


file = StringIO(""""4931286","Lotion","New York","Bright color, yellow with 5" long
20% nylon"
"931286","Shampoo","New York","Dark, yellow with 10" long
20% nylon"
"3931286","Conditioner","LA","Bright color, yellow with 5" long
50% nylon"
""")

reader = csv.reader(file, quotechar='"', delimiter=',', quoting=csv.QUOTE_ALL, skipinitialspace=True)
for row, row2 in pairwise(reader):
    # row2 holds the second physical line of the record; append it to the description
    row[-1] = ' '.join([row[-1], row2[0]])
    print(row)

# Output
['4931286', 'Lotion', 'New York', 'Bright color, yellow with 5 long 20% nylon"']
['931286', 'Shampoo', 'New York', 'Dark, yellow with 10 long 20% nylon"']
['3931286', 'Conditioner', 'LA', 'Bright color, yellow with 5 long 50% nylon"']
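The pairing above relies on every record spanning exactly two physical lines. A possible variation, sketched below, folds any short row into the preceding full row instead, so a description could span more than two lines. This is a different technique from the answer's pairwise approach, and merge_continuations is a hypothetical helper. It assumes a full record always parses to exactly 4 fields and that a continuation line never does. It also strips the stray trailing quote, and like the answer's output it loses the inch mark.

```python
import csv
from io import StringIO

def merge_continuations(rows, n_columns):
    """Fold rows with fewer than n_columns fields into the previous full row."""
    current = None
    for row in rows:
        if len(row) == n_columns:
            # A full-width row starts a new record; emit the finished one first.
            if current is not None:
                yield current
            current = list(row)
        elif current is not None and row:
            # Short, non-empty row: treat it as more description text.
            current[-1] += ' ' + ','.join(row)
    if current is not None:
        yield current

data = StringIO('''"4931286","Lotion","New York","Bright color, yellow with 5" long
20% nylon"
"931286","Shampoo","New York","Dark, yellow with 10" long
20% nylon"
"3931286","Conditioner","LA","Bright color, yellow with 5" long
50% nylon"
''')

records = list(merge_continuations(csv.reader(data), n_columns=4))
for record in records:
    # The closing quote of the broken field ends up in the text; drop it.
    record[-1] = record[-1].rstrip('"')
    print(record)
```

If a continuation line happens to contain exactly three commas it would be mistaken for a new record, so this is only a fallback for when the file cannot be fixed at the source.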