I have a CSV file in the following format:
"4931286","Lotion","New York","Bright color, yellow with 5" long
20% nylon"
"931286","Shampoo","New York","Dark, yellow with 10" long
20% nylon"
"3931286","Conditioner","LA","Bright color, yellow with 5" long
50% nylon"
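The data above should be read as 3 rows with 4 columns: ID, product name, location, and description. As you can see, the description in each row contains a line break.
I have looked through other related Stack Overflow questions, but none of the solutions seems to handle this case.
Here is my attempt: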
import csv
from io import StringIO  # Python 3; on Python 2 use: from StringIO import StringIO

file = StringIO("""4931286","Lotion","New York","Bright color, yellow\n with 5" long 20% nylon""")
for row in csv.reader(file, quotechar='"', delimiter=',', quoting=csv.QUOTE_ALL, skipinitialspace=True):
    print(row)
The result is:
['4931286"', 'Lotion', 'New York', 'Bright color, yellow with 5 long']
['20% nylon']
However, what I want is:
['4931286"', 'Lotion', 'New York', 'Bright color, yellow with 5 long 20% nylon']
How can I achieve this? Is there a way to do it in Python?
Answer 0 (score: 4)
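The data is not valid CSV.
A " inside a quoted CSV field must be escaped, for example with \:
"Bright color, yellow\n with 5\" long 20% nylon"
(Python's csv.reader only treats \ as an escape when escapechar='\\' is passed; the RFC 4180 convention is to double the quote as "" instead.)
If " is only used for inches (that is, it always follows a digit), try: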
import re
data = re.sub(r'([0-9])"(?![,\n])', r'\1\\"', data)
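This regular expression replaces every " that directly follows a digit, and is not itself followed by a comma or a newline, with \".
Then parse the result with csv.reader, passing escapechar='\\' so that the backslash is honored.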
Edit: changed the regular expression following MaxU's suggestion.
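To make the approach above concrete, here is a minimal end-to-end sketch in Python 3. It assumes the sample data from the question, that every inch mark follows a digit, and that line breaks inside the description should become spaces. The names raw, escaped, and rows are illustrative and not part of the original answer.

```python
import csv
import io
import re

# Sample data from the question (assumed to be representative of the real file).
raw = (
    '"4931286","Lotion","New York","Bright color, yellow with 5" long\n'
    '20% nylon"\n'
    '"931286","Shampoo","New York","Dark, yellow with 10" long\n'
    '20% nylon"\n'
    '"3931286","Conditioner","LA","Bright color, yellow with 5" long\n'
    '50% nylon"\n'
)

# Escape a double quote that follows a digit and is not followed by a comma
# or a newline, i.e. one that looks like an inch mark rather than a field end.
escaped = re.sub(r'([0-9])"(?![,\n])', r'\1\\"', raw)

# escapechar tells csv.reader to treat the backslash as an escape character.
reader = csv.reader(io.StringIO(escaped), quotechar='"', escapechar='\\')

# The embedded line break is now part of the quoted field; turn it into a space.
rows = [[field.replace('\n', ' ') for field in row] for row in reader]

for row in rows:
    print(row)
```

With this preprocessing the inch mark is preserved in the description and each record comes back as a single four-field row. The regex is a heuristic: a field that ends in a digit immediately before a closing quote at the very end of the file, or an inch mark directly followed by a comma, would not be handled correctly.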
Answer 1 (score: 1)
How about iterating over the rows two at a time?
import csv
from io import StringIO  # Python 3; on Python 2 use: from StringIO import StringIO


def pairwise(iterable):
    "s -> (s0, s1), (s2, s3), (s4, s5), ..."
    a = iter(iterable)
    # zip pulls from the same iterator twice per step, yielding non-overlapping pairs
    return zip(a, a)  # itertools.izip on Python 2


file = StringIO(""""4931286","Lotion","New York","Bright color, yellow with 5" long
20% nylon"
"931286","Shampoo","New York","Dark, yellow with 10" long
20% nylon"
"3931286","Conditioner","LA","Bright color, yellow with 5" long
50% nylon"
""")
reader = csv.reader(file, quotechar='"', delimiter=',', quoting=csv.QUOTE_ALL, skipinitialspace=True)
for row, row2 in pairwise(reader):
    # Append the continuation line to the last field of the preceding row
    row[-1] = ' '.join([row[-1], row2[0]])
    print(row)
# Output
['4931286', 'Lotion', 'New York', 'Bright color, yellow with 5 long 20% nylon"']
['931286', 'Shampoo', 'New York', 'Dark, yellow with 10 long 20% nylon"']
['3931286', 'Conditioner', 'LA', 'Bright color, yellow with 5 long 50% nylon"']
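The pairwise approach above relies on every record spanning exactly two lines. As a hedged variation, the sketch below merges rows by field count instead, so a record may have zero, one, or several continuation lines. It assumes four columns per record and that a continuation line never contains enough commas to look like a full record. The helper merge_continuations and the constant EXPECTED_COLUMNS are hypothetical names introduced for illustration, and unlike the original answer this version also strips the stray trailing quote.

```python
import csv
import io

EXPECTED_COLUMNS = 4  # assumption: ID, product name, location, description


def merge_continuations(rows, width=EXPECTED_COLUMNS):
    """Yield full rows, folding shorter rows into the previous row's last field."""
    current = None
    for row in rows:
        if not row:
            continue  # skip blank lines
        if current is not None and len(row) < width:
            # Continuation line: rebuild its text and drop the stray closing quote.
            tail = ','.join(row).rstrip('"')
            current[-1] = current[-1] + ' ' + tail
        else:
            if current is not None:
                yield current
            current = list(row)
    if current is not None:
        yield current


raw = (
    '"4931286","Lotion","New York","Bright color, yellow with 5" long\n'
    '20% nylon"\n'
    '"931286","Shampoo","New York","Dark, yellow with 10" long\n'
    '20% nylon"\n'
    '"3931286","Conditioner","LA","Bright color, yellow with 5" long\n'
    '50% nylon"\n'
)

reader = csv.reader(io.StringIO(raw), skipinitialspace=True)
merged = list(merge_continuations(reader))

for row in merged:
    print(row)
```

As with the pairwise version, the inch mark itself is lost, because csv.reader consumes the unescaped quote as the end of the quoted field.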