Regex to convert CSV to pipe-delimited

Time: 2018-01-03 03:35:13

Tags: python regex csv

I have the following string:

s = '"XIDJIJFHD8","Gothika","a0KU000000JMYCrMAP","USA","English","Sub & Audio","VOD","SD","01/01/2011 00:00:00.000000","12/31/2049 00:00:00.000000",,"Confirmed",,,,"Feature",,"2003-11-21","2004-03-23",,"R","for violence, brief language and nudity.","2024863","6000008953",,,"10.5240/A6FC-02AE-8093-3B05-6240-T","10.5240/D052-B470-0D01-25DF-DA91-4","2024863_6000008953","idwb:2024863_6000008953","CA-0000950613"'

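I need to convert it to 'pipe-delimited'. Fields are enclosed in double quotes ("), but an empty field contains nothing at all, not even the quotes. The final output should have 31 fields (that is, 30 | separators). Here is what I have so far: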

re.sub(r'(\,|\")(,)(,|\")', '|', s)

However, the result of the above contains only 23 | characters. What is the correct regex?

Or, even better, maybe I can do this directly with the csv module. Something like:

string_with_pipes = csv.write(s, delimiter="|")

Note that I only want to get a modified string, not actually save a file.

2 answers:

Answer 0: (score: 2)

There is no need for regular expressions. You can combine csv.reader() and csv.writer(), using a StringIO object as a temporary buffer:

import csv
from io import StringIO  # Python 3; on Python 2 use: from StringIO import StringIO


s = '"XIDJIJFHD8","Gothika","a0KU000000JMYCrMAP","USA","English","Sub & Audio","VOD","SD","01/01/2011 00:00:00.000000","12/31/2049 00:00:00.000000",,"Confirmed",,,,"Feature",,"2003-11-21","2004-03-23",,"R","for violence, brief language and nudity.","2024863","6000008953",,,"10.5240/A6FC-02AE-8093-3B05-6240-T","10.5240/D052-B470-0D01-25DF-DA91-4","2024863_6000008953","idwb:2024863_6000008953","CA-0000950613"'

reader = csv.reader([s])

buffer = StringIO()
writer = csv.writer(buffer, delimiter="|")
writer.writerows(reader)

buffer.seek(0)
print(buffer.getvalue())

Prints:

XIDJIJFHD8|Gothika|a0KU000000JMYCrMAP|USA|English|Sub & Audio|VOD|SD|01/01/2011 00:00:00.000000|12/31/2049 00:00:00.000000||Confirmed||||Feature||2003-11-21|2004-03-23||R|for violence, brief language and nudity.|2024863|6000008953|||10.5240/A6FC-02AE-8093-3B05-6240-T|10.5240/D052-B470-0D01-25DF-DA91-4|2024863_6000008953|idwb:2024863_6000008953|CA-0000950613
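Since the question asks only for a modified string, the same reader/writer approach can be wrapped in a small helper. This is a minimal sketch for Python 3; the function name to_pipe_delimited and the short sample line are illustrative assumptions, not part of the original answer.

```python
import csv
from io import StringIO


def to_pipe_delimited(line, delimiter="|"):
    """Re-encode one CSV line with a different delimiter (hypothetical helper)."""
    # csv.reader accepts any iterable of lines, so wrap the string in a list.
    reader = csv.reader([line])
    buffer = StringIO()
    writer = csv.writer(buffer, delimiter=delimiter)
    writer.writerows(reader)
    # csv.writer ends each row with "\r\n" by default; drop it for a bare string.
    return buffer.getvalue().rstrip("\r\n")


sample = '"a","b, c",,"d"'
print(to_pipe_delimited(sample))  # a|b, c||d
```

The comma inside "b, c" survives because the csv module parses the quoting instead of pattern-matching on the separators.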

Answer 1: (score: 1)

Consecutive commas are being swallowed by a single match.

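You want a regex that does not include the neighboring characters in the replacement itself, but still makes sure they are there: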

re.sub(r'(?<=[,"])(,)(?=[,"])', '|', s)

This uses lookahead and lookbehind to check that a , or " is present without replacing it.

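  1. (,) matches the comma
  2. (?<=[,"]) requires it to be immediately preceded by a comma or double quote
  3. (?=[,"]) requires it to be immediately followed by a comma or double quote
  4. The (?<= and (?= in the first and third groups make them zero-width assertions, which ensures these groups are not included in the replacement.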
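The difference between the two patterns is easiest to see on a short input with an empty field. The sketch below uses a made-up three-field sample (an assumption for illustration) and compares the pattern from the question with the lookaround version.

```python
import re

# Hypothetical sample: three fields, the middle one empty.
sample = '"a",,"b"'

# Pattern from the question: each match consumes three characters, so the
# first match takes '",,' and the second comma can never match on its own.
naive = re.sub(r'(\,|\")(,)(,|\")', '|', sample)
print(naive)  # "a|"b"

# Lookaround version: only the comma itself is consumed, so both separators
# are replaced and the surrounding quotes are left untouched.
fixed = re.sub(r'(?<=[,"]),(?=[,"])', '|', sample)
print(fixed)  # "a"||"b"

# The quotes are still present; removing them is a separate step, and this
# simple replace assumes no field contains a literal double quote.
unquoted = fixed.replace('"', '')
print(unquoted)  # a||b
```

One caveat: the lookaround pattern only inspects the characters next to each comma, so a comma inside a quoted field that happens to sit between commas or quotes would also be rewritten. For arbitrary data the csv module is the safer route.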