I have the following string:
s = "XIDJIJFHD8","Gothika","a0KU000000JMYCrMAP","USA","English","Sub & Audio","VOD","SD","01/01/2011 00:00:00.000000","12/31/2049 00:00:00.000000",,"Confirmed",,,,"Feature",,"2003-11-21","2004-03-23",,"R","for violence, brief language and nudity.","2024863","6000008953",,,"10.5240/A6FC-02AE-8093-3B05-6240-T","10.5240/D052-B470-0D01-25DF-DA91-4","2024863_6000008953","idwb:2024863_6000008953","CA-0000950613"
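I need to convert this to pipe-separated form. Fields are enclosed in double quotes ("), but an empty field has nothing at all between its commas. Splitting the final output on | should yield 31 fields. This is what I have so far: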
re.sub(r'(\,|\")(,)(,|\")', '|', s)
However, the above gives a length of only 23. What is the correct regex?
Or, even better, perhaps I can do this directly with the csv module. Something like:
string_with_pipes = csv.write(s, delimiter="|")
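Note that I only want the modified string, not to actually save a file.
Answer 0 (score: 2)
There is no need for regular expressions. You can use a combination of csv.reader() and csv.writer(), with a StringIO object as a temporary buffer, to do this: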
import csv
from io import StringIO  # Python 3; on Python 2 use "from StringIO import StringIO"
s = '"XIDJIJFHD8","Gothika","a0KU000000JMYCrMAP","USA","English","Sub & Audio","VOD","SD","01/01/2011 00:00:00.000000","12/31/2049 00:00:00.000000",,"Confirmed",,,,"Feature",,"2003-11-21","2004-03-23",,"R","for violence, brief language and nudity.","2024863","6000008953",,,"10.5240/A6FC-02AE-8093-3B05-6240-T","10.5240/D052-B470-0D01-25DF-DA91-4","2024863_6000008953","idwb:2024863_6000008953","CA-0000950613"'
reader = csv.reader([s])
buffer = StringIO()
writer = csv.writer(buffer, delimiter="|")
writer.writerows(reader)
buffer.seek(0)
print(buffer.getvalue())
Prints:
XIDJIJFHD8|Gothika|a0KU000000JMYCrMAP|USA|English|Sub & Audio|VOD|SD|01/01/2011 00:00:00.000000|12/31/2049 00:00:00.000000||Confirmed||||Feature||2003-11-21|2004-03-23||R|for violence, brief language and nudity.|2024863|6000008953|||10.5240/A6FC-02AE-8093-3B05-6240-T|10.5240/D052-B470-0D01-25DF-DA91-4|2024863_6000008953|idwb:2024863_6000008953|CA-0000950613
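If only the converted string is needed, the reader/writer round trip can be wrapped in a small helper. This is a minimal sketch, assuming Python 3 (io.StringIO); the function name to_pipe_delimited and the shortened sample row are illustrative and not part of the original post. Passing lineterminator="" is an optional choice that drops the "\r\n" csv.writer would otherwise append to the row.

```python
import csv
from io import StringIO


def to_pipe_delimited(line):
    """Re-emit one CSV-formatted line with '|' as the delimiter."""
    buffer = StringIO()
    # lineterminator="" keeps csv.writer from appending "\r\n" to the row.
    writer = csv.writer(buffer, delimiter="|", lineterminator="")
    # csv.reader accepts any iterable of lines, so wrap the string in a list.
    writer.writerows(csv.reader([line]))
    return buffer.getvalue()


# Shortened, illustrative sample: empty fields plus a field with an embedded comma.
sample = '"XIDJIJFHD8","Gothika",,"for violence, brief language and nudity.",,,"R"'
converted = to_pipe_delimited(sample)
print(converted)
print(len(converted.split("|")))  # number of fields
```

With the default quoting mode, csv.writer quotes any field that itself contains a |, so the result can still be read back with csv.reader(..., delimiter="|").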
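Answer 1 (score: 1)
Your pattern consumes the characters on either side of each comma, so consecutive commas end up inside a single match.
You want a regex that does not include those neighbouring characters in the replacement itself, but still ensures they are there: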
re.sub(r'(?<=[,"])(,)(?=[,"])', '|', s)
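This uses lookahead and lookbehind to check that a , or " is present without replacing it.
(,) matches a comma.
(?<=[,"]) requires that the comma is immediately preceded by a comma or a double quote.
(?=[,"]) requires that the comma is immediately followed by a comma or a double quote.
The (?<= and (?= in the first and third groups make them zero-width assertions, so the characters they check are not included in the replacement.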
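To see the difference on a small case, the sketch below runs the question's pattern and the lookaround pattern on a shortened sample string. The sample is illustrative, not the original data, and the variable names are hypothetical.

```python
import re

# Illustrative sample: one empty field and one field with an embedded comma.
sample = '"a",,"b","x, y"'

# Pattern from the question: each match consumes the characters on both
# sides of the comma, so an adjacent separator can no longer match.
consuming = re.sub(r'(\,|\")(,)(,|\")', '|', sample)

# Lookaround pattern: only the comma itself is consumed and replaced.
lookaround = re.sub(r'(?<=[,"])(,)(?=[,"])', '|', sample)

print(consuming)
print(lookaround)
print(len(consuming.split("|")), len(lookaround.split("|")))
```

Note that the lookaround version leaves the double quotes around each field in place, and the comma inside "x, y" is untouched because it is followed by a space rather than a comma or a quote.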