Question

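I'm trying to find a regular expression that splits the string below into a list. I haven't found a simple way to split the string yet, but the main reason I'm asking is that I can't understand why the last string is duplicated. It doesn't happen when I test the pattern online at regex101.com. As I understand re.split, there is no reason for the data to be repeated.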

The code is:

import re
a = ['"This is a string", "and this is another with a , in it", Thisisalsovalid, "",,,"And a string"']
b = re.split(r',(?=(".*?"|[\w/-]*|,))', a[0])
for i in b:
    print(i)

And the output:

"This is a string"

 "and this is another with a

 in it"

 Thisisalsovalid

 ""




"And a string"
"And a string"

The expected output is:

"This is a string"
"and this is another with a , in it"
Thisisalsovalid
""


"And a string"

The resulting list will be zipped with a list of headers, so the fields have to line up without any indexing problems.

As a bonus, I'd love a regex that splits on ',' unless the comma appears inside a quoted string.

Answer 1

,(?=(?:[^"]*""?[^"]*")*[^"]*$)

Try this. See the demo.

https://regex101.com/r/nL5yL3/36
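As a quick check outside regex101, the pattern above can be run against the string from the question. This is only a sketch: the strip() call and the names pattern and fields are additions for illustration and are not part of the answer.

```python
import re

a = ['"This is a string", "and this is another with a , in it", Thisisalsovalid, "",,,"And a string"']

# Regex from the answer: split on a comma only when the rest of the
# string can be consumed as complete quoted sections up to the end.
pattern = r',(?=(?:[^"]*""?[^"]*")*[^"]*$)'

# The group inside the lookahead is non-capturing, so re.split returns
# only the pieces. strip() removes the space that follows most commas.
fields = [field.strip() for field in re.split(pattern, a[0])]

for field in fields:
    print(field)
```

On this input the comma inside the second quoted string is left alone, and the empty fields are kept as empty strings, so the list has seven items.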

Alternatively, you can make your own regex work:

b = re.split(r',(?=(?:".*?"|[\w/-]*|,))', a[0])

                    ^^

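Use this. The duplicates appeared because you used a capturing group: re.split also returns the text matched by capturing groups. So make the group non-capturing.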
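The effect is easy to reproduce on a smaller input. The following is a minimal sketch that compares the two forms of the lookahead; the short string text is made up for illustration and is not the string from the question.

```python
import re

# Shortened, hypothetical input (not the string from the question).
text = 'a,b,"c"'

# Capturing group inside the lookahead: re.split inserts the text
# captured by the group into the result after every split point.
capturing = re.split(r',(?=(".*?"|[\w/-]*|,))', text)

# Same pattern with a non-capturing group: only the pieces are returned.
non_capturing = re.split(r',(?=(?:".*?"|[\w/-]*|,))', text)

print(capturing)      # ['a', 'b', 'b', '"c"', '"c"']
print(non_capturing)  # ['a', 'b', '"c"']
```

In the question's string most commas are followed by a space or by another comma, so the group captures an empty string there, which shows up as the extra blank lines. Only the final comma is directly followed by a quoted string, hence the single visible duplicate. Note that the non-capturing version still splits at the comma inside the second quoted string, because the lookahead can always succeed by matching an empty string.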

Answer 2

Why not use an existing solution for reading CSV-formatted strings?

import csv
import io  # Python 3; on Python 2 this was the StringIO module

s = ['"This is a string", "and this is another with a , in it", Thisisalsovalid, "",,,"And a string"']
# skipinitialspace=True ignores the space that follows each comma
reader = csv.reader(io.StringIO(s[0]), skipinitialspace=True)
for row in reader:
    for value in row:
        print(value)

Output:

This is a string
and this is another with a , in it
Thisisalsovalid



And a string
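Since the question mentions zipping the result with a list of headers, here is a rough sketch of how the csv approach could feed into that step under Python 3. The header names h1 to h7 are invented for this example; the real headers are not given in the question.

```python
import csv
import io

s = ['"This is a string", "and this is another with a , in it", Thisisalsovalid, "",,,"And a string"']

# Hypothetical header names, invented for this illustration.
headers = ['h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'h7']

# Wrap the string in a file-like object so csv.reader can consume it.
reader = csv.reader(io.StringIO(s[0]), skipinitialspace=True)
row = next(reader)

# Empty fields come back as '', so positions stay aligned with the headers.
record = dict(zip(headers, row))
print(record)
```

Unlike the regex approach, csv.reader removes the surrounding quotes, so a field written as "" and a truly empty field both come back as an empty string.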

Strange behavior in the Python re module
