I'm trying to split a string on commas (",").
For example:
"hi, welcome" I would like to produce ["hi","welcome"]
But:
"'hi,hi',hi" I would like to produce ["'hi,hi'","hi"]
"'hi, hello,yes','hello, yes','eat,hello'" I would like to produce ["'hi, hello,yes'","'hello, yes'","'eat,hello'"]
"'hiello, 332',9" I would like to produce ["'hiello, 332'","9"]
I don't think I can use the .split() function. Does anyone know a way I could do this, maybe with a regular expression?
Answer 0 (score: 16)
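You can use the csv module with the quotechar parameter, or you can convert your input so that it uses the more standard double-quote (") character as its quote character.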
>>> import csv
>>> from io import StringIO
>>> first=StringIO('hi, welcome')
>>> second=StringIO("'hi,hi',hi")
>>> third=StringIO("'hi, hello,yes','hello, yes','eat,hello'")
>>> fourth=StringIO("'hiello, 332',9")
>>> rfirst=csv.reader(first,quotechar="'")
>>> next(rfirst)  # the space after the comma is kept
['hi', ' welcome']
>>> rsecond=csv.reader(second,quotechar="'")
>>> next(rsecond)
['hi,hi', 'hi']
>>> rthird=csv.reader(third,quotechar="'")
>>> next(rthird)
['hi, hello,yes', 'hello, yes', 'eat,hello']
>>> rfourth=csv.reader(fourth,quotechar="'")
>>> next(rfourth)
['hiello, 332', '9']
>>> second=StringIO('"hi,hi",hi') # This will be more straightforward to interpret.
>>> r=csv.reader(second)
>>> next(r)
['hi,hi', 'hi']
>>> third=StringIO('"hi, hello,yes","hello, yes","eat,hello"')
>>> r=csv.reader(third)
>>> next(r)
['hi, hello,yes', 'hello, yes', 'eat,hello']
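The second option (converting the quotes) can also be done programmatically instead of retyping each string. The sketch below is written for Python 3 and assumes the input never contains a literal double quote; split_converted is a hypothetical helper name. It passes a one-element list to csv.reader (any iterable of lines works, so no StringIO is needed) and sets skipinitialspace=True so that "hi, welcome" yields 'welcome' without the leading space. Note that the reader removes the quotes from the quoted fields.

```python
import csv


def split_converted(s):
    # Hypothetical helper: rewrite ' as the default csv quote character ",
    # assuming s never contains a literal ", then parse it as one CSV line.
    # skipinitialspace=True ignores the blank that follows each delimiter.
    line = s.replace("'", '"')
    return next(csv.reader([line], skipinitialspace=True))


print(split_converted("hi, welcome"))      # ['hi', 'welcome']
print(split_converted("'hi,hi',hi"))       # ['hi,hi', 'hi']
print(split_converted("'hiello, 332',9"))  # ['hiello, 332', '9']
```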
Answer 1 (score: 2)
Using a regular expression, as you asked:
>>> import re
>>> pattern = re.compile(r"([^',]+,?|'[^']+,?')")
>>> re.findall(pattern, "hi, welcome")  # the comma and the following space are kept
['hi,', ' welcome']
>>> re.findall(pattern, "'hi, hello,yes','hello, yes','eat,hello'")
["'hi, hello,yes'", "'hello, yes'", "'eat,hello'"]
>>> re.findall(pattern, "'hi,hi',hi")
["'hi,hi'", 'hi']
>>> re.findall(pattern, "'hiello, 332',9")
["'hiello, 332'", '9']
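The first part of the pattern, [^',]+,? , captures a segment that contains no quotes and no commas. It may or may not end with a comma (it doesn't if it is the last segment), and that comma becomes part of the match.
The second part, '[^']+,?' , captures a segment enclosed in quotes. It must not contain any further quotes inside, but it may contain commas.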
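One possible adjustment, sketched under the assumption that quotes only ever wrap whole fields: drop the ,? and move the optional whitespace outside the capturing group, so that neither the separating comma nor the space after it ends up in the result. (The ,? in the quoted alternative never matches anything, because [^']+ already consumes any commas before the closing quote.) FIELD and split_fields are hypothetical names, and a blank placed before a comma in an unquoted field would still be kept.

```python
import re

# Hypothetical pattern: optional leading whitespace (not captured), then either
# a quoted segment kept together with its quotes, or a run of characters that
# are neither quotes nor commas. Commas are never matched, so they act as
# separators that findall simply skips over.
FIELD = re.compile(r"\s*('[^']*'|[^',]+)")


def split_fields(s):
    # With one capturing group, findall returns only the group's text,
    # so the whitespace consumed by \s* is dropped.
    return FIELD.findall(s)


print(split_fields("hi, welcome"))      # ['hi', 'welcome']
print(split_fields("'hi,hi',hi"))       # ["'hi,hi'", 'hi']
print(split_fields("'hiello, 332',9"))  # ["'hiello, 332'", '9']
```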
Answer 2 (score: 1)
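You could use a csv reader with , as the delimiter and ' as the quotechar. That seems to match what you expect, except that the reader strips the quotes from the quoted fields.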
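A minimal Python 3 sketch of that suggestion; split_single_quoted is a hypothetical helper name, and skipinitialspace=True is an addition that drops the blank after each comma. As noted, fields come back as hi,hi rather than 'hi,hi'.

```python
import csv


def split_single_quoted(s):
    # Hypothetical helper: parse s as a single CSV record, using , as the
    # delimiter and ' as the quote character. The reader removes the
    # surrounding quotes and ignores blanks right after a delimiter.
    reader = csv.reader([s], delimiter=",", quotechar="'", skipinitialspace=True)
    return next(reader)


for text in ["'hi,hi',hi", "'hi, hello,yes','hello, yes','eat,hello'"]:
    print(split_single_quoted(text))
# ['hi,hi', 'hi']
# ['hi, hello,yes', 'hello, yes', 'eat,hello']
```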
Answer 3 (score: 1)
Doing it directly, without csv or re, isn't a problem either:
def splitstring(s):
    result = []
    for i, piece in enumerate(s.split("'")):
        if piece:
            if i % 2:  # odd pieces are between quotes
                result.append("'" + piece + "'")
            else:  # even pieces aren't
                for subpiece in piece.split(","):
                    subpiece = subpiece.strip()  # drop the blank after a comma
                    if subpiece:
                        result.append(subpiece)
    return result