Given the sample string s = 'Hi, my name is Humpty-Dumpty, from "Alice, Through the Looking Glass"', I would like to split it into the following chunks:
# To Do: something like {l = s.split(',')}
l = ['Hi', 'my name is Humpty-Dumpty', '"Alice, Through the Looking Glass"']
I don't know in advance where the delimiters are or how many there will be.
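This was my first idea. It is long and inaccurate, because it removes all the delimiters, whereas I want the delimiters inside the quotes to survive: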
s = 'Hi, my name is Humpty-Dumpty, from "Alice, Through the Looking Glass"'
ss = []
inner_string = ""
delimiter = ','
for item in s.split(delimiter):
    if not inner_string:
        if '\"' not in item:  # regular string, not interesting
            ss.append(item)
        else:
            inner_string += item  # start inner string
    elif inner_string:
        inner_string += item
        if '\"' in item:  # end inner string
            ss.append(inner_string)
            inner_string = ""
        else:  # middle of inner string
            pass
print(ss)
# prints ['Hi', ' my name is Humpty-Dumpty', ' from "Alice Through the Looking Glass"'] which is OK-ish
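For comparison, here is a minimal sketch of how the split-based idea above could keep the delimiters. It re-inserts the delimiter when gluing pieces back together and treats a field as complete once it contains an even number of quote characters. The function name `split_outside_quotes` is hypothetical, and the sketch assumes quotes are plain `"` characters that are never escaped.

```python
def split_outside_quotes(s, delimiter=','):
    """Hypothetical helper: split s on delimiter, but not inside "quotes"."""
    parts = []
    buf = None  # the field being built; may span several split() pieces
    for item in s.split(delimiter):
        if buf is None:
            buf = item
        else:
            buf += delimiter + item  # restore the delimiter that split() removed
        # An even number of quote chars means every quote has been closed
        if buf.count('"') % 2 == 0:
            parts.append(buf)
            buf = None
    if buf is not None:  # unbalanced quote: keep the remainder as one field
        parts.append(buf)
    return parts


s = 'Hi, my name is Humpty-Dumpty, from "Alice, Through the Looking Glass"'
print(split_outside_quotes(s))
# ['Hi', ' my name is Humpty-Dumpty', ' from "Alice, Through the Looking Glass"']
```

Note that the leading spaces are kept. Calling `.strip()` on each field would remove them.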
Answer 0 (score: 2)
You can use re.split to split by a regular expression:
>>> import re
>>> [x for x in re.split(r'([^",]*(?:"[^"]*"[^",]*)*)', s) if x not in (',','')]
where s is:
'Hi, my name is Humpty-Dumpty, from "Alice, Through the Looking Glass"'
It outputs:
['Hi', ' my name is Humpty-Dumpty', ' from "Alice, Through the Looking Glass"']
Regex explanation:
(
[^",]* zero or more chars other than " or ,
(?: non-capturing group
"[^"]*" quoted block
[^",]* followed by zero or more chars other than " or ,
)* zero or more times
)
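The same pattern can also be written with Python's `re.VERBOSE` flag, which ignores whitespace and `#` comments in the pattern. This lets the explanation above live inside the regex itself. It is a sketch: the compiled pattern is identical to the one in the answer, and only the layout differs. The name `FIELD` is illustrative.

```python
import re

# Same regex as in the answer, laid out with re.VERBOSE
# (whitespace and "#" comments outside character classes are ignored).
FIELD = re.compile(r'''
    (                   # capture the whole field so re.split keeps it
      [^",]*            # zero or more chars other than " or ,
      (?:               # non-capturing group
        "[^"]*"         # a quoted block, which may contain commas
        [^",]*          # followed by zero or more chars other than " or ,
      )*                # zero or more times
    )
''', re.VERBOSE)

s = 'Hi, my name is Humpty-Dumpty, from "Alice, Through the Looking Glass"'
# Drop the separators and the empty strings that re.split also returns
parts = [x for x in FIELD.split(s) if x not in (',', '')]
print(parts)
# ['Hi', ' my name is Humpty-Dumpty', ' from "Alice, Through the Looking Glass"']
```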
Answer 1 (score: 1)
I solved this by avoiding split altogether:
s = 'Hi, my name is Humpty-Dumpty, from "Alice, Through the Looking Glass"'
l = []
substr = ""
quotes_open = False
for c in s:
    if c == ',' and not quotes_open:  # check for comma only if no quotes open
        l.append(substr)
        substr = ""
    elif c == '\"':
        quotes_open = not quotes_open
    else:
        substr += c
l.append(substr)
print(l)
Output:
['Hi', ' my name is Humpty-Dumpty', ' from Alice, Through the Looking Glass']
A more general function might look like this:
def custom_split(input_str, delimiter=' ', avoid_between_char='\"'):
    l = []
    substr = ""
    between_avoid_chars = False
    for c in input_str:  # iterate over the function's argument
        if c == delimiter and not between_avoid_chars:
            l.append(substr)
            substr = ""
        elif c == avoid_between_char:
            between_avoid_chars = not between_avoid_chars
        else:
            substr += c
    l.append(substr)
    return l
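The character loop in this answer drops the quote characters, while the question asks to keep them. Here is a minimal sketch of a variant that still toggles on the quote character but appends it to the field as well. The name `split_keep_quotes` and its parameters are hypothetical, and the sketch assumes quotes are never escaped.

```python
def split_keep_quotes(input_str, delimiter=',', quote_char='"'):
    """Hypothetical variant: split on delimiter outside quotes, keeping the quotes."""
    parts = []
    substr = ""
    in_quotes = False
    for c in input_str:
        if c == delimiter and not in_quotes:
            parts.append(substr)  # a field ends at an unquoted delimiter
            substr = ""
        else:
            if c == quote_char:
                in_quotes = not in_quotes  # toggle state, but still keep the char
            substr += c
    parts.append(substr)  # the last field has no trailing delimiter
    return parts


s = 'Hi, my name is Humpty-Dumpty, from "Alice, Through the Looking Glass"'
print(split_keep_quotes(s))
# ['Hi', ' my name is Humpty-Dumpty', ' from "Alice, Through the Looking Glass"']
```

The only change from the answer's loop is that the quote branch no longer skips the character, so the output matches the format requested in the question apart from the leading spaces.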
Answer 2 (score: 0)
This works for this specific case and can serve as a starting point.
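import re
s = 'Hi, my name is Humpty-Dumpty, from "Alice, Through the Looking Glass"'
cut = re.search(r'(".*")', s)  # the quoted part (assumes a single quoted block)
r = re.sub(r'(".*")', '$VAR$', s).split(',')  # mask it, then split on commas
res = []
for i in r:
    # put the quoted part back; str.replace treats it as plain text
    res.append(i.replace('$VAR$', cut.group(1)))
print(res)
Output: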
['Hi', ' my name is Humpty-Dumpty', ' from "Alice, Through the Looking Glass"']
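Because `".*"` is greedy, the placeholder approach treats everything from the first quote to the last quote as one block, so it breaks when a string contains more than one quoted section. A different technique (not part of this answer) is to describe what a field looks like and collect the fields with `re.findall`. This is a sketch assuming balanced, unescaped `"` quotes. Note that it silently drops empty fields such as the one in `'a,,b'`. The names `FIELD_RE` and `split_fields` are illustrative.

```python
import re

# A field is one or more of: a char that is neither a comma nor a quote,
# or a complete "quoted block" (which may itself contain commas).
FIELD_RE = re.compile(r'(?:[^,"]|"[^"]*")+')


def split_fields(s):
    """Return the comma-separated fields of s, ignoring commas inside quotes."""
    return FIELD_RE.findall(s)


s = 'Hi, my name is Humpty-Dumpty, from "Alice, Through the Looking Glass"'
print(split_fields(s))
# ['Hi', ' my name is Humpty-Dumpty', ' from "Alice, Through the Looking Glass"']

# Each quoted block is matched on its own
print(split_fields('a, "b, c", d "e, f" g'))
# ['a', ' "b, c"', ' d "e, f" g']
```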