I have many strings that need to be split on commas. Examples:
myString = r'test,Test,NEAR(this,that,DISTANCE=4),test again,"another test"'
myString = r'test,Test,FOLLOWEDBY(this,that,DISTANCE=4),test again,"another test"'
The output I want is:
["test", "Test", "NEAR(this,that,DISTANCE=4)", "test again", """another test"""] #list length = 5
I can't figure out how to keep the commas inside "this,that,DISTANCE=4" so that the whole function call stays in a single item. I tried:
l = re.compile(r',').split(myString) # matches all commas
l = re.compile(r'(?<!\(),(?=\))').split(myString) # (negative lookback/lookforward) - no matches at all
Any ideas? Assume the list of allowed "functions" is defined as:
f = ["NEAR","FOLLOWEDBY","AND","OR","MAX"]
Answer 0 (score: 2)
You can use
(?:\([^()]*\)|[^,])+
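See the regex demo.
The (?:\([^()]*\)|[^,])+ pattern matches one or more occurrences of either a parenthesized substring that contains no ( or ) inside it, or any single character other than a comma. Commas inside one level of parentheses are therefore consumed as part of the match, while commas outside parentheses end it.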
See the Python demo:
import re
rx = r"(?:\([^()]*\)|[^,])+"
s = 'test,Test,NEAR(this,that,DISTANCE=4),test again,"another test"'
print(re.findall(rx, s))
# => ['test', 'Test', 'NEAR(this,that,DISTANCE=4)', 'test again', '"another test"']
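The pattern above only recognizes a single level of parentheses, because [^()]* cannot cross an inner ( or ). If nested calls can occur (an assumption; the question only shows flat ones), a plain depth counter is one alternative. The sketch below is not part of the answer, and the name split_top_level is hypothetical:

```python
def split_top_level(text, sep=","):
    """Split text on sep, ignoring separators that sit inside parentheses."""
    parts = []
    current = []
    depth = 0  # number of currently open parentheses
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth = max(depth - 1, 0)  # tolerate a stray ')'
        if ch == sep and depth == 0:
            # Top-level separator: close the current item.
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


nested = 'test,MAX(NEAR(this,that,DISTANCE=4),other),"another test"'
print(split_top_level(nested))
# => ['test', 'MAX(NEAR(this,that,DISTANCE=4),other)', '"another test"']
```

On the same nested input, the regex would end a match at the comma before other, splitting the MAX(...) call in two.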
Answer 1 (score: 0)
If you explicitly want to specify which strings count as functions, you need to build the regex dynamically. Otherwise, use Wiktor's solution.
>>> import re
>>> functions = ["NEAR","FOLLOWEDBY","AND","OR","MAX"]
>>> funcs = '|'.join(r'{}\([^\)]+\)'.format(f) for f in functions)
>>> regex = '({})|,'.format(funcs)
>>>
>>> myString1 = 'test,Test,NEAR(this,that,DISTANCE=4),test again,"another test"'
>>> list(filter(None, re.split(regex, myString1)))
['test', 'Test', 'NEAR(this,that,DISTANCE=4)', 'test again', '"another test"']
>>> myString2 = 'test,Test,FOLLOWEDBY(this,that,DISTANCE=4),test again,"another test"'
>>> list(filter(None, re.split(regex, myString2)))
['test',
'Test',
'FOLLOWEDBY(this,that,DISTANCE=4)',
'test again',
'"another test"']
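The same idea can be wrapped in a function to make the effect of the allow-list visible. This is only a sketch: the name split_keeping_functions is hypothetical, and the re.escape call and the \b word boundary are additions that the answer does not use. They are there so that names are taken literally and so that a short name such as OR does not match inside a longer one such as FLOOR.

```python
import re


def split_keeping_functions(text, functions):
    """Split on commas, but keep calls to the listed function names intact."""
    # Escape each name so any regex metacharacters in it are matched literally.
    names = "|".join(re.escape(name) for name in functions)
    # Group 1 captures a whole call such as NEAR(...); a bare comma is the fallback.
    pattern = r"(\b(?:{})\([^)]+\))|,".format(names)
    # re.split yields None for an unmatched group and '' between adjacent
    # matches; both are dropped here.
    return [piece for piece in re.split(pattern, text) if piece]


allowed = ["NEAR", "FOLLOWEDBY", "AND", "OR", "MAX"]
print(split_keeping_functions('test,NEAR(this,that),OTHER(a,b)', allowed))
# => ['test', 'NEAR(this,that)', 'OTHER(a', 'b)']
```

OTHER is not in the list, so its arguments are split like any other comma-separated text, which is the difference from the parenthesis-based pattern in the first answer.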