Jamies_string = "Hello there {my name is jamie}".split()
print(Jamies_string)
Output:
['Hello', 'there', '{my', 'name', 'is', 'jamie}']
Desired output:
['Hello', 'there', '{', 'my', 'name', 'is', 'jamie', '}']
I'd really like to stay away from any solution that involves the re library, thanks.
Answer 0 (score: 4)
You can first add spaces around these symbols and then call split(), for example:
>>> s = "Hello there {my name is jamie}"
>>> s.replace("{", " { ").replace("}", " } ").split()
['Hello', 'there', '{', 'my', 'name', 'is', 'jamie', '}']
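If more symbols than the two braces need to be separated, the same replace-then-split idea can be wrapped in a small helper. This is only a sketch: the function name split_around and its symbols parameter are made up for illustration and are not part of the original answer.

```python
def split_around(text, symbols="{}"):
    """Pad each character in `symbols` with spaces, then split on whitespace."""
    for sym in symbols:
        text = text.replace(sym, f" {sym} ")
    return text.split()


print(split_around("Hello there {my name is jamie}"))
# ['Hello', 'there', '{', 'my', 'name', 'is', 'jamie', '}']
print(split_around("call(a, b)", symbols="(),"))
# ['call', '(', 'a', ',', 'b', ')']
```

Each symbol costs one extra pass over the string, so this is best suited to a short, fixed set of single-character symbols.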
Answer 1 (score: 3)
One solution is to write a function that classifies characters and use it as the key function for itertools.groupby():
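import itertools

WHITESPACE = 0
LETTERS = 1
DIGITS = 2
SYMBOLS = 3

def character_class(c):
    # Map a single character to one of the four classes above.
    if c.isspace():
        return WHITESPACE
    if c.isalpha():
        return LETTERS
    if c.isdigit():
        return DIGITS
    return SYMBOLS

s = "Hello there {my name is jamie}"
# groupby() yields runs of consecutive characters that share a class;
# join each run into a token and drop the whitespace runs.
tokens = [
    "".join(chars)
    for cls, chars in itertools.groupby(s, character_class)
    if cls != WHITESPACE
]
print(tokens)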
prints
['Hello', 'there', '{', 'my', 'name', 'is', 'jamie', '}']
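You clarified that you want to avoid regular expressions for performance reasons. The approach in this answer is almost certainly slower than using regular expressions properly. However, I don't think your project is at a stage where you need to worry about performance.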
Answer 2 (score: 1)
The string you're using looks like a Python format string. If that is the case, you can use the Formatter class to parse it:
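from string import Formatter

def solve(s):
    # Each parsed item is (literal_text, field_name, format_spec, conversion).
    for f in Formatter().parse(s):
        yield from f[0].split()  # words in the literal text before the field
        if f[1]:  # field name, i.e. the text between the braces
            yield from ['{'] + f[1].split() + ['}']

Demo: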
>>> list(solve("Hello there {my name is jamie}"))
['Hello', 'there', '{', 'my', 'name', 'is', 'jamie', '}']
>>> list(solve("Hello there {my name is jamie} {hello world} end."))
['Hello', 'there', '{', 'my', 'name', 'is', 'jamie', '}', '{', 'hello', 'world', '}', 'end.']
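To see why solve() indexes f[0] and f[1], it can help to look at what Formatter().parse() actually yields. The sketch below just prints the raw tuples for a sample string; the variable name parsed is only for illustration.

```python
from string import Formatter

# parse() yields (literal_text, field_name, format_spec, conversion) tuples.
parsed = list(Formatter().parse("Hello there {my name is jamie} end."))
for item in parsed:
    print(item)
# ('Hello there ', 'my name is jamie', '', None)
# (' end.', None, None, None)
```

Because the input is treated as a format string, an unbalanced brace such as a lone "{" makes the parser raise ValueError, so this approach only fits input whose braces are always paired.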
Answer 3 (score: 1)
One way (not as clean as the other answers, but it works):
def tokenize(string):
    WHITESPACE = 0  # Borrowed from Sven's answer
    LETTERS = 1
    DIGITS = 2
    SYMBOLS = 3

    def character_class(c):
        if c.isspace():
            return WHITESPACE
        elif c.isalpha():
            return LETTERS
        elif c.isdigit():
            return DIGITS
        return SYMBOLS

    # Start with no type; indexing string[0] would fail on an empty string
    lastType = None
    chunk = ""
    for char in string:
        charType = character_class(char)
        if charType == WHITESPACE:
            if chunk:  # Only yield if non-empty
                yield chunk
                chunk = ""
            # Reset so the next chunk starts fresh, without leading whitespace
            # (peeking at string[i + 1] would fail on trailing whitespace)
            lastType = None
            continue  # Don't add to chunk
        elif charType != lastType:  # Different type
            if chunk:  # Only yield if non-empty
                yield chunk
                chunk = ""
            lastType = charType
        chunk += char
    if chunk:
        yield chunk

print(list(tokenize("Hello there {my name is jamie}")))
Sample output:
['Hello', 'there', '{', 'my', 'name', 'is', 'jamie', '}']
This is more or less doing by hand what itertools.groupby does.
Answer 4 (score: 1)
Make one pass over the string and put spaces around every punctuation character, then split on whitespace.
>>> import string
>>> s = "Hello there {my name is jamie}"
>>> s = ''.join(c if c.isalnum() or c.isspace() else ' {} '.format(c) for c in s)
>>> s.split()
['Hello', 'there', '{', 'my', 'name', 'is', 'jamie', '}']
>>>
Expanding the third line a little:
a = []
for c in s:
    if not c.isalnum() and not c.isspace():
        c = ' ' + c + ' '
    a.append(c)
s = ''.join(a)
s.split()
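The padding approach and the itertools.groupby approach agree on the sample string from the question, but they do not tokenize every input the same way. The sketch below puts both side by side on a different input; the function names pad_and_split and group_by_class are hypothetical wrappers around the code from the answers above, not names used in the originals.

```python
import itertools


def pad_and_split(text):
    """Put spaces around every non-alphanumeric character, then split."""
    padded = "".join(
        c if c.isalnum() or c.isspace() else f" {c} " for c in text
    )
    return padded.split()


def group_by_class(text):
    """Group runs of letters, digits, and symbols; drop whitespace runs."""
    def character_class(c):
        if c.isspace():
            return 0  # whitespace
        if c.isalpha():
            return 1  # letters
        if c.isdigit():
            return 2  # digits
        return 3  # symbols

    return [
        "".join(chars)
        for cls, chars in itertools.groupby(text, character_class)
        if cls != 0
    ]


sample = "x1 != y2"
print(pad_and_split(sample))   # ['x1', '!', '=', 'y2']
print(group_by_class(sample))  # ['x', '1', '!=', 'y', '2']
```

Padding makes every punctuation character its own token and keeps letters and digits together, while grouping by class keeps runs of symbols together and separates letters from digits. Which one is preferable depends on what should count as a token.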