Question

I want to split this string:

Expression = "((((324+17)*3)/((936-51)+124))-((13*(72-41))+6))"

I used str.split(), but it also splits the numbers into individual digits: "3 2 4 + 1 7"

Desired output:

"( ( ( ( 324 + 17 ) * 3 ) / ( ( 936 - 51 ) + 124 ) ) - ( ( 13 * ( 72 - 41 ) ) + 6 ) )"

Answer 1

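I think you need to insert a space around every non-numeric character. split won't do that job for you, but you can use re.sub.

This is what I could come up with quickly. It could probably be written more neatly as a single pass, but it should give you the idea: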

import re
Expression = "((((324+17)*3)/((936-51)+124))-((13*(72-41))+6))"
# Insert a space after every non-numeric character
spaced = re.sub(r"([^0-9])", r"\1 ", Expression).strip()
# Insert a space between a digit and the non-numeric character that follows it
spaced = re.sub(r"([0-9])([^0-9])", r"\1 \2", spaced).strip()
print(spaced)

Output:
( ( ( ( 324 + 17 ) * 3 ) / ( ( 936 - 51 ) + 124 ) ) - ( ( 13 * ( 72 - 41 ) ) + 6 ) )
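
The single-pass version mentioned above could look roughly like the sketch below. It is one possible variant, not the answerer's code: it matches either a whole run of digits or a single non-digit character and appends a space after each match. It assumes the input contains no whitespace of its own, and the name spaced is chosen here only for illustration.

```python
import re

expression = "((((324+17)*3)/((936-51)+124))-((13*(72-41))+6))"

# One pass: match either a run of digits or a single non-digit character
# and append a space after each match; strip() drops the trailing space.
# Assumption: the input has no whitespace of its own.
spaced = re.sub(r"(\d+|\D)", r"\1 ", expression).strip()
print(spaced)
```

Because \d+ is tried first, multi-digit numbers such as 324 stay together instead of being split digit by digit.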

Answer 2

You can use a dictionary to replace certain characters with padded equivalents:

>>> Expression = "((((324+17)*3)/((936-51)+124))-((13*(72-41))+6))"
>>> d = {'(':'( ', ')':' )', '+': ' + ', '-': ' - ', '*': ' * ', '/': ' / '}
>>> ''.join(d[c] if c in d else c for c in Expression)
'( ( ( ( 324 + 17 ) * 3 ) / ( ( 936 - 51 ) + 124 ) ) - ( ( 13 * ( 72 - 41 ) ) + 6 ) )'

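Note that the padding dictionary pads ( with one space on the right and ) with one space on the left, while operators are padded on both sides. This prevents over-padding where parentheses are nested.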
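
The same padding table can also be applied with str.translate, which accepts a mapping from single characters to replacement strings via str.maketrans. This is a sketch of an alternative spelling, not part of the original answer. The names padding, table, spaced and tokens are illustrative, and the final split() step assumes you want a list of tokens rather than the padded string.

```python
expression = "((((324+17)*3)/((936-51)+124))-((13*(72-41))+6))"

# Same padding rules as the dictionary above: "(" padded on the right,
# ")" padded on the left, operators padded on both sides.
padding = {'(': '( ', ')': ' )', '+': ' + ', '-': ' - ', '*': ' * ', '/': ' / '}

# str.maketrans turns the single-character keys into a translation table.
table = str.maketrans(padding)
spaced = expression.translate(table)

# Once the string is padded, a plain split() yields the individual tokens.
tokens = spaced.split()
print(spaced)
print(tokens)
```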

Answer 3

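You are looking to tokenize the string. For Python expressions you can do this with the tokenize module, or, for simpler forms, with the various regex search functions. Here are two examples: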

>>> expression = "((((324+17)*3)/((936-51)+124))-((13*(72-41))+6))"
>>> import re
>>> re.findall(r'[0-9]+|.', expression)
['(', '(', '(', '(', '324', '+', '17', ')', '*', '3', ')', '/', '(', '(', '936', '-', '51', ')', '+', '124', ')', ')', '-', '(', '(', '13', '*', '(', '72', '-', '41', ')', ')', '+', '6', ')', ')']
>>> import tokenize
>>> [t.string for t in tokenize.tokenize(iter([expression.encode('utf-8')]).__next__)
...  if t.type in (tokenize.NUMBER, tokenize.OP)]
['(', '(', '(', '(', '324', '+', '17', ')', '*', '3', ')', '/', '(', '(', '936', '-', '51', ')', '+', '124', ')', ')', '-', '(', '(', '13', '*', '(', '72', '-', '41', ')', ')', '+', '6', ')', ')']

The iter([expression.encode('utf-8')]).__next__ part is needed because tokenize normally reads from a file, and there is no shortcut for a single string.
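
If the readline workaround feels awkward, one alternative is to wrap the string in io.StringIO and pass its readline method to tokenize.generate_tokens, which works on str rather than bytes. The sketch below is an assumption about what is convenient here, not part of the original answer. It keeps only NUMBER and OP tokens, so bookkeeping tokens such as NEWLINE and ENDMARKER are dropped.

```python
import io
import tokenize

expression = "((((324+17)*3)/((936-51)+124))-((13*(72-41))+6))"

# io.StringIO gives the string a file-like readline method, which is the
# callable that tokenize.generate_tokens expects.
readline = io.StringIO(expression).readline

# Keep only numbers and operators/parentheses; skip NEWLINE, ENDMARKER, etc.
tokens = [
    tok.string
    for tok in tokenize.generate_tokens(readline)
    if tok.type in (tokenize.NUMBER, tokenize.OP)
]

print(" ".join(tokens))
```

Joining the tokens with a single space gives the padded form asked for in the question.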

Splitting an expression string