Question

I have a sequence of strings of the form:

s1 = "Schblaum 12324 tunguska 24 234n"
s2 = "jacarta 331 matchika 22 234k"
s3 = "3239 thingolee 80394 234k"

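I need to split each string in two, right after the number in the middle of the string, regardless of whether the first part of the string contains a number. Something like: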

["Schblaum 12324", "tunguska 24 234n"]
["jacarta 331", "matchika 22 234k"]
["3239 thingolee 80394", "234k"]

I tried using a regular expression of the form:

finder = re.compile(r"\D(\d+)\D")
finder.search(s1)

to no avail. Is there a way to do this, perhaps without using regular expressions? Cheers!

EDIT: I just found a case where the initial string is just

"jacarta 43453"

with no other part. This should return

["jacarta 43453"]

Answer 1

Use re.findall:

>>> import re
>>> s1 = "Schblaum 12324 tunguska 24 234n"
>>> re.findall(r'^\S+\D*\d+|\S.*', s1)
['Schblaum 12324', 'tunguska 24 234n']
>>> s2 = "jacarta 331 matchika 22 234k"
>>> s3 = "3239 thingolee 80394 234k"
>>> re.findall(r'^\S+\D*\d+|\S.*', s2)
['jacarta 331', 'matchika 22 234k']
>>> re.findall(r'^\S+\D*\d+|\S.*', s3)
['3239 thingolee 80394', '234k']
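
As a rough sketch of how the pattern above could be wrapped for reuse: the first alternative anchors at the start, takes the first token with `\S+` (which may itself be a number), skips any non-digits, and stops after the next run of digits; the second alternative picks up whatever remains, starting at the next non-space character. The helper name `split_with_findall` is made up for illustration and is not part of the question.

```python
import re

# Same pattern as in the session above, compiled once.
PATTERN = re.compile(r'^\S+\D*\d+|\S.*')

def split_with_findall(s):
    """Return [head, tail], or just [head] when nothing follows the number.

    Hypothetical helper name; only the pattern comes from the answer.
    """
    return PATTERN.findall(s)

# The single-part case from the question's edit yields a one-element list.
print(split_with_findall("jacarta 43453"))  # ['jacarta 43453']
```

Because findall simply returns every match, the single-part case needs no special handling.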

Answer 2

Even without regular expressions, all you are doing is looking for a number and splitting after it. Try:

s = "Schblaum 12324 tunguska 24 234n"
words = s.split()
for idx, word in enumerate(words[1:], start=1):  # skip the first element
    if word.isdigit():
        break
before, after = ' '.join(words[:idx+1]), \
                ' '.join(words[idx+1:])
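
The loop above can also be packaged as a function that returns a list, which makes the single-part case from the question's edit easier to handle. This is only a sketch: the name `split_by_words` is made up, and returning the whole string when no number is found is my assumption, since the question does not say what should happen then.

```python
def split_by_words(s):
    """Split s after the first all-digit word that is not the first word."""
    words = s.split()
    for idx, word in enumerate(words[1:], start=1):  # skip the first word
        if word.isdigit():
            head = ' '.join(words[:idx + 1])
            tail = ' '.join(words[idx + 1:])
            # Leave out the tail when nothing follows the number.
            return [head, tail] if tail else [head]
    # Assumption: with no number after the first word, keep the string whole.
    return [' '.join(words)] if words else []

print(split_by_words("3239 thingolee 80394 234k"))
# ['3239 thingolee 80394', '234k']
```

Note that, like the loop it is based on, this joins words with single spaces, so runs of whitespace in the input are not preserved.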

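You could also use re.split with a lookbehind to split on whitespace that follows a digit, but you then have to post-process the result, because it also splits after the first number when the string starts with one.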

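import re

s3 = "3239 thingolee 80394 234k"
# A leading number adds one extra split point, so allow a second split
# only when the string itself starts with a number.
leading_number = re.match(r"\d+\s", s3) is not None
result = re.split(r"(?<=\d)\s", s3, maxsplit=2 if leading_number else 1)
if leading_number and len(result) > 1:
    before = ' '.join(result[:2])   # re-attach the leading number
    after = ' '.join(result[2:])
else:
    before = result[0]
    after = ' '.join(result[1:])    # empty when there is no second part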
