Question

I have a string:

s = 'travel to africa x 2\ asia x 2\ europe x 2\ Airport pick up included. Furnitures 3 seater couch x 1 4 seater+ couch x 1 < 60 inches TV x 1 60 inches+ TV x 1 Washer - front loader x 1 Box / bag / misc x 1 The maximum clearance is 1.5m.'

I want to split it on x and extract the number that follows each x.

So the expected output is:

out = [('travel to africa', '2'),
       ('\ asia', '2'),
       ( '\ europe', '2'),
       ('\ Airport pick up included. Furnitures 3 seater couch', '1'),
       ('4 seater+ couch', '1'),
       ('< 60 inches TV', '1'),
       ('60 inches+ TV', '1'),
       ('Washer - front loader', '1'),
       ('Box / bag / misc', '1')]

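I tried the regex below, but it fails because the character class leaves out special characters such as -, + and < (other special characters may appear as well):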

r'([A-Za-z 0-9]+)\s+x\s+(\d+)'
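
A minimal reproduction of the failing attempt, as a sketch that assumes the same s as above (written here as a raw string so the backslashes stay literal). The names attempt and labels are only illustrative:

```python
import re

s = r'travel to africa x 2\ asia x 2\ europe x 2\ Airport pick up included. Furnitures 3 seater couch x 1 4 seater+ couch x 1 < 60 inches TV x 1 60 inches+ TV x 1 Washer - front loader x 1 Box / bag / misc x 1 The maximum clearance is 1.5m.'

# The character class only allows letters, digits and spaces, so a label
# can never contain characters such as '+', '<', '-', '/' or '\'.
attempt = re.findall(r'([A-Za-z 0-9]+)\s+x\s+(\d+)', s)
labels = [label.strip() for label, _ in attempt]
print(labels)
```

Labels such as '4 seater+ couch' cannot come back whole, because the match has to restart after the first character that falls outside the class.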

What is the correct regex to extract these values? Or is there a possible solution without a regex?

Answer 1

You can use

re.findall(r'(.*?)\s+x\s*(\d+)', s)

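See the Python demo and the regex demo.

The (.*?)\s+x\s*(\d+) pattern matches:

(.*?) - Group 1: any zero or more characters other than line break characters, as few as possible
\s+ - one or more whitespace characters
x - the letter x
\s* - zero or more whitespace characters
(\d+) - Group 2: one or more digits.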
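
The breakdown above can also be written directly into the pattern with re.VERBOSE, which ignores whitespace and allows comments inside the regex. This is only a sketch: the names pattern and sample, and the short sample string, are made up for illustration.

```python
import re

pattern = re.compile(r"""
    (.*?)   # group 1: any characters except line breaks, as few as possible
    \s+     # one or more whitespace characters
    x       # the literal letter x
    \s*     # zero or more whitespace characters
    (\d+)   # group 2: one or more digits
""", re.VERBOSE)

sample = "apple x 2 pear x 10"
pairs = pattern.findall(sample)
print(pairs)  # [('apple', '2'), (' pear', '10')]
```

The leading space in ' pear' appears because the first group starts right where the previous match ended.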

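If you want to get rid of the whitespace at the start of the matches, use re.findall(r'(\S.*?)\s+x\s*(\d+)', s) (see the regex demo), or strip the first group after collecting all the matches with a comprehension: [(name.strip(), num) for name, num in re.findall(r'(.*?)\s+x\s*(\d+)', s)]. Calling .strip() on each item directly does not work here, because with two capturing groups re.findall returns tuples rather than strings.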
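
Both ways of dropping the leading whitespace can be sketched against the string from the question. The string is written as a raw string here so the backslashes stay literal, and the names by_pattern, by_strip and expected are only illustrative:

```python
import re

s = r'travel to africa x 2\ asia x 2\ europe x 2\ Airport pick up included. Furnitures 3 seater couch x 1 4 seater+ couch x 1 < 60 inches TV x 1 60 inches+ TV x 1 Washer - front loader x 1 Box / bag / misc x 1 The maximum clearance is 1.5m.'

# Option 1: force the first group to start with a non-whitespace character.
by_pattern = re.findall(r'(\S.*?)\s+x\s*(\d+)', s)

# Option 2: keep the original pattern and strip the first item of each tuple.
by_strip = [(name.strip(), num)
            for name, num in re.findall(r'(.*?)\s+x\s*(\d+)', s)]

expected = [('travel to africa', '2'),
            ('\\ asia', '2'),
            ('\\ europe', '2'),
            ('\\ Airport pick up included. Furnitures 3 seater couch', '1'),
            ('4 seater+ couch', '1'),
            ('< 60 inches TV', '1'),
            ('60 inches+ TV', '1'),
            ('Washer - front loader', '1'),
            ('Box / bag / misc', '1')]
print(by_pattern == expected, by_strip == expected)
```

Because the pattern requires whitespace before x, the x inside "Box" and "maximum" is not treated as a separator.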

Answer 2

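Here is one way to do it. I simplified the problem by matching each "label x number" chunk as a whole and then splitting it manually.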

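import re

# Raw string, so the backslashes in the text are kept literally.
s = r'travel to africa x 2\ asia x 2\ europe x 2\ Airport pick up included. Furnitures 3 seater couch x 1 4 seater+ couch x 1 < 60 inches TV x 1 60 inches+ TV x 1 Washer - front loader x 1 Box / bag / misc x 1 The maximum clearance is 1.5m.'
res = []
# Each match runs up to and including the next "x <digits>".
for match in re.finditer(r".*?x\s*\d+", s):
    # Split on the last x, so an x inside the label (as in "Box") is left alone.
    l, _, r = map(str.strip, match.group().rpartition('x'))
    res.append((l, r))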

Output:

[('travel to africa', '2'),
 ('\\ asia', '2'),
 ('\\ europe', '2'),
 ('\\ Airport pick up included. Furnitures 3 seater couch', '1'),
 ('4 seater+ couch', '1'),
 ('< 60 inches TV', '1'),
 ('60 inches+ TV', '1'),
 ('Washer - front loader', '1'),
 ('Box / bag / misc', '1')]
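
The reason for splitting on the last x rather than the first one shows up on a chunk whose label itself contains an x. A small sketch, where chunk, first_split and last_split are illustrative names:

```python
chunk = "Box / bag / misc x 1"

# partition splits at the first x, which sits inside "Box".
first_split = chunk.partition("x")
# rpartition splits at the last x, which is the separator before the number.
last_split = chunk.rpartition("x")

print(first_split)  # ('Bo', 'x', ' / bag / misc x 1')
print(last_split)   # ('Box / bag / misc ', 'x', ' 1')
```

Stripping both sides of the rpartition result then gives ('Box / bag / misc', '1').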

Answer 3

My take on the problem:

import re
import pprint

s = 'travel to africa x 2\ asia x 2\ europe x 2\ Airport pick up included. Furnitures 3 seater couch x 1 4 seater+ couch x 1 < 60 inches TV x 1 60 inches+ TV x 1 Washer - front loader x 1 Box / bag / misc x 1 The maximum clearance is 1.5m.'

out = []

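# With nested groups, findall returns 5-tuples:
# g[1] is the text before x and g[3] is the number after it.
for g in re.findall(r'(((^|\\?).*?)\s*x\s*(\d+)(.*?))', s):
    out.append([g[1], g[3]])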

pprint.pprint(out)

Prints:

[['travel to africa', '2'],
 ['\\ asia', '2'],
 ['\\ europe', '2'],
 ['\\ Airport pick up included. Furnitures 3 seater couch', '1'],
 [' 4 seater+ couch', '1'],
 [' < 60 inches TV', '1'],
 [' 60 inches+ TV', '1'],
 [' Washer - front loader', '1'],
 [' Box / bag / misc', '1']]
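
Note that most labels in this output keep a leading space, unlike the expected output in the question. A hedged sketch of one way to tidy that up, reusing the same pattern and stripping each label afterwards (the names cleaned and expected are illustrative, and s is written as a raw string so the backslashes stay literal):

```python
import re

s = r'travel to africa x 2\ asia x 2\ europe x 2\ Airport pick up included. Furnitures 3 seater couch x 1 4 seater+ couch x 1 < 60 inches TV x 1 60 inches+ TV x 1 Washer - front loader x 1 Box / bag / misc x 1 The maximum clearance is 1.5m.'

# g[1] is the label and g[3] is the number; strip only the label.
cleaned = [(g[1].strip(), g[3])
           for g in re.findall(r'(((^|\\?).*?)\s*x\s*(\d+)(.*?))', s)]

expected = [('travel to africa', '2'),
            ('\\ asia', '2'),
            ('\\ europe', '2'),
            ('\\ Airport pick up included. Furnitures 3 seater couch', '1'),
            ('4 seater+ couch', '1'),
            ('< 60 inches TV', '1'),
            ('60 inches+ TV', '1'),
            ('Washer - front loader', '1'),
            ('Box / bag / misc', '1')]
print(cleaned == expected)
```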

Extracting the text and the integer after a specific letter
