Extract text and an integer after a specific letter

Date: 2019-05-29 07:17:53

Tags: python regex list split integer

I have a string:

s = 'travel to africa x 2\ asia x 2\ europe x 2\ Airport pick up included. Furnitures 3 seater couch x 1 4 seater+ couch x 1 < 60 inches TV x 1 60 inches+ TV x 1 Washer - front loader x 1 Box / bag / misc x 1 The maximum clearance is 1.5m.'

I want to split it on x and then extract the number that follows each x.

So the expected output is:

out = [('travel to africa', '2'),
       ('\ asia', '2'),
       ('\ europe', '2'),
       ('\ Airport pick up included. Furnitures 3 seater couch', '1'),
       ('4 seater+ couch', '1'),
       ('< 60 inches TV', '1'),
       ('60 inches+ TV', '1'),
       ('Washer - front loader', '1'),
       ('Box / bag / misc', '1')]

I tried the regex below, but it fails because it leaves out special characters such as -, + and < (other special characters may appear as well):

r'([A-Za-z 0-9]+)\s+x\s+(\d+)'

What is the correct regex to extract these values? Or is there a possible solution without a regex?

3 Answers:

Answer 0: (score: 6)

You can use

re.findall(r'(.*?)\s+x\s*(\d+)', s)

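See the Python demo and the regex demo.

The (.*?)\s+x\s*(\d+) pattern matches:

  • (.*?) - Group 1: any zero or more characters other than line break characters, as few as possible
  • \s+ - one or more whitespace characters
  • x - a literal x character
  • \s* - zero or more whitespace characters
  • (\d+) - Group 2: one or more digits.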

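If you want to get rid of the whitespace at the start of each match, use re.findall(r'(\S.*?)\s+x\s*(\d+)', s) (see the regex demo), or strip the first group in a comprehension after collecting all matches: [(a.strip(), b) for a, b in re.findall(r'(.*?)\s+x\s*(\d+)', s)]. Note that re.findall returns a list of tuples when the pattern has more than one group, so strip() has to be applied to the tuple items rather than to the tuple itself.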
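The two variants described above can be compared side by side. This is only a minimal sketch: the variable names trimmed_by_pattern and trimmed_by_strip are illustrative, and the sample string is written as a raw string (an assumption made here so that the backslashes are kept literally without escape-sequence warnings).

```python
import re

# Sample string from the question, as a raw string so "\ " stays a literal backslash + space.
s = r'travel to africa x 2\ asia x 2\ europe x 2\ Airport pick up included. Furnitures 3 seater couch x 1 4 seater+ couch x 1 < 60 inches TV x 1 60 inches+ TV x 1 Washer - front loader x 1 Box / bag / misc x 1 The maximum clearance is 1.5m.'

# Variant 1: force group 1 to start with a non-whitespace character.
trimmed_by_pattern = re.findall(r'(\S.*?)\s+x\s*(\d+)', s)

# Variant 2: keep the original pattern and strip group 1 afterwards.
trimmed_by_strip = [
    (text.strip(), count)
    for text, count in re.findall(r'(.*?)\s+x\s*(\d+)', s)
]

for text, count in trimmed_by_pattern:
    print(repr(text), count)
```

On this input both variants should give the same nine (text, count) pairs, with the count still a string.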

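Answer 1: (score: 1)

Here is a workaround. I simplified the problem by first matching each whole "text x number" chunk and then splitting it manually.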

s = 'travel to africa x 2\ asia x 2\ europe x 2\ Airport pick up included. Furnitures 3 seater couch x 1 4 seater+ couch x 1 < 60 inches TV x 1 60 inches+ TV x 1 Washer - front loader x 1 Box / bag / misc x 1 The maximum clearance is 1.5m.'
import re
res = []
for match in re.finditer(r".*?x\s*\d+", s):
    l, _, r = map(str.strip, match.group().rpartition('x'))
    res.append((l, r))

Output:

[('travel to africa', '2'),
 ('\\ asia', '2'),
 ('\\ europe', '2'),
 ('\\ Airport pick up included. Furnitures 3 seater couch', '1'),
 ('4 seater+ couch', '1'),
 ('< 60 inches TV', '1'),
 ('60 inches+ TV', '1'),
 ('Washer - front loader', '1'),
 ('Box / bag / misc', '1')]
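The choice of rpartition over partition in the loop above matters for chunks whose text itself contains an x. A small sketch using one chunk from the sample string (the names chunk, first and last are illustrative only):

```python
chunk = 'Box / bag / misc x 1'

# partition splits at the FIRST 'x', which is the one inside "Box".
first = tuple(map(str.strip, chunk.partition('x')))

# rpartition splits at the LAST 'x', the separator in front of the count.
last = tuple(map(str.strip, chunk.rpartition('x')))

print(first)  # ('Bo', 'x', '/ bag / misc x 1')
print(last)   # ('Box / bag / misc', 'x', '1')
```

Since each regex match ends with the digits, the last x in a match is the separator, which is presumably why rpartition is used here.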

Answer 2: (score: 1)

My take on the problem:

import re
import pprint

s = 'travel to africa x 2\ asia x 2\ europe x 2\ Airport pick up included. Furnitures 3 seater couch x 1 4 seater+ couch x 1 < 60 inches TV x 1 60 inches+ TV x 1 Washer - front loader x 1 Box / bag / misc x 1 The maximum clearance is 1.5m.'

out = []

for g in re.findall(r'(((^|\\?).*?)\s*x\s*(\d+)(.*?))', s):
    out.append([g[1], g[3]])

pprint.pprint(out)

Prints:

[['travel to africa', '2'],
 ['\\ asia', '2'],
 ['\\ europe', '2'],
 ['\\ Airport pick up included. Furnitures 3 seater couch', '1'],
 [' 4 seater+ couch', '1'],
 [' < 60 inches TV', '1'],
 [' 60 inches+ TV', '1'],
 [' Washer - front loader', '1'],
 [' Box / bag / misc', '1']]
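Several of the printed items keep a leading space, unlike the expected output in the question. A minimal post-processing sketch, reusing the same pattern, strips the text and also converts the count to int. The int conversion is an assumption based on the question title (the question's expected output keeps the counts as strings), and the names pattern and cleaned are illustrative.

```python
import re

# Sample string from the question, as a raw string so "\ " stays a literal backslash + space.
s = r'travel to africa x 2\ asia x 2\ europe x 2\ Airport pick up included. Furnitures 3 seater couch x 1 4 seater+ couch x 1 < 60 inches TV x 1 60 inches+ TV x 1 Washer - front loader x 1 Box / bag / misc x 1 The maximum clearance is 1.5m.'

# Same pattern as in the answer above: index 1 holds the text, index 3 the digits.
pattern = r'(((^|\\?).*?)\s*x\s*(\d+)(.*?))'

# Strip surrounding whitespace from the text and turn the count into an int.
cleaned = [(g[1].strip(), int(g[3])) for g in re.findall(pattern, s)]

for text, count in cleaned:
    print(repr(text), count)
```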