Regex variations

Date: 2018-03-12 06:54:21

Tags: python regex python-3.x

I want to extract the length of a dress from a pandas DataFrame. A row of the DataFrame looks like this:

A-line dress with darting at front and back | Surplice neckline | Long sleeves | About 23" from shoulder to hem | Triacetate/polyester | Dry clean | Imported | Model shown is 5'10" (177cm) wearing a size 4

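As you can see, the size sits between About and shoulder, but in some cases shoulder is replaced by waist, hem, etc. My Python script finds the length, but it fails for some rows because I am slicing the list around About.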

2 answers:

Answer 0: (score: 1)

import re

s = """A-line dress with darting at front and back | Surplice neckline | Long sleeves | About 23" from shoulder to hem | Triacetate/polyester | Dry clean | Imported | Model shown is 5'10" (177cm) wearing a size 4"""
q = """Velvet dress featuring mesh front, back and sleeves | Crewneck | Long bell sleeves | Self-tie closure at back cutout | About, 31" from shoulder to hem | Viscose/nylon | Hand wash | Imported | Model shown is 5'10" (177cm) wearing a size Small."""


def getSize(stringVal, strtoCheck):
    """Return the first integer in the "|"-separated field that starts with strtoCheck."""
    for i in stringVal.split("|"):    # Split the description on "|"
        val = i.strip()
        if val.startswith(strtoCheck):    # Field starts with e.g. "About"
            return re.findall(r"\d+", val)[0]    # First run of digits, as a string


print(getSize(s, "About"))
print(getSize(q, "About"))

Output:

23
31

Answer 1: (score: 1)

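Your regex (?<=About).*?(?=[shoulder,waist,hem,bust,neck,bust,top,hips]) uses a character class, which matches any single one of the listed characters rather than the words shoulder, waist, hem, bust, neck, top, or hips.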
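To make the difference concrete, here is a minimal sketch that runs both forms against a shortened version of the sample row. The variable names are only illustrative, and the alternation pattern is one possible rewrite of the lookahead, not necessarily the exact pattern you will end up with.

```python
import re

text = 'About 23" from shoulder to hem'

# Square brackets define a character class: ONE character out of the set.
# The lookahead below is satisfied by any single letter such as "r" or "s".
char_class = re.compile(
    r"(?<=About).*?(?=[shoulder,waist,hem,bust,neck,bust,top,hips])"
)

# Alternation inside a non-capturing group matches whole words instead.
alternation = re.compile(
    r"(?<=About).*?(?=(?:shoulder|waist|hem|bust|neck|top|hips))"
)

class_match = char_class.search(text).group(0)
word_match = alternation.search(text).group(0)

print(repr(class_match))  # stops before the "r" in "from"
print(repr(word_match))   # stops before the word "shoulder"
```

The character-class version stops as soon as the next character is any letter from the set, which is why it cuts off in the middle of "from".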

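I think you want alternation with | inside a non-capturing group instead.

Allow for the comma after About with an optional ,?

Try it like this: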

(?<=About),? (\d+)(?=.*?(?:shoulder|waist|hem|bust|neck|top|hips))

The size is in the first capturing group.
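As a rough sketch of how this could be used from Python, the snippet below applies the pattern (written with hips as the last alternative and no trailing ]) to a few shortened rows. The helper name extract_length and the sample rows are assumptions made for illustration, not part of the question.

```python
import re

# Group 1 captures the digits that follow "About" (with an optional comma).
LENGTH_RE = re.compile(
    r"(?<=About),? (\d+)(?=.*?(?:shoulder|waist|hem|bust|neck|top|hips))"
)


def extract_length(description):
    """Hypothetical helper: return the length as an int, or None if not found."""
    match = LENGTH_RE.search(description)
    return int(match.group(1)) if match else None


rows = [
    'Long sleeves | About 23" from shoulder to hem | Dry clean',
    'Self-tie closure | About, 31" from shoulder to hem | Hand wash',
    'Crewneck | Imported',  # no length information at all
]

lengths = [extract_length(row) for row in rows]
print(lengths)
```

Since the data lives in a pandas DataFrame, the same pattern string could probably be passed to Series.str.extract on the description column, which returns the first capturing group for each row. The column name would depend on your data.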