I want to extract the length of a garment from a pandas DataFrame. A row of the DataFrame looks like this:
A-line dress with darting at front and back | Surplice neckline | Long sleeves | About 23" from shoulder to hem | Triacetate/polyester | Dry clean | Imported | Model shown is 5'10" (177cm) wearing a size 4
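As you can see, the size sits between "About" and "shoulder", but in some cases "shoulder" is replaced by "waist", "hem", and so on. My Python script for finding the length fails in some cases because I am slicing the list.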
Answer 0 (score: 1)
import re

s = """A-line dress with darting at front and back | Surplice neckline | Long sleeves | About 23" from shoulder to hem | Triacetate/polyester | Dry clean | Imported | Model shown is 5'10" (177cm) wearing a size 4"""
q = """'Velvet dress featuring mesh front, back and sleeves | Crewneck | Long bell sleeves | Self-tie closure at back cutout | About, 31" from shoulder to hem | Viscose/nylon | Hand wash | Imported | Model shown is 5\'10" (177cm) wearing a size Small.'1"""

def getSize(stringVal, strtoCheck):
    """Return the first integer in the "|"-separated segment that starts with strtoCheck."""
    for i in stringVal.split("|"):  # Split the string on "|"
        val = i.strip()
        if val.startswith(strtoCheck):  # Check whether the segment starts with "About"
            match = re.search(r"\d+", val)  # Extract the first run of digits
            if match:
                return match.group()
    return None  # No segment with a number was found

print(getSize(s, "About"))
print(getSize(q, "About"))
Output:
23
31
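Since the question is about a DataFrame rather than a single string, the same split-and-search idea can be applied row by row. The following is a minimal sketch under a few assumptions: `extract_length` and `descriptions` are hypothetical names, the list stands in for the DataFrame column, and rows without an "About" segment should yield `None`.

```python
import re

def extract_length(description, keyword="About"):
    """Return the first integer in the "|"-separated segment that starts
    with `keyword`, or None when no such segment or number exists."""
    for segment in description.split("|"):
        segment = segment.strip()
        if segment.startswith(keyword):
            match = re.search(r"\d+", segment)
            if match:
                return int(match.group())
    return None

# Hypothetical sample rows standing in for the DataFrame column.
descriptions = [
    'Surplice neckline | About 23" from shoulder to hem | Dry clean',
    'Crewneck | About, 31" from shoulder to hem | Hand wash',
    'Crewneck | Hand wash | Imported',
]
lengths = [extract_length(d) for d in descriptions]
print(lengths)  # [23, 31, None]
```

With pandas, the same function could be passed to `Series.apply` on the description column.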
Answer 1 (score: 1)
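Your regex
(?<=About).*?(?=[shoulder,waist,hem,bust,neck,bust,top,hips])
uses a character class to stand for shoulder, waist, hem, bust, neck, top and hips. A character class matches any single character from the set, not one of the words, so the lookahead succeeds as soon as the next character is one of those letters or a comma.
I think you want alternation with | instead, placed in a non-capturing group, plus an optional comma ,? after About:
(?<=About),? (\d+)(?=.*?(?:shoulder|waist|hem|bust|neck|top|hips))
The size is in the first capturing group.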
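To see the difference between the two patterns, here is a small sketch using Python's `re` module. The names `LENGTH_RE`, `ORIGINAL_RE`, `find_length` and `samples` are assumptions made for illustration, and the sample strings are shortened versions of the descriptions in the question.

```python
import re

# Alternation inside a non-capturing group, with an optional comma after "About".
LENGTH_RE = re.compile(
    r'(?<=About),? (\d+)(?=.*?(?:shoulder|waist|hem|bust|neck|top|hips))'
)

# The pattern from the question: the bracketed part is a character class,
# so the lookahead only checks a single character.
ORIGINAL_RE = re.compile(
    r'(?<=About).*?(?=[shoulder,waist,hem,bust,neck,bust,top,hips])'
)

def find_length(description):
    """Return the number after "About" as an int, or None if there is no match."""
    match = LENGTH_RE.search(description)
    return int(match.group(1)) if match else None

samples = [
    'Long sleeves | About 23" from shoulder to hem | Dry clean',
    'Crewneck | About, 31" from shoulder to hem | Hand wash',
    'Crewneck | Hand wash | Imported',
]
lengths = [find_length(text) for text in samples]
print(lengths)  # [23, 31, None]

# The original pattern stops at the first character found in the class
# (here the "r" of "from"), not at the word "shoulder".
original_match = ORIGINAL_RE.search(samples[0]).group()
print(repr(original_match))  # ' 23" f'
```

Converting the captured group with `int` is a design choice for this sketch; keeping it as a string works just as well if the value is only displayed.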