Regex to get lbs and ozs, e.g. 16lb 4ozs, and its variations

Time: 2015-05-24 21:24:03

Tags: regex regex-greedy

I am trying to write a regex pattern that matches the following variations of lbs and ozs. I am about 50% of the way there, having used https://regex101.com/ to get this far:

Regex Pattern at regex101

  • 6lb14ozs
  • 6 lb 14ozs
  • 6 lb 14ozs
  • 6 lb 14 ozs
  • 6 lbs 14ozs
  • 6 lb 14ozs
  • 69 lb 14ozs
  • 6lb 14 ozs
  • 6lb14 ozs
  • 6 lb14 ozs
  • 66lb14ozs
  • 66 lb14ozs
  • 65 lb 14ozs
  • 66lb1ozs
  • 66 lb1ozs
  • 65 lb 1ozs
  • 6lb14oz
  • 6 lb14oz
  • 6 lb 14oz
  • 6 lb 14 oz
  • 6lb 14 oz
  • 6lb14 oz
  • 6 lb14 oz
  • 100lb 13ozs

The pattern I am using is:

/(\d|\d\d|\d\d\d)\s*(?:lb|lbs)[^\s]?\s\d?\d\s*(oz|ozs)?[^\s]/g

I am sure it can be done better and more efficiently too.

4 answers:

Answer 0 (score: 2)

\d+\s*lbs?\s*\d+\s*ozs?



\d+     One or more digits

\s*     Zero or more whitespace characters

lbs?    lb or lbs

\s*     Zero or more whitespace characters

\d+     One or more digits

\s*     Zero or more whitespace characters

ozs?    oz or ozs
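
As a quick check, here is a minimal Python sketch that runs the pattern above against a few of the variants from the question. The names PATTERN and samples are mine, not from the answer, and re.fullmatch is used only so that each sample has to match in its entirety.

```python
import re

# Pattern from the answer above; PATTERN and samples are illustrative names.
PATTERN = re.compile(r"\d+\s*lbs?\s*\d+\s*ozs?")

samples = ["6lb14ozs", "6 lb 14 ozs", "6 lbs 14ozs", "66lb1ozs", "6 lb 14 oz", "100lb 13ozs"]

# Keep only the samples that match from start to end.
matched = [s for s in samples if PATTERN.fullmatch(s)]
print(len(matched) == len(samples))

# A string with a different first unit should not match.
print(PATTERN.fullmatch("6 kg 14 oz") is None)
```

Both lines should print True if the pattern behaves as described.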

Answer 1 (score: 1)

How about [0-9]+[ ]*lbs?[ ]*[0-9]+[ ]*ozs?

In your attempt, you're making the ounce unit optional, so it will probably match things you don't want it to match. Make only the 's' optional instead.
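
To make that concrete, here is a small sketch comparing the two patterns in Python. The variable names are mine, and the JavaScript-style delimiters and g flag from the question are dropped because Python's re module does not use them.

```python
import re

# Pattern from the question (delimiters and g flag removed).
question_pattern = re.compile(r"(\d|\d\d|\d\d\d)\s*(?:lb|lbs)[^\s]?\s\d?\d\s*(oz|ozs)?[^\s]")
# Pattern suggested in this answer.
answer_pattern = re.compile(r"[0-9]+[ ]*lbs?[ ]*[0-9]+[ ]*ozs?")

# "6 lb 14 kg" has no ounce unit, yet the question's pattern accepts it.
print(bool(question_pattern.search("6 lb 14 kg")))
print(bool(answer_pattern.search("6 lb 14 kg")))

# "6lb14ozs" is rejected by the question's pattern, which demands
# whitespace between the pound unit and the ounce amount.
print(bool(question_pattern.search("6lb14ozs")))
print(bool(answer_pattern.search("6lb14ozs")))
```

So the question's pattern is too loose in one place (the optional unit) and too strict in another (the mandatory whitespace).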

Cheers, Paulo

To capture the numbers, you'd need ([0-9]+)[ ]*lbs?[ ]*([0-9]+)[ ]*ozs?.

To convert to kg in Python, you could do this (your data is in test_str):

import re
# Capture the pound and ounce amounts as groups 1 and 2
p = re.compile(r'([0-9]+)[ ]*lbs?[ ]*([0-9]+)[ ]*ozs?')
test_str = "6lb14ozs\n6 lb 14ozs\n6 lb 14ozs\n6 lb 14 ozs\n6 lbs 14ozs\n6 lb 14ozs\n69 lb 14ozs\n6lb 14 ozs\n6lb14 ozs\n6 lb14 ozs\n66lb14ozs\n66 lb14ozs\n65 lb 14ozs\n66lb1ozs\n66 lb1ozs\n65 lb 1ozs\n6lb14oz\n6 lb14oz\n6 lb 14oz\n6 lb 14 oz\n6lb 14 oz\n6lb14 oz\n6 lb14 oz\n100lb 13ozs"

# 1 lb = 0.45359237 kg and 1 oz = 0.028349523125 kg
for lb, oz in p.findall(test_str):
    print(float(lb) * 0.45359237 + float(oz) * 0.028349523125)

UPDATE

This version also matches a lone lb(s) or oz(s) value. The optional 's' sits outside each capture group, so the last two characters of every captured measurement ("lb" or "oz") identify the unit for the conversion. The new regex is easier to read as well.

import re
p = re.compile(r"(\d+\s*lb)?s?\s*(\d+\s*oz)?s?\n")    

test_str = "6lb14ozs\n6 lb 14ozs\n6 lb 14ozs\n6 lb 14 ozs\n6 lbs 14ozs\n6 lb 14ozs\n69 lb 14ozs\n6lb 14 ozs\n6lb14 ozs\n6 lb14 ozs\n66lb14ozs\n66 lb14ozs\n65 lb 14ozs\n66lb1ozs\n66 lb1ozs\n65 lb 1ozs\n6lb14oz\n6 lb14oz\n6 lb 14oz\n6 lb 14 oz\n6lb 14 oz\n6lb14 oz\n6 lb14 oz\n100lb 13ozs"

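# Append "\n" so the last line, which has no trailing newline, also matches;
# skip empty groups so a lone lb or oz value does not raise an error
for j in re.findall(p, test_str + "\n"):
    print(sum(int(i[:-2]) * {"lb": 0.45359237, "oz": 0.028349523125}[i[-2:]] for i in j if i))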
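
If slicing the unit off the captured string feels fragile, one alternative is to capture only the digits in named groups. This is just a sketch under my own assumptions: WEIGHT, to_kg, KG_PER_LB and KG_PER_OZ are hypothetical names, and each input string is assumed to hold a single measurement.

```python
import re

# Exact definitions: 1 lb = 0.45359237 kg and 1 oz = 1/16 lb.
KG_PER_LB = 0.45359237
KG_PER_OZ = KG_PER_LB / 16

# Hypothetical pattern: both parts optional, digits in named groups.
WEIGHT = re.compile(r"(?:(?P<lb>\d+)\s*lbs?)?\s*(?:(?P<oz>\d+)\s*ozs?)?")

def to_kg(text):
    """Return the weight in kilograms, or None if no lb/oz value is present."""
    m = WEIGHT.fullmatch(text.strip())
    if m is None or (m.group("lb") is None and m.group("oz") is None):
        return None
    return int(m.group("lb") or 0) * KG_PER_LB + int(m.group("oz") or 0) * KG_PER_OZ

for s in ["6lb14ozs", "6 lbs", "14 oz", "no weight"]:
    print(s, "->", to_kg(s))
```

Because every part of the pattern is optional, the function checks explicitly that at least one amount was captured before converting.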

Answer 2 (score: 1)

Something like

\s*(\d+)\s*(lbs|lb)\s*(\d+)\s*(ozs|oz)

should work.

Capture groups 1 and 3 will contain the amounts (indices 0 and 2 if your language returns the captures as a zero-based list).
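
In Python, for instance, the amounts can be read either by group number or from the tuple of captures. A short sketch, with the sample string chosen by me:

```python
import re

pattern = re.compile(r"\s*(\d+)\s*(lbs|lb)\s*(\d+)\s*(ozs|oz)")
m = pattern.match("6 lbs 14 oz")

# group(0) is the whole match; numbered groups start at 1.
print(m.group(1), m.group(3))

# groups() returns the captures as a zero-based tuple.
print(m.groups()[0], m.groups()[2])
```

Both print calls show the same two amounts; only the indexing convention differs.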

Answer 3 (score: 0)

Try this regex:

(\d+)\D+(\d+)\D+

It captures the two numeric groups and skips over anything non-numeric.

Use the g flag.
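
For what it's worth, here is how that looks in Python, where re.findall plays the role of the g flag. The sample strings are mine. Note that the pattern never checks the units, so it is only safe when the input is known to contain nothing but lb/oz values.

```python
import re

pattern = re.compile(r"(\d+)\D+(\d+)\D+")

# findall returns every match, like the g flag in JavaScript.
print(pattern.findall("6 lb 14 ozs\n66lb1oz"))

# Units are not checked, so unrelated text with two numbers matches too.
print(pattern.findall("12 apples 3 pears"))
```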