I am trying to write a regex pattern that matches the following weights in lbs and ozs. I am about 50% of the way there, having used https://regex101.com/ to get this far.
The pattern I am using is:
/(\d|\d\d|\d\d\d)\s*(?:lb|lbs)[^\s]?\s\d?\d\s*(oz|ozs)?[^\s]/g
I am sure it can be done better and more efficiently too.
Answer 0 (score: 2)
\d+\s*lbs?\s*\d+\s*ozs?
\d+ One or more digits
\s* Zero or more whitespace characters
lbs? lb or lbs
\s* Zero or more whitespace characters
\d+ One or more digits
\s* Zero or more whitespace characters
ozs? oz or ozs
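A quick way to check this pattern is to run it over a few sample strings. The sketch below uses Python's re module; the sample strings and the name WEIGHT_RE are made up for illustration and are not from the question.

```python
import re

# Pattern from the answer above; WEIGHT_RE is an illustrative name.
WEIGHT_RE = re.compile(r"\d+\s*lbs?\s*\d+\s*ozs?")

# Hypothetical sample inputs covering the spacing and plural variants.
samples = ["6lb14ozs", "6 lbs 14 oz", "100lb 13ozs", "6 lb", "14 ozs"]

# fullmatch requires the whole string to be a weight, not just part of it.
matched = [s for s in samples if WEIGHT_RE.fullmatch(s)]
print(matched)
```

Only the strings that contain both a pound part and an ounce part should come back, since neither unit is optional in this pattern.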
Answer 1 (score: 1)
How about [0-9]+[ ]*lbs?[ ]*[0-9]+[ ]*ozs?
In your attempt, you're making the units optional, so it'll probably match stuff you don't want it to match. Make the 's' optional instead.
Cheers, Paulo
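To see why an optional unit is a problem, here is a small Python sketch comparing the question's pattern with the stricter one. The input string and the variable names are assumptions chosen for illustration.

```python
import re

# The question's pattern (Python has no /g flag; search/findall cover that).
loose = re.compile(r"(\d|\d\d|\d\d\d)\s*(?:lb|lbs)[^\s]?\s\d?\d\s*(oz|ozs)?[^\s]")
# The pattern suggested in this answer: units required, only the 's' optional.
strict = re.compile(r"[0-9]+[ ]*lbs?[ ]*[0-9]+[ ]*ozs?")

not_a_weight = "6 lb 14 kg"  # hypothetical input with the wrong second unit

loose_hit = loose.search(not_a_weight)
strict_hit = strict.search(not_a_weight)
print(loose_hit.group(0) if loose_hit else None)  # text the loose pattern accepted
print(strict_hit)                                 # None when nothing matched
```

Because (oz|ozs)? may match nothing, the trailing [^\s] is free to consume the first letter of any other unit, so the loose pattern accepts this input while the strict one rejects it.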
To capture the numbers, you'd need:
([0-9]+)[ ]*lbs?[ ]*([0-9]+)[ ]*ozs?
To convert into kg, in Python you'd have (your data is in test_str):
import re
# Group 1 captures the pounds, group 2 the ounces
p = re.compile(r'([0-9]+)[ ]*lbs?[ ]*(\d+)\s*ozs?')
test_str = "6lb14ozs\n6 lb 14ozs\n6 lb 14ozs\n6 lb 14 ozs\n6 lbs 14ozs\n6 lb 14ozs\n69 lb 14ozs\n6lb 14 ozs\n6lb14 ozs\n6 lb14 ozs\n66lb14ozs\n66 lb14ozs\n65 lb 14ozs\n66lb1ozs\n66 lb1ozs\n65 lb 1ozs\n6lb14oz\n6 lb14oz\n6 lb 14oz\n6 lb 14 oz\n6lb 14 oz\n6lb14 oz\n6 lb14 oz\n100lb 13ozs"
for i in re.findall(p, test_str):
    # 1 lb = 0.45359237 kg; 1 oz = 1/16 lb, about 0.0283495 kg
    print(float(i[0]) * 0.45359237 + float(i[1]) * 0.0283495)
UPDATE
This version also matches isolated lb(s) and oz(s). The trailing 's' is left outside the capture groups, so the last 2 characters of each captured measurement can be used to determine the unit and make the conversion. The new regex is easier to read as well.
import re
# Each group keeps its unit ("6 lb", "14oz"); (?:\n|$) also accepts a last line with no newline
p = re.compile(r"(\d+\s*lb)?s?\s*(\d+\s*oz)?s?(?:\n|$)")
test_str = "6lb14ozs\n6 lb 14ozs\n6 lb 14ozs\n6 lb 14 ozs\n6 lbs 14ozs\n6 lb 14ozs\n69 lb 14ozs\n6lb 14 ozs\n6lb14 ozs\n6 lb14 ozs\n66lb14ozs\n66 lb14ozs\n65 lb 14ozs\n66lb1ozs\n66 lb1ozs\n65 lb 1ozs\n6lb14oz\n6 lb14oz\n6 lb 14oz\n6 lb 14 oz\n6lb 14 oz\n6lb14 oz\n6 lb14 oz\n100lb 13ozs"
KG = {"lb": 0.45359237, "oz": 0.0283495}  # kilograms per unit
for j in re.findall(p, test_str):
    if any(j):  # skip matches where neither group captured anything
        print(sum(int(i[:-2]) * KG[i[-2:]] for i in j if i))
Answer 2 (score: 1)
Something like
\s*(\d+)\s*(lbs|lb)\s*(\d+)\s*(ozs|oz)
should work.
Capture groups 1 and 3 will contain the amounts (indices 0 and 2 if your regex API returns the captures as a zero-based list).
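How the groups are numbered depends on the API you read them from. As a rough Python illustration (the sample input and variable names are assumptions, not from the question):

```python
import re

pattern = re.compile(r"\s*(\d+)\s*(lbs|lb)\s*(\d+)\s*(ozs|oz)")
m = pattern.search("69 lb 14ozs")  # hypothetical sample input

# Match objects number groups from 1; group(0) is the entire match.
pounds, ounces = int(m.group(1)), int(m.group(3))

# m.groups() is a plain tuple, so the same values sit at indices 0 and 2.
captures = m.groups()
print(pounds, ounces, captures)
```

In Python the amounts are therefore m.group(1) and m.group(3), or captures[0] and captures[2] when read from the tuple.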
Answer 3 (score: 0)
Try this regex:
(\d+)\D+(\d+)\D+
It captures the two numeric groups and skips over anything non-numeric.
Use the g flag.
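For comparison, a minimal Python sketch of this pattern, where re.findall plays the role of the g flag. Both input strings are invented for illustration.

```python
import re

pairs = re.compile(r"(\d+)\D+(\d+)\D+")

# findall returns every match, which is what the g flag gives you elsewhere.
text = "6 lb 14 ozs\n69lb1oz\n"  # hypothetical sample input
weights = pairs.findall(text)
print(weights)

# The units are never checked, so unrelated text with two numbers matches too.
other = pairs.findall("3 cats and 2 dogs")
print(other)
```

This keeps the regex short, but it is only safe when the input is already known to contain nothing except weights.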