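I am trying to detect prices written either in words or in digits. Can regex do this, or would some other approach work better?
For digits, the regex I came up with is ^\d{0,8}(\.\d{1,4})?$, which I found here
Can regex also detect a price written in words, e.g. for 515? I am looking at grocery invoices; a sample is below, and I want to extract the price of each product and the total price. I would also like to know whether regex can extract the price written in words.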
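Big Grocery
Item ID AMNIL PARA 101103
Bill No: 100000000070
Date: 16 May 2012 1:07 AM
No. of Items: 4 Amount (Rs.): 415.65,
Qty Unit ItemName
2 nos Amul Ice Cream - Vanilla - 1 litre pack
2 strip(s) Paracetamol Tablets 500mg
1 nos Close Up Toothpaste - 200g
1 nos Gillette Mach3 Razor Blades
Total
Price (Rs.)
220.00
25.00
70.00
100.00
415.00
Total Price in Words: Four Hundred Fifteen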
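Answer 0 (score: 1)
You can use this regex, as seen here (compatible with PCRE and with Python's third-party regex module; the standard-library re module does not support (?(DEFINE)...) blocks or (?&name) subroutine calls):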
(?x) # free-spacing mode
(?(DEFINE)
# Within this DEFINE block, we'll define many subroutines
# They build on each other like lego until we can define
# a "big number"
(?<one_to_9>
# The basic regex:
# one|two|three|four|five|six|seven|eight|nine
# We'll use an optimized version:
# Option 1: four|eight|(?:fiv|(?:ni|o)n)e|t(?:wo|hree)|
# s(?:ix|even)
# Option 2:
(?:f(?:ive|our)|s(?:even|ix)|t(?:hree|wo)|(?:ni|o)ne|eight)
) # end one_to_9 definition
(?<ten_to_19>
# The basic regex:
# ten|eleven|twelve|thirteen|fourteen|fifteen|sixteen|seventeen|
# eighteen|nineteen
# We'll use an optimized version:
# Option 1: twelve|(?:(?:elev|t)e|(?:fif|eigh|nine|(?:thi|fou)r|
# s(?:ix|even))tee)n
# Option 2:
(?:(?:(?:s(?:even|ix)|f(?:our|if)|nine)te|e(?:ighte|lev))en|
t(?:(?:hirte)?en|welve))
) # end ten_to_19 definition
(?<two_digit_prefix>
# The basic regex:
# twenty|thirty|forty|fifty|sixty|seventy|eighty|ninety
# We'll use an optimized version:
# Option 1: (?:fif|six|eigh|nine|(?:tw|sev)en|(?:thi|fo)r)ty
# Option 2:
(?:s(?:even|ix)|t(?:hir|wen)|f(?:if|or)|eigh|nine)ty
) # end two_digit_prefix definition
(?<one_to_99>
(?&two_digit_prefix)(?:[- ](?&one_to_9))?|(?&ten_to_19)|
(?&one_to_9)
) # end one_to_99 definition
(?<one_to_999>
(?&one_to_9)[ ]hundred(?:[ ](?:and[ ])?(?&one_to_99))?|
(?&one_to_99)
) # end one_to_999 definition
(?<one_to_999_999>
(?&one_to_999)[ ]thousand(?:[ ](?&one_to_999))?|
(?&one_to_999)
) # end one_to_999_999 definition
(?<one_to_999_999_999>
(?&one_to_999)[ ]million(?:[ ](?&one_to_999_999))?|
(?&one_to_999_999)
) # end one_to_999_999_999 definition
(?<one_to_999_999_999_999>
(?&one_to_999)[ ]billion(?:[ ](?&one_to_999_999_999))?|
(?&one_to_999_999_999)
) # end one_to_999_999_999_999 definition
(?<one_to_999_999_999_999_999>
(?&one_to_999)[ ]trillion(?:[ ](?&one_to_999_999_999_999))?|
(?&one_to_999_999_999_999)
) # end one_to_999_999_999_999_999 definition
(?<bignumber>
zero|(?&one_to_999_999_999_999_999)
) # end bignumber definition
(?<zero_to_9>
(?&one_to_9)|zero
) # end zero to 9 definition
(?<decimals>
point(?:[ ](?&zero_to_9))+
) # end decimals definition
) # End DEFINE
####### The Regex Matching Starts Here ########
(?&bignumber)(?:[ ](?&decimals))?
### Other examples of groups we could match ###
#(?&bignumber)
# (?&one_to_99)
# (?&one_to_999)
# (?&one_to_999_999)
# (?&one_to_999_999_999)
# (?&one_to_999_999_999_999)
# (?&one_to_999_999_999_999_999)
But this is probably overkill :)
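For readers limited to Python's standard-library re module, which supports neither (?(DEFINE)...) blocks nor (?&name) subroutine calls, the same "lego" idea can be approximated by composing the sub-patterns as ordinary strings. The following is only a minimal sketch, trimmed to numbers below one million and without the decimals part; the constant names and the find_number_words helper are illustrative assumptions, not part of the regex above.

```python
import re

# Building blocks composed as plain strings, because the standard-library
# `re` module has no DEFINE blocks or subroutine calls.
ONE_TO_9 = r"(?:one|two|three|four|five|six|seven|eight|nine)"
TEN_TO_19 = (r"(?:ten|eleven|twelve|thirteen|fourteen|fifteen|sixteen|"
             r"seventeen|eighteen|nineteen)")
TENS = r"(?:twenty|thirty|forty|fifty|sixty|seventy|eighty|ninety)"

# Each level reuses the previous one, mirroring the DEFINE block.
ONE_TO_99 = rf"(?:{TENS}(?:[- ]{ONE_TO_9})?|{TEN_TO_19}|{ONE_TO_9})"
ONE_TO_999 = rf"(?:{ONE_TO_9} hundred(?: (?:and )?{ONE_TO_99})?|{ONE_TO_99})"
ONE_TO_999_999 = rf"(?:{ONE_TO_999} thousand(?: {ONE_TO_999})?|{ONE_TO_999})"

# Word boundaries keep "one" from matching inside words such as "someone".
NUMBER_WORDS = re.compile(rf"\b(?:zero|{ONE_TO_999_999})\b", re.IGNORECASE)


def find_number_words(text):
    """Return every run of English number words found in `text` (hypothetical helper)."""
    return [m.group(0) for m in NUMBER_WORDS.finditer(text)]


print(find_number_words("Total Price in Words: Four Hundred Fifteen"))
# ['Four Hundred Fifteen']
```

Extending the sketch to millions and beyond follows the same pattern: each new level wraps the previous one, exactly as the DEFINE block does.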
Given the structure of your data, you could instead simply look for whatever comes after Total Price in Words :
So something like this might work for you:
^\h*Total Price in Words\s*:\s*(.*)
You will find the data in group 1 (usually $1 or \1).
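As a rough illustration of the label-based approach in Python, the sketch below pulls out the amount in words and the per-line prices from a shortened copy of the invoice. It rests on a few assumptions: \h is replaced by [ \t] because the standard-library re module does not recognise \h, the prices are assumed to sit alone on their own lines with exactly two decimals, and the names INVOICE, TOTAL_IN_WORDS, PRICE_LINE and extract are made up for this example.

```python
import re

# Shortened copy of the sample invoice from the question.
INVOICE = """\
No. of Items: 4 Amount(Rs.): 415.65,
Total
Price (Rs.)
220.00
25.00
70.00
100.00
415.00
Total Price in Words: Four Hundred Fifteen
"""

# Everything after the label, up to the end of that line, lands in group 1.
TOTAL_IN_WORDS = re.compile(
    r"^[ \t]*Total Price in Words\s*:\s*(.*)", re.MULTILINE | re.IGNORECASE
)
# Assumption: a price occupies a whole line and has two decimal places.
PRICE_LINE = re.compile(r"^[ \t]*(\d{1,8}\.\d{2})[ \t]*$", re.MULTILINE)


def extract(text):
    """Return (total in words or None, list of standalone price strings)."""
    m = TOTAL_IN_WORDS.search(text)
    words = m.group(1).strip() if m else None
    prices = PRICE_LINE.findall(text)
    return words, prices


words, prices = extract(INVOICE)
print(words)   # Four Hundred Fifteen
print(prices)  # ['220.00', '25.00', '70.00', '100.00', '415.00']
```

In this layout the last standalone amount is the total, so the item prices would be prices[:-1]; that holds only as long as the invoice format stays the same.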
Answer 1 (score: -1)
I suggest using https://regex101.com/codegen?language=python to generate the answer.
# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility
import re

# \d+ is used instead of [\d]*, which also matches the empty string at every non-digit position
regex = r"\d+"
test_str = "My text with numbers : 324 and 2342 1 3. G00d Luck!"

matches = re.finditer(regex, test_str, re.MULTILINE)

# Number the matches from 1 for display
for matchNum, match in enumerate(matches, start=1):
    print ("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum = matchNum, start = match.start(), end = match.end(), match = match.group()))

    # Report each capture group, if the pattern defines any (this one does not)
    for groupNum in range(1, len(match.groups()) + 1):
        print ("Group {groupNum} found at {start}-{end}: {group}".format(groupNum = groupNum, start = match.start(groupNum), end = match.end(groupNum), group = match.group(groupNum)))

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.
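The snippet above deals with bare runs of digits. For amounts with a decimal part, such as those in the invoice from the question, one option is to reuse the question's numeric pattern without the ^ and $ anchors and with word boundaries instead. This is only a sketch: the PRICE pattern and the find_prices helper are illustrative names, and the pattern will also pick up other numbers such as years, so some filtering by context would still be needed.

```python
import re

# Up to 8 integer digits, optionally followed by 1-4 decimals.
# \b stops the pattern from matching digits embedded in words ("G00d")
# or fragments of longer digit runs such as a 12-digit bill number.
PRICE = re.compile(r"\b\d{1,8}(?:\.\d{1,4})?\b")


def find_prices(text):
    """Return every number-like token in `text` as a string (hypothetical helper)."""
    return PRICE.findall(text)


print(find_prices("No. of Items: 4 Amount(Rs.): 415.65,"))
# ['4', '415.65']
```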