Regex to capture currency expressions

Time: 2016-05-13 01:43:22

Tags: python regex

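I'm having trouble using a regex in a simple Python program. I'm trying to capture all currency expressions written out as dollar amounts (for example: "five hundred dollars", "three hundred thousand dollars and forty cents"), but I keep running into problems.

My program only returns empty strings. Some initial feedback I received was that my regex is "too greedy" and overreaching, but I'm not sure how or why that leads to empty strings, or how to fix it.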

Here is my Python code:

import re
import sys

file2 = open("test2.txt", "r")
input_txt2 = file2.read()
distjunct3 = r"(?:(?:(?:a|one|two|three|four|five|six|seven|eight|nine|ten|eleven|twelve)?(?:(thir|four|fif|six|seven|eight|nine)teen)?)(?:(?:twen|thir|four|fif|six|seven|eight|nine)ty)?(?:(?:one|two|three|four|five|six|seven|eight|nine|ten) (?:(?:hundred|thousand|)|(?:\w.llion)))?(?: \w+)? dollar(?:s)?(?: and [0-9]{1,2} cents)?)"

def repl(matchobj):
    return "[" + matchobj.group() + "]"

print(re.findall(distjunct3, input_txt2))
file2.close()

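Here is my regex:

(?:(?:(?:a|one|two|three|four|five|six|seven|eight|nine|ten|eleven|twelve)?(?:(thir|four|fif|six|seven|eight|nine)teen)?)(?:(?:twen|thir|four|fif|six|seven|eight|nine)ty)?(?:(?:one|two|three|four|five|six|seven|eight|nine|ten) (?:(?:hundred|thousand|)|(?:\w.llion)))?(?: \w+)? dollar(?:s)?(?: and [0-9]{1,2} cents)?)

I've tested my regex at http://regexr.com/ and it seems to work with this sample text: "exceed sixteen dollars y four dollars a head, but it is now reduced to one, and this charge they valuable andto three thousand dollars: a los hundred thousand dollars for twelve pounds for a dollar. Ths worth a dollar and n'tSix dollars--twelve skins, for a prime, dark and tuck--eight or ten dollars, according to only two dollars. "orth eight dollars; think of that! one, worth twenty dollars--that's your value dead, twenty dollars. "The factor paid seven dollars in trade, of which an eight-dollar coat."

I'm quite confused and would really appreciate any pointers. Thanks!!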
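
One likely contributor to the empty strings: the pattern in the question contains exactly one capturing group, `(thir|four|fif|six|seven|eight|nine)teen`, and Python's `re.findall` returns the contents of the group, not the whole match, whenever a pattern has a group. If that group takes no part in a match, the result is an empty string. The sketch below uses a deliberately simplified, hypothetical pattern (not the asker's full regex) to show the effect, along with two ways around it.

```python
import re

text = "it costs twenty dollars and sixteen dollars"

# Hypothetical, simplified pattern with ONE capturing group,
# mirroring the (...)teen group in the question's regex.
with_group = r"(?:(six|seven)teen|twenty) dollars"

# The same pattern with the group made non-capturing via (?:...).
without_group = r"(?:(?:six|seven)teen|twenty) dollars"

# findall returns the group's text; '' when the group did not participate.
group_results = re.findall(with_group, text)
print(group_results)

# With no capturing groups, findall returns the full matches.
full_results = re.findall(without_group, text)
print(full_results)

# finditer gives access to the whole match regardless of groups.
whole = [m.group(0) for m in re.finditer(with_group, text)]
print(whole)
```

An online tester highlights whole matches, which would explain why the same pattern looks correct there while `re.findall` prints mostly empty strings.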

2 answers:

Answer 0: (score: 1)

This can actually be done with a simpler pattern. In pseudo-regex, it has the form "(number words)+ dollars (and (number words)+ cents)?" (this works for your input, etc.):

((?:(?:a|one|two|twen|thir|three|four|five|fif|six|seven|eight|nine|ten|eleven|twelve|hundred|thousand|million|billion)(?:y|ty|teen)?[\s-]?)+(?:[\s-]?dollars?(?: (?:and|&) (?:[0-9]{1,2}|no|(?:a|one|two|twen|thir|three|four|five|fif|six|seven|eight|nine|ten|eleven|twelve|hundred|thousand|million|billion)(?:y|ty|teen)?)+ cents)?))
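
To try the pattern above from Python, here is a minimal sketch. The names `STEMS`, `WORD`, `PATTERN` and the sample sentence are made up for illustration. The pattern is assembled from two pieces only to avoid repeating the long alternation, and is intended to be equivalent to the one-line version.

```python
import re

# Number-word stems copied from the answer's pattern.
STEMS = ("a|one|two|twen|thir|three|four|five|fif|six|seven|eight|nine|ten|"
         "eleven|twelve|hundred|thousand|million|billion")

# One number word: a stem plus an optional suffix (eight+y, twen+ty, six+teen).
WORD = r"(?:" + STEMS + r")(?:y|ty|teen)?"

# (number words)+ dollars (and (number words | digits)+ cents)?
PATTERN = re.compile(
    r"((?:" + WORD + r"[\s-]?)+"
    r"(?:[\s-]?dollars?(?: (?:and|&) (?:[0-9]{1,2}|no|" + WORD + r")+ cents)?))"
)

# Hypothetical sample sentence.
sample = "It cost five hundred dollars, then a dollar and 40 cents more."

# The single outer capturing group spans the whole match,
# so findall returns the complete phrases.
matches = PATTERN.findall(sample)
print(matches)
```

The pattern has no word boundaries, so it can begin matching inside a longer word (the trailing "a" of "extra dollars" would be picked up as "a dollars"). Adding `\b` at the start is one possible refinement, depending on the input.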


Answer 1: (score: 0)

# Words that may appear in a spelled-out amount ("fourty" is kept as a common misspelling).
numwords = ["and", "a", "one", "two", "three", "four", "five", "six", "seven", "eight",
            "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
            "seventeen", "eighteen", "nineteen", "twenty", "thirty", "forty", "fourty", "fifty", "sixty",
            "seventy", "eighty", "ninety", "hundred", "thousand", "million", "billion", "trillion"]
teststr = "exceed sixteen dollars y four dollars a head, but it is now reduced to one, and this charge they valuable andto three thousand dollars: a los hundred thousand dollars for twelve pounds for a dollar. Ths worth a dollar and n'tSix dollars--twelve skins, for a prime, dark and tuck--eight or ten dollars, according to only two dollars. \"orth eight dollars; think of that! one, worth twenty dollars--that's your value dead, twenty dollars"
# Split on whitespace only, so tokens with attached punctuation ("dollars:", "dollar.") are not matched below.
splitstr = teststr.split()
dollarfound = []
for index, s in enumerate(splitstr):
    templist = []
    if s == "dollar" or s == "dollars":
        templist.append(splitstr[index])
        # Walk backwards, collecting the number words that precede "dollar(s)".
        while (index-1 >= 0) and (splitstr[index-1] in numwords):
            templist.append(splitstr[index-1])
            index -= 1
        dollarfound.append(" ".join(reversed(templist)))
print(dollarfound)

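This code finds each instance of the word dollar(s) and walks backwards to collect all the number words that precede it. Your use case doesn't really need a regex.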
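
The same backward-walking idea can be sketched as a function that also copes with punctuation stuck to a word, such as "dollars:" or "dollars--that's". This is only an illustration under stated assumptions: `find_dollar_phrases` and `NUMBER_WORDS` are hypothetical names, "and" is left out of the word set to avoid picking up a leading conjunction, and a trivial regex is used purely to tokenize the text.

```python
import re

# Assumed word set for illustration; extend as needed.
NUMBER_WORDS = {
    "a", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen", "twenty", "thirty",
    "forty", "fifty", "sixty", "seventy", "eighty", "ninety", "hundred",
    "thousand", "million", "billion", "trillion",
}

def find_dollar_phrases(text):
    """Return spelled-out dollar amounts found in text, lowercased."""
    # Keep runs of letters only, so punctuation never sticks to a token.
    tokens = re.findall(r"[a-z]+", text.lower())
    phrases = []
    for i, tok in enumerate(tokens):
        if tok not in ("dollar", "dollars"):
            continue
        # Walk backwards over the number words preceding "dollar(s)".
        start = i
        while start > 0 and tokens[start - 1] in NUMBER_WORDS:
            start -= 1
        # Skip a bare "dollars" with no number word in front of it.
        if start < i:
            phrases.append(" ".join(tokens[start:i + 1]))
    return phrases

# Hypothetical sample sentence.
sample = "Worth twenty dollars--that's your value; three thousand dollars: done."
result = find_dollar_phrases(sample)
print(result)
```

Unlike the snippet in the answer, this variant lowercases the input and drops a "dollars" that has no number word in front of it. Both are design choices, not requirements.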