Question

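I'm having trouble using a regex in a simple Python program. I'm trying to capture all monetary expressions written out as dollar amounts in words (for example, "five hundred dollars" or "three hundred thousand dollars and forty cents"), but I keep running into problems.

My program returns only empty strings. Some preliminary feedback I received is that my regex is "too greedy" and overreaching, but I'm not sure how or why that ends up producing empty strings, or how to fix it.

Here is my Python code: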

import re
import sys

file2 = open("test2.txt", "r")
input_txt2 = file2.read()
distjunct3 = r"(?:(?:(?:a|one|two|three|four|five|six|seven|eight|nine|ten|eleven|twelve)?(?:(thir|four|fif|six|seven|eight|nine)teen)?)(?:(?:twen|thir|four|fif|six|seven|eight|nine)ty)?(?:(?:one|two|three|four|five|six|seven|eight|nine|ten) (?:(?:hundred|thousand|)|(?:\w.llion)))?(?: \w+)? dollar(?:s)?(?: and [0-9]{1,2} cents)?)"

def repl(matchobj):
    return "[" + matchobj.group() + "]"

print(re.findall(distjunct3, input_txt2))
file2.close()

Here is my regex:

(?:(?:(?:a|one|two|three|four|five|six|seven|eight|nine|ten|eleven|twelve)?(?:(thir|four|fif|six|seven|eight|nine)teen)?)(?:(?:twen|thir|four|fif|six|seven|eight|nine)ty)?(?:(?:one|two|three|four|five|six|seven|eight|nine|ten) (?:(?:hundred|thousand|)|(?:\w.llion)))?(?: \w+)? dollar(?:s)?(?: and [0-9]{1,2} cents)?)

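I have tested my regex at http://regexr.com/ and it appears to work with this sample text: "exceed sixteen dollars y four dollars a head, but it is now reduced to one, and this charge they valuable andto three thousand dollars: a los hundred thousand dollars for twelve pounds for a dollar. Ths worth a dollar and n'tSix dollars--twelve skins, for a prime, dark and tuck--eight or ten dollars, according to only two dollars. "orth eight dollars; think of that! one, worth twenty dollars--that's your value dead, twenty dollars." "factor pays seven dollars in trade, for which eight dollars coat."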

I'm quite confused and would really appreciate any pointers. Thanks!

Answer 1

This is actually a simpler pattern. In pseudo-regex, it has the form "(number words)+ dollars (and (number words)+ cents)?". The following works for your input:

((?:(?:a|one|two|twen|thir|three|four|five|fif|six|seven|eight|nine|ten|eleven|twelve|hundred|thousand|million|billion)(?:y|ty|teen)?[\s-]?)+(?:[\s-]?dollars?(?: (?:and|&) (?:[0-9]{1,2}|no|(?:a|one|two|twen|thir|three|four|five|fif|six|seven|eight|nine|ten|eleven|twelve|hundred|thousand|million|billion)(?:y|ty|teen)?)+ cents)?))
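A likely reason the program in the question printed only empty strings is the way `re.findall` treats capturing groups: when a pattern contains exactly one capturing group, `findall` returns the text of that group rather than the whole match, and an optional group that took no part in a match comes back as an empty string. The question's pattern contains a single capturing group, `(thir|four|fif|six|seven|eight|nine)teen`, whereas the pattern in this answer wraps the whole expression in one outer group. The sketch below illustrates the difference with two shortened patterns and a sample sentence, all of which are made up for illustration and are not the full patterns from this thread:

```python
import re

# Hypothetical sample text, chosen only to illustrate the behaviour.
text = "a coat for sixteen dollars and a hat for two dollars"

# Shortened pattern with ONE capturing group, like the question's regex.
# findall returns only that group's text for each match.
capturing = r"(?:two|six)?(?:(thir|four|six)teen)? dollars?"
capturing_result = re.findall(capturing, text)
print(capturing_result)

# Same pattern with the inner group made non-capturing.
# With no capturing groups, findall returns each whole match.
non_capturing = r"(?:two|six)?(?:(?:thir|four|six)teen)? dollars?"
whole_match_result = re.findall(non_capturing, text)
print(whole_match_result)
```

In the first call, "two dollars" matches without using the `...teen` group, so its entry in the result is an empty string. Changing every inner group to `(?:...)`, or using `re.finditer` and calling `.group(0)` on each match, returns the full expression instead.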


Answer 2

# Words that may appear in a written-out amount.
numwords = ["and", "a", "one", "two", "three", "four", "five", "six", "seven", "eight",
            "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
            "seventeen", "eighteen", "nineteen", "twenty", "thirty", "forty", "fifty", "sixty",
            "seventy", "eighty", "ninety", "hundred", "thousand", "million", "billion", "trillion"]
teststr = "exceed sixteen dollars y four dollars a head, but it is now reduced to one, and this charge they valuable andto three thousand dollars: a los hundred thousand dollars for twelve pounds for a dollar. Ths worth a dollar and n'tSix dollars--twelve skins, for a prime, dark and tuck--eight or ten dollars, according to only two dollars. \"orth eight dollars; think of that! one, worth twenty dollars--that's your value dead, twenty dollars"
splitstr = teststr.split()
dollarfound = []
for index, s in enumerate(splitstr):
    templist = []
    # Only exact tokens match, so "dollars:" or "dollars--twelve" are skipped.
    if s == "dollar" or s == "dollars":
        templist.append(splitstr[index])
        # Walk backwards, collecting consecutive number words before "dollar(s)".
        while (index-1 >= 0) and (splitstr[index-1] in numwords):
            templist.append(splitstr[index-1])
            index -= 1
        # The words were collected right to left, so reverse them before joining.
        dollarfound.append(" ".join(reversed(templist)))
print(dollarfound)

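This code finds each instance of the word "dollar" or "dollars" and then backtracks to collect all the number words immediately before it. Your use case doesn't really need a regex.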
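Because the scan compares whole whitespace-separated tokens, amounts followed directly by punctuation (such as "dollars:" or "dollars--that's" in the sample text) are not picked up. The sketch below shows one possible way to make the same backward scan tolerate punctuation. The function name `find_dollar_phrases`, the normalisation steps (treating "--" as a separator, stripping punctuation, lowercasing) and the sample sentence are assumptions for illustration, not part of the original answer:

```python
import string

# Same idea as the list above, stored as a set for fast membership tests.
NUMBER_WORDS = {
    "and", "a", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen", "twenty", "thirty",
    "forty", "fifty", "sixty", "seventy", "eighty", "ninety", "hundred",
    "thousand", "million", "billion", "trillion",
}


def find_dollar_phrases(text):
    """Return written-out dollar amounts found by scanning backwards from 'dollar(s)'."""
    # Assumed normalisation: "--" separates words; surrounding punctuation is dropped.
    tokens = [t.strip(string.punctuation).lower()
              for t in text.replace("--", " ").split()]
    phrases = []
    for i, token in enumerate(tokens):
        if token in ("dollar", "dollars"):
            start = i
            # Extend the phrase leftwards while the previous token is a number word.
            while start > 0 and tokens[start - 1] in NUMBER_WORDS:
                start -= 1
            phrases.append(" ".join(tokens[start:i + 1]))
    return phrases


# Hypothetical sample sentence.
sample = "He paid three thousand dollars: a fair price, worth twenty dollars--that's all."
print(find_dollar_phrases(sample))
```

One trade-off of stripping punctuation is that a comma between two number words no longer stops the scan, so "ten, twenty dollars" would be reported as a single phrase.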

Regex to capture monetary expressions
