Question

I'm trying to use a regex to scan through emails, find the word package, and then capture the next number after it.

For example, the body of one email contains this line:

NEFS 8 has a PACKAGE DEAL see below valued at $55,000.00 call if interested.

The code I'm using to try to do this is:

word = ['package']
package_re = re.compile(r'({}).*?([\d,]+)'.format('|'.join(word)), re.IGNORECASE|re.MULTILINE|re.DOTALL)

with open(file_path) as f:
    for line in f:
        for match in package_re.finditer(f.read()):
            print("yessssssssssssss")
            price = match.group()
            print(price)

But it doesn't even print "yessssssssssssss", which means the regex itself is failing...

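My thinking was that a regex like this should capture anything in the list word, then .*? would consume everything up to the next match, which is a number as given by [\d,]+.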

Any help with this would be appreciated; I feel like it's a pretty simple problem. Thanks.

How the email is displayed when opened in Thunderbird:

How it is displayed when saved and opened as a .txt file (this is the version my code runs on, by the way):

Answer 1

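The problem is your loop. The outer "for line in f" has already consumed part of the file by the time f.read() runs, so the regex never sees that text; read the file once instead. You also need to select the right group: match.group() returns the whole match, while match.group(2) returns only the number.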

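import re

word = ['package']
# Group 1 is the keyword; group 2 is the first run of digits/commas after it.
package_re = re.compile(r'({}).*?([\d,]+)'.format('|'.join(word)), re.I|re.M|re.S|re.U)

# file_path is the path to the saved email text, as in the question.
with open(file_path) as f:
    # Read the file once and search the whole text.
    for match in package_re.finditer(f.read()):
        print("yessssssssssssss")
        price = match.group(2)  # the number only, not the whole match
        print(price)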
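
To make the difference between the two loops concrete, here is a minimal sketch that runs both against the sample line from the question. It assumes Python 3 and uses io.StringIO as a stand-in for the saved email file, since the real file contents are not shown; the names SAMPLE, prices_nested_loop, and prices_single_read are made up for illustration.

```python
import io
import re

words = ['package']
package_re = re.compile(
    r'({}).*?([\d,]+)'.format('|'.join(words)),
    re.IGNORECASE | re.DOTALL,
)

# Hypothetical file contents: only the sample line quoted in the question.
SAMPLE = "NEFS 8 has a PACKAGE DEAL see below valued at $55,000.00 call if interested.\n"


def prices_nested_loop(f):
    """Question's pattern: iterate over lines AND call f.read() inside the loop."""
    found = []
    for line in f:  # this already consumes the first line ...
        # ... so f.read() only returns whatever comes after it.
        for match in package_re.finditer(f.read()):
            found.append(match.group(2))
    return found


def prices_single_read(f):
    """Answer's pattern: read the whole file once and search that string."""
    return [match.group(2) for match in package_re.finditer(f.read())]


print(prices_nested_loop(io.StringIO(SAMPLE)))  # → []
print(prices_single_read(io.StringIO(SAMPLE)))  # → ['55,000']
```

Note that [\d,]+ stops at the decimal point, so the captured value is 55,000 rather than 55,000.00. If the cents matter, widening the character class to [\d,.]+ is one option, though it would also match a stray period or comma that appears before the number.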

Python: identify a word, capture the following number
