How to extract two groups from a repeated regex in Python

Time: 2016-04-06 15:54:47

Tags: python regex

Let's say I have a string that looks like this:

my_date = "February 4 - March 23, 2015"

I want to create a regex that extracts the month names and the year, so I set it up as:

date_regex = r"^(?:(Jan(?:uary)?|Feb(?:ruary)|Marc?h?|Apr[il1]?[I1l]?|May|June?|July?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:tober)?|Nov(?:ember)?|Dec(?:ember)?)\s+\d?\d(?:\s+-\s+)?){2},\s+(20[01]\d)"

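I thought I was being clever by wrapping the whole month-and-day match in a non-capturing group and using {2} to say there should be two of them, but unfortunately the groups I get are ("March", "2015"). The first match, "February", does not seem to be captured.

Where did I go wrong? Is it my regex, or is this just not possible?

This question seems related, and seems to imply that what I'm trying to do is not possible without the regex module.

Thanks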

2 Answers:

Answer 0: (score: 1)

Try this RegEx:

(Jan(?:uary)?|Feb(?:ruary)|Marc?h?|Apr[il1]?[I1l]?|May|June?|July?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:tober)?|Nov(?:ember)?|Dec(?:ember)?|20[01]\d)

You are overcomplicating it. Just select either a month or a year (20[01]\d).

Live Demo on Regex101
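As a minimal sketch of how this pattern might be applied in Python (the variable names `month_or_year` and `matches` are illustrative, not part of the answer), `re.findall` returns every match of the single capturing group in order:

```python
import re

my_date = "February 4 - March 23, 2015"

# Pattern copied verbatim from the answer: one alternation that matches
# either a month name or a year of the form 20xx.
month_or_year = re.compile(
    r"(Jan(?:uary)?|Feb(?:ruary)|Marc?h?|Apr[il1]?[I1l]?|May|June?|July?"
    r"|Aug(?:ust)?|Sep(?:tember)?|Oct(?:tober)?|Nov(?:ember)?|Dec(?:ember)?"
    r"|20[01]\d)"
)

# findall scans the whole string, so both months and the year are returned.
matches = month_or_year.findall(my_date)
print(matches)  # ['February', 'March', '2015']
```

Note that, as written, `Feb(?:ruary)` only matches the full word "February" (there is no trailing `?`), and `Oct(?:tober)?` matches just "Oct" in "October" because the optional part spells "tober" after an existing "t". Neither affects the sample string.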

How it works:

(
    Jan(?:uary)?|          # January
    Feb(?:ruary)|          # February
    Marc?h?|               # March
    Apr[il1]?[I1l]?|       # April
    May|                   # May
    June?|                 # June
    July?|                 # July
    Aug(?:ust)?|           # August
    Sep(?:tember)?|        # September
    Oct(?:tober)?|         # October
    Nov(?:ember)?|         # November
    Dec(?:ember)?|         # December
    20[01]\d               # Year
)

It will select either a month name or a year. I'm not sure why you used Apr[il1]?[I1l]? for April. Just use Apr(il)? or Apri?l?.
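For context on why the question's pattern returned only ("March", "2015"): in Python's re module, a capturing group that sits inside a repeated group keeps only the text from its last repetition. The sketch below shows that behavior, and the alternative of writing the month/day part out twice so each month gets its own group. The simplified `MONTH` pattern is an assumption made for brevity, not the asker's full alternation.

```python
import re

my_date = "February 4 - March 23, 2015"

# Simplified month pattern for illustration only (hypothetical stand-in
# for the long alternation of month names).
MONTH = r"([A-Z][a-z]+)"

# Repeating the group with {2}: the month group is overwritten on the
# second repetition, so only the last month survives.
repeated = re.match(
    r"^(?:" + MONTH + r"\s+\d?\d(?:\s+-\s+)?){2},\s+(20[01]\d)",
    my_date,
)
print(repeated.groups())  # ('March', '2015')

# Spelling the month/day part out twice gives each month its own group.
explicit = re.match(
    r"^" + MONTH + r"\s+\d?\d\s+-\s+" + MONTH + r"\s+\d?\d,\s+(20[01]\d)",
    my_date,
)
print(explicit.groups())  # ('February', 'March', '2015')
```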

Answer 1: (score: 0)

Another, more general solution, if you don't have to search within a large text, i.e. you only search the sample string:
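One way such a looser search could look, as a minimal sketch: the pattern below is an assumption for illustration, not necessarily the answerer's own code. It picks out any run of letters or any four-digit number, which is only safe when the input is known to look like the sample string.

```python
import re

my_date = "February 4 - March 23, 2015"

# Hypothetical pattern: a run of ASCII letters (the month names) or
# exactly four consecutive digits (the year). Day numbers such as "4"
# and "23" are too short to match \d{4}.
parts = re.findall(r"[A-Za-z]+|\d{4}", my_date)
print(parts)
```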

Output:

['February', 'March', '2015']
