Looking for a simpler way to do this with RegEx

Time: 2013-05-22 14:56:14

Tags: python regex string shell applescript

To give you an idea, I'm trying to grab any string that carries this information.

IP Address for: John Doe on 05/20/13

I basically need to find all strings in that format.

I'm using date '+%m/%d/%y' to get today's date.

Basically, I need:

"'IP Address for: '[A-Za-z]'on 'date ''+%m/%d/%y''"

Edit:

Sample strings

IP Address for: John Doe on 05/20/13
another random string
IP Address for: Jane Doe on 05/20/13
IP Address for: John Appleseed on 05/20/13
random string
IP Address for: Mr. Beans on 05/14/13
IP Address for: Steve Jobs on 05/03/13
IP Address for: Bill Gates on 05/19/13

What I need returned is this, which meets the criteria "IP Address for: " + "on " + date:

IP Address for: John Doe on 05/20/13
IP Address for: Jane Doe on 05/20/13
IP Address for: John Appleseed on 05/20/13

4 answers:

Answer 0: (score: 1)

I wrote a nice method for you.

import re

s = '''
IP Address for: John Doe on 05/20/13
another random string
IP Address for: Jane Doe on 05/20/13
IP Address for: John Appleseed on 05/20/13
random string
IP Address for: Mr. Beans on 05/14/13
IP Address for: Steve Jobs on 05/03/13
IP Address for: Bill Gates on 05/19/13
'''

regex = re.compile(r'IP Address for: (.+) on (\d\d/\d\d/\d\d)')

def method(data, matcher, name=None, date=None):
    '''
    Takes data and runs the matcher on it to find name and date.
    ARGS:
    data    := the data (string, or fileobject)
    matcher := the regex object to match with.
    name    := specify only specific name to find (optional)
    date    := specify only specific date to find (optional)
    '''
    if isinstance(data, str):
        content = data.split('\n')
    else:
        # Anything else (e.g. an open file object) is treated as an
        # iterable of lines, so content is always defined.
        content = data
    for line in content:
        line = line.strip()
        ms = matcher.match(line)
        if not ms:
            continue
        if name and ms.group(1) != name:
            continue
        if date and ms.group(2) != date:
            continue
        yield ms.groups()

Using it:

# no options
for result in method(s, regex):
    print(result)

('John Doe', '05/20/13')
('Jane Doe', '05/20/13')
('John Appleseed', '05/20/13')
('Mr. Beans', '05/14/13')
('Steve Jobs', '05/03/13')
('Bill Gates', '05/19/13')

# with a name
for result in method(s, regex, name='John Doe'):
    print(result)

('John Doe', '05/20/13')

# with a date
for result in method(s, regex, date='05/20/13'):
    print(result)

('John Doe', '05/20/13')
('Jane Doe', '05/20/13')
('John Appleseed', '05/20/13')

Answer 1: (score: 1)

For the AppleScript tag:

set myText to "Starting Text
IP Address for: Mr. Beans on 05/14/13
Leading Text IP Address for: Steve Jobs on 05/03/13 Trailing Text
Middle Text
IP Address for: Bill Gates on 05/19/13
Ending Text
"

set variableName to do shell script "grep -Eo 'IP Address for:.*on ([[:digit:]]{2}/){2}[[:digit:]]{2}' <<< " & quoted form of myText
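For comparison, here is a rough Python sketch of what the grep -Eo call does: it searches each line (rather than matching from the start of the line) and keeps only the matching part, so leading and trailing text is dropped. The names PATTERN, extract_matches and my_text are made up for this illustration, and \d is used in place of [[:digit:]].

```python
import re

# Rough equivalent of: grep -Eo 'IP Address for:.*on ([[:digit:]]{2}/){2}[[:digit:]]{2}'
PATTERN = re.compile(r'IP Address for:.*on (?:\d{2}/){2}\d{2}')

def extract_matches(text):
    """Return the matching portion of every line that contains the pattern."""
    found = []
    for line in text.splitlines():
        m = PATTERN.search(line)  # search, not match: the text may start mid-line
        if m:
            found.append(m.group(0))
    return found

my_text = """Starting Text
IP Address for: Mr. Beans on 05/14/13
Leading Text IP Address for: Steve Jobs on 05/03/13 Trailing Text
Middle Text
IP Address for: Bill Gates on 05/19/13
Ending Text
"""

for match in extract_matches(my_text):
    print(match)
```

Note that re.match (as used in answer 0) would skip the "Leading Text ..." line, because the pattern does not start at the beginning of that line.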

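Answer 2: (score: 0)

If the format is always fixed, you can match the name more broadly. If you don't care about validation, you can also be very general in matching the date.

When we write regular expressions, we never include the string quotes unless we show them together with a code sample.

An example string to match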

IP Address for: John Doe on 05/20/13

could be matched by the following regex:

1. 
IP Address for: .+ on (\d\d/\d\d/\d\d)

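This captures the date in group 1, but it allows any characters in the name and any digits in the date. If you want to restrict the allowed characters, you can replace the .+ with a character class, as you did in your example: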

[A-Za-z]+

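The problem with that character class is that it cannot match spaces, so it won't work for John Doe. To match the space between the names, you need to include it in the character class: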

2.
[A-Za-z\s]+

or match multiple words:

3.
([A-Za-z]+\s?)+

The advantage of the latter is that it won't match when there is no name, or when the name contains no a-z characters.

A few examples:

IP Address for: .$%1 on 05/20/13       matches 1.
IP Address for:   on 05/20/13          matches 1. and 2.
IP Address for: John Doe on 05/20/13   matches 1., 2. and 3.
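The table above can be checked with a short Python sketch. It assumes that patterns 2 and 3 are dropped in place of the .+ in pattern 1; the names DATE, PATTERNS, SAMPLES and matching_patterns are invented for this illustration.

```python
import re

DATE = r'(\d\d/\d\d/\d\d)'

# Assumption: 2. and 3. replace the ".+" name part of pattern 1.
PATTERNS = {
    1: re.compile(r'IP Address for: .+ on ' + DATE),
    2: re.compile(r'IP Address for: [A-Za-z\s]+ on ' + DATE),
    3: re.compile(r'IP Address for: ([A-Za-z]+\s?)+ on ' + DATE),
}

SAMPLES = [
    'IP Address for: .$%1 on 05/20/13',
    'IP Address for:   on 05/20/13',
    'IP Address for: John Doe on 05/20/13',
]

def matching_patterns(line):
    """Return the numbers of the patterns that match the line."""
    return [n for n, rx in sorted(PATTERNS.items()) if rx.match(line)]

for line in SAMPLES:
    print(line, '->', matching_patterns(line))
```

One thing to keep in mind: the extra parentheses in pattern 3 form a capturing group of their own, so with that pattern the date ends up in group 2 rather than group 1.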

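So, depending on what your input looks like, you may want to avoid regexes that use .*. People use them all the time and they usually work fine, but I try never to use the dot unless I can't find any other way.

Answer 3: (score: 0)

Given that you mention date, I assume you only want the lines that match today's date, whatever day you run the check on.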

$ grep "IP Address for: .* on $(date +'%m/%d/%y')" file.txt
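Since the question is also tagged python, the same idea can be sketched there: format the date with the two-digit-year format from the question and splice it into the pattern. The function name lines_for_date and its day parameter are hypothetical; day only exists so the sketch can be tried with a fixed date.

```python
import re
from datetime import date

def lines_for_date(lines, day=None):
    """Yield lines like 'IP Address for: <name> on <mm/dd/yy>' for a single day."""
    if day is None:
        day = date.today()
    stamp = day.strftime('%m/%d/%y')  # same format as: date '+%m/%d/%y'
    pattern = re.compile(r'IP Address for: .* on ' + re.escape(stamp))
    for line in lines:
        if pattern.search(line):
            yield line.rstrip('\n')

sample = [
    'IP Address for: John Doe on 05/20/13',
    'another random string',
    'IP Address for: Jane Doe on 05/20/13',
    'IP Address for: Mr. Beans on 05/14/13',
]

# A fixed date is passed here; omit it to filter on today's date.
for line in lines_for_date(sample, date(2013, 5, 20)):
    print(line)
```

Because the function accepts any iterable of lines, an open file object can be passed in place of the sample list.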