Question

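I'm using Python to search a text log file line by line, and I want to save part of a line as a variable. I'm using a regular expression, but I don't think I'm using it correctly, because my variable string_I_want always ends up as None. I looked at other regex questions here and saw people adding .group() to the end of re.search, but that gives me an error. I'm not very familiar with regex and can't figure out where I'm going wrong.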

Sample log file:

2016-03-08 11:23:25  test_data:0317: m=string_I_want max_count: 17655, avg_size: 320, avg_rate: 165

My script:

def get_data(log_file):

    #Read file line by line
    with open(log_file) as f:
        f = f.readlines()

        for line in f:
            date = line[0:10]
            time = line[11:19]

            string_I_want=re.search(r'/m=\w*/g',line)

            print date, time, string_I_want

Answer 1

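Remove the /.../ delimiters and the g (global) flag, which are JavaScript-style regex literal syntax that Python's re module would match as literal characters, and use a capturing group: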

mObj = re.search(r'm=(\w+)',line)
if mObj:
    string_I_want = mObj.group(1)
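
To see why the original pattern returned None, it may help to run both patterns side by side. This is a minimal sketch using the sample line from the question; the variable names broken and fixed are only for illustration.

```python
import re

line = ("2016-03-08 11:23:25  test_data:0317: m=string_I_want "
        "max_count: 17655, avg_size: 320, avg_rate: 165")

# JavaScript-style literal: '/' and 'g' are matched as ordinary characters,
# so the pattern needs "/m=.../g" to appear in the line, which it does not.
broken = re.search(r'/m=\w*/g', line)
print(broken)  # None

# Plain pattern with a capturing group around the part to keep
fixed = re.search(r'm=(\w+)', line)
string_I_want = fixed.group(1) if fixed else None
print(string_I_want)  # string_I_want
```

Since re.search returns None when nothing matches, calling .group() directly on the result of the original pattern raises AttributeError, which is most likely the error mentioned in the question.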

See the regex demo and the Python demo:

import re
p = r'm=(\w+)'              # Init the regex with a raw string literal (so, no need to use \\w, just \w is enough)
s = "2016-03-08 11:23:25  test_data:0317: m=string_I_want max_count: 17655, avg_size: 320, avg_rate: 165"
mObj = re.search(p, s)      # Execute a regex-based search
if mObj:                    # Check if we got a match
    print(mObj.group(1))    # DEMO: Print the Group 1 value

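Pattern details:

m= - matches the literal character sequence m= (put a space or \b in front of it if the m must be matched as a whole word)
(\w+) - Group 1, capturing one or more alphanumeric or underscore characters. This value can be retrieved with the .group(1) method.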
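
Putting the pattern back into the function from the question might look like the sketch below. It assumes the \b variant mentioned above and the fixed-width timestamp slices from the question's code; parse_line and M_PATTERN are hypothetical names introduced here, and the function yields tuples rather than printing them so the result can be reused.

```python
import re

# Compile once, since the same pattern is applied to every line
M_PATTERN = re.compile(r'\bm=(\w+)')

def parse_line(line):
    """Return (date, time, value) for a log line, or None if it has no m= field."""
    match = M_PATTERN.search(line)
    if match is None:
        return None
    # Fixed-width timestamp prefix, sliced as in the question
    return line[0:10], line[11:19], match.group(1)

def get_data(log_file):
    """Yield one (date, time, value) tuple per matching line of log_file."""
    with open(log_file) as f:
        for line in f:
            parsed = parse_line(line)
            if parsed is not None:
                yield parsed

sample = ("2016-03-08 11:23:25  test_data:0317: m=string_I_want "
          "max_count: 17655, avg_size: 320, avg_rate: 165")
print(parse_line(sample))  # ('2016-03-08', '11:23:25', 'string_I_want')
```

Lines without an m= field are skipped instead of producing None values or raising an error.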

Answer 2

Use:

(?<=\sm=)\S+

Example:

In [135]: s = '2016-03-08 11:23:25 test_data:0317: m=string_I_want max_count: 17655, avg_size: 320, avg_rate: 165'

In [136]: re.search(r'(?<=\sm=)\S+', s).group()
Out[136]: 'string_I_want'
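
The lookbehind (?<=\sm=) only asserts that a whitespace character followed by m= sits right before the match, without making it part of the match, so .group() with no argument already returns just the value. The sketch below compares this with the capturing-group pattern from Answer 1; the second string t is a made-up value containing a hyphen, used to show that \S+ is broader than \w+.

```python
import re

s = ("2016-03-08 11:23:25 test_data:0317: m=string_I_want "
     "max_count: 17655, avg_size: 320, avg_rate: 165")

lookbehind = re.search(r'(?<=\sm=)\S+', s)
print(lookbehind.group())   # string_I_want (the prefix is not part of the match)

capture = re.search(r'm=(\w+)', s)
print(capture.group())      # m=string_I_want (the whole match)
print(capture.group(1))     # string_I_want (group 1 only)

# Hypothetical value with a hyphen: \S+ runs to the next whitespace,
# while \w+ stops at the first character that is not a letter, digit or underscore
t = "... m=abc-123 max_count: 1"
by_lookbehind = re.search(r'(?<=\sm=)\S+', t).group()
by_capture = re.search(r'm=(\w+)', t).group(1)
print(by_lookbehind)        # abc-123
print(by_capture)           # abc
```

Which pattern fits better depends on whether the value in the real logs can contain characters other than letters, digits and underscores.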

Answer 3

Here is what you need:

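import re

def get_data(logfile):
    # The with block closes the file even if an error occurs
    with open(logfile, "r") as f:
        for line in f:
            match = re.search(r'(?<=\sm=)\S+', line)
            # re.search returns None for lines without a match, so check before calling .group()
            if match:
                print(match.group())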

Python: Using regex to get specific text from a line in a file
