Trying to extract values from text

Time: 2016-08-20 04:03:31

Tags: regex python-2.7 regex-greedy regex-group

I am trying to extract the rx, accel and acc values from the text below, but only for the blocks where accounting is true.

The Python code further down uses a regular expression that only retrieves accounting and accel.

Input text:

dataRx:       21916               drx:      1743625
  ota:      191791                orx:      74164489
  dataDropped:  14                    dropped:1134
  id: 65535 waitress BE     nginxid: 0   kbps:  0.000
        accounting: false
        drop : 1
        rx   :  48392          bytes: 483920
  id: 65533 waitress BE     nginxid: 1   kbps:  0.000
        accounting: false
        drop : 4
        rx   :  122914         bytes: 70081939
  id: 4232  nginx BE     nginxid: 3   kbps:  0.000
        accounting: false
        drop : 0
        rx   :  3084           bytes: 94357
  id: 10482 server   BE     nginxid: 4   kbps:  0.000
        accounting: false
        drop : 0
        rx   :  15             bytes: 2477
  id: 20344 serve  BE     nginxid: 10  kbps:  62914.560
        accounting: true
        drop : 2
        rx   :  2217           bytes: 309637
        accel : 482            bytes: 264318
        acc :349               bytes: 225181

The regular expression and the code that applies it:

import re

# `text` holds the input shown above.
re_exp = r"accounting:\s*((?P<accounting>\S*)[\S\s]*?accel:[\S\s]*?bytes:\s*(?P<accel>\S*)[\S\s]*?)"

for match in re.finditer(re_exp, text):
    group = match.groupdict()
    print(group)

I need help with the regular expression; any help would be appreciated. Also, is there a regex-based way to group all the data fields under each id?

Thanks

2 Answers:

Answer 0: (score: 1)

Try this:

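import re

# Read the whole file so the pattern can span several lines.
with open("acc.txt", "r") as content:
    ar = content.read()

# re.S lets "." match newlines; the groups are the flag and the bytes count after accel.
getdata = re.findall(r"accounting: (true).+?accel.+?bytes:\s(\d+)", ar, re.S)
print(getdata)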
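The pattern above returns the bytes count that follows accel, not the rx, accel and acc values the question asks for. A minimal sketch of one way to capture all three is shown here. It assumes that every "accounting: true" block lists drop, rx, accel and acc in exactly that order, one per line, as in the sample; the names `sample`, `pattern` and `results` are illustrative only.

```python
import re

# Shortened sample in the same layout as the question's input (assumed format).
sample = """\
  id: 10482 server   BE     nginxid: 4   kbps:  0.000
        accounting: false
        drop : 0
        rx   :  15             bytes: 2477
  id: 20344 serve  BE     nginxid: 10  kbps:  62914.560
        accounting: true
        drop : 2
        rx   :  2217           bytes: 309637
        accel : 482            bytes: 264318
        acc :349               bytes: 225181
"""

# [^\n]*\n consumes the rest of one line, so each step stays on its own line
# and a match cannot run on into the next "id:" block.
pattern = re.compile(
    r"accounting:\s*true[^\n]*\n"
    r"[^\n]*\n"                               # the drop line
    r"\s*rx\s*:\s*(?P<rx>\d+)[^\n]*\n"
    r"\s*accel\s*:\s*(?P<accel>\d+)[^\n]*\n"
    r"\s*acc\s*:\s*(?P<acc>\d+)"
)

results = [m.groupdict() for m in pattern.finditer(sample)]
print(results)
```

Blocks with "accounting: false" are skipped because the pattern only starts at "accounting: true". If the field order can vary, a line-by-line parser is likely the safer choice.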

To collect all the data that belongs to each id as one group, try the following:

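# Collect the lines of each "id:" block into one string per block.
with open("acc.txt", "r") as content:
    ar = content.readlines()

arv = []   # finished blocks
m = ""     # block currently being built
flag = 0   # becomes 1 once the first "id:" line has been seen
for j in ar:
    if "id:" in j:
        if m:
            arv.append(m)   # close the previous block
        m = ""
        flag = 1

    if flag == 1:
        m += j

if m:
    arv.append(m)   # store the final block as well

for j in arv:
    print(j)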
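The grouping idea can be taken one step further by turning each block into a dictionary of its fields. The following is only a sketch under assumptions: field names are taken to be word characters followed by a colon, and because "bytes" occurs several times per block, it is stored under a key prefixed with the first field of its line (for example `rx_bytes`). The function name `parse_blocks` and that key scheme are hypothetical choices, not part of the original answer.

```python
import re

# Shortened sample in the same layout as the question's input (assumed format).
sample = """\
  ota:      191791                orx:      74164489
  id: 10482 server   BE     nginxid: 4   kbps:  0.000
        accounting: false
        drop : 0
        rx   :  15             bytes: 2477
  id: 20344 serve  BE     nginxid: 10  kbps:  62914.560
        accounting: true
        drop : 2
        rx   :  2217           bytes: 309637
        accel : 482            bytes: 264318
        acc :349               bytes: 225181
"""

def parse_blocks(text):
    """Return one dict per "id:" block, mapping field names to string values."""
    blocks = []
    current = None
    for line in text.splitlines():
        # Every "name : value" pair on the line, e.g. [("rx", "2217"), ("bytes", "309637")].
        fields = re.findall(r"(\w+)\s*:\s*(\S+)", line)
        if not fields:
            continue
        first_name = fields[0][0]
        if first_name == "id":
            current = {}            # a new block starts here
            blocks.append(current)
        if current is None:
            continue                # header lines before the first id
        for name, value in fields:
            # Disambiguate the repeated "bytes" field by the line it belongs to.
            key = (first_name + "_bytes") if name == "bytes" else name
            current[key] = value
    return blocks

blocks = parse_blocks(sample)
accounted = [b for b in blocks if b.get("accounting") == "true"]
for b in accounted:
    print(b["id"], b["rx"], b["accel"], b["acc"])
```

All values stay strings; converting them with `int()` or `float()` is left to the caller, since kbps is fractional while the counters are not.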

Answer 1: (score: 0)

Iterate line by line:

import re

rx = ""
accel = ""
acc = ""

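lines = text.split("\n")  # `text` holds the input shown in the question

for i in range(len(lines)):
    # Strip the indentation so the comparison below can succeed.
    line = lines[i].strip()
    if line == "accounting: true":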
        match = re.search(r"rx\s*:\s+(\d*)", lines[i+2])
        if match:
            rx = match.group(1)
        match = re.search(r"accel\s*:\s+(\d*)", lines[i+3])
        if match:
            accel = match.group(1)
        match = re.search(r"acc\s*:\s*(\d*)", lines[i+4])
        if match:
            acc = match.group(1)
        print("'accounting': true, 'rx': {}, 'accel': {}, 'acc': {}".format(rx, accel, acc))