Question

I'm new to regular expressions and Python. I have data stored in a log file that I need to extract using a regular expression. Here is the format:

#bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
     0         1000         0.01         0.03         0.02
     4         1000       177.69       177.88       177.79
     8         1000       175.90       176.07       176.01
    16         1000       181.51       181.73       181.60
    32         1000       199.64       199.81       199.72
    64         1000       228.10       228.27       228.19
    28         1000       278.70       278.90       278.75
   256         1000       388.26       388.49       388.39
   512         1000       593.49       593.82       593.63
  1024         1000      1044.27      1044.90      1044.59

Answer 1

You can get a specific column with either split or a regular expression. For this case, split is cleaner:

import re

with open("input") as input_file:
    for line in input_file:
        # using split to get the 4th column
        print(line.split()[3])
        # using regex to get the 4th column
        print(re.match(r'^\s*(?:\S+\s+){3}(\S+)', line).group(1))
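
As a minimal sketch of the split approach wrapped in a reusable helper, the following skips blank lines and the `#`-prefixed header and converts the chosen column to floats. The helper name `column_values` and the in-memory `sample` rows are hypothetical, introduced only for illustration; it assumes every data row has at least `index + 1` whitespace-separated fields.

```python
def column_values(lines, index):
    """Return the values in column `index` (0-based) as floats.

    Blank lines and lines whose first field starts with '#' are skipped.
    """
    values = []
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue
        values.append(float(fields[index]))
    return values


# Hypothetical sample taken from the first rows of the log format
sample = [
    "#bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]\n",
    "   0         1000         0.01         0.03         0.02\n",
    "   4         1000       177.69       177.88       177.79\n",
    "\n",
]
t_max = column_values(sample, 3)
print(t_max)  # [0.03, 177.88]
```

Any iterable of lines works here, so an open file object can be passed in place of `sample`.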

Answer 2

If you need to use a regular expression, this script does the job:

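import re

# an integer or decimal number, captured as a group
number_pattern = r'(\d+(?:\.\d+)?)'
# five numbers separated by whitespace, with optional padding on either side
line_pattern = r'^\s*%s\s*$' % r'\s+'.join([number_pattern] * 5)

with open('data', 'r') as f:
    for line in f:
        match = re.match(line_pattern, line)
        if match is not None:
            print(match.groups())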
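
If the columns need to be addressed by name rather than by position, the same idea can be sketched with named groups. This is only an illustration: the field names in `FIELDS` and the helper `parse_row` are assumptions, and every value is converted to float for simplicity.

```python
import re

# Hypothetical column names, in the order they appear in the log
FIELDS = ["bytes", "repetitions", "t_min", "t_max", "t_avg"]
NUMBER = r"\d+(?:\.\d+)?"

# One named group per column, separated by whitespace
ROW = re.compile(
    r"^\s*"
    + r"\s+".join("(?P<%s>%s)" % (name, NUMBER) for name in FIELDS)
    + r"\s*$"
)


def parse_row(line):
    """Return a dict mapping field name to float, or None for non-data lines."""
    match = ROW.match(line)
    if match is None:
        return None
    return {name: float(value) for name, value in match.groupdict().items()}


row = parse_row("   4         1000       177.69       177.88       177.79\n")
print(row["t_avg"])  # 177.79
header = parse_row("#bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]\n")
print(header)  # None
```

Because the header line does not consist of five numbers, it simply fails to match and is ignored.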

Answer 3

All you need is (\S+):

import re

# each run of non-whitespace characters is one field
pattern = re.compile(r'(\S+)')
with open('data.txt', 'r') as f:
    for l in f:
        print(pattern.findall(l))

You can also go the other way and split on whitespace:

import re

# split on runs of whitespace instead of matching the fields
whitespace = re.compile(r'\s+')
with open('data.txt', 'r') as f:
    for l in f:
        print(whitespace.split(l.strip()))

Answer 4

Alternatively, you can use numpy's genfromtxt function:

>>> import numpy as np
>>> a = np.genfromtxt("yourlogfile.dat",skip_header=1)

a will be an array containing all of your data.
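
To make the shape of that array concrete, here is a small sketch that feeds a few rows through `np.genfromtxt` from an in-memory buffer instead of a file. The `log_text` sample and the variable names `sizes` and `t_avg` are assumptions made for illustration; with a real log you would pass the file name as in the snippet above.

```python
import io

import numpy as np

# Hypothetical excerpt of the log, held in memory for the example
log_text = """#bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
   0         1000         0.01         0.03         0.02
   4         1000       177.69       177.88       177.79
   8         1000       175.90       176.07       176.01
"""

# skip_header=1 drops the column-title line; the rest is parsed as floats
a = np.genfromtxt(io.StringIO(log_text), skip_header=1)

print(a.shape)      # (3, 5): one row per log line, one column per field
sizes = a[:, 0]     # the #bytes column
t_avg = a[:, 4]     # the t_avg[usec] column
print(t_avg.max())  # 177.79
```

Columns are selected by slicing, so no regular expression is needed once the data is loaded.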

Regular expression to extract data from a table in Python
