How to parse with regular expressions

Time: 2014-07-17 20:34:38

Tags: python regex

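I need to use regular expressions, with re.findall() or the re.MULTILINE flag, to parse any numbers out of this inventory log. Any suggestions? Here is what I have so far.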

Here is a sample of the inventory log:

Processor               : Intel(R) Xeon(R) CPU E5-2420 0 @ 1.90GHz (24 cores/threads)

Memory                  : 65493MB

Controller Slot         : 0

BIOS                    : 3.0b 05/06/2014 3.2

mpt2sas0: LSISAS2308: FWVersion(16.00.01.00), ChipRevision(0x05), BiosVersion(07.33.00.00)

mpt2sas1: LSISAS2308: FWVersion(17.00.01.00), ChipRevision(0x05), BiosVersion(07.33.00.00)

compute node vpd: NA;NA;NA;

IPMI FW rev             : 2.29

Chassis Type            :  Other

Chassis Part Number     :  CSE-927ETS-R000NDBP

Chassis Serial          :  C92700325A00092

Board Mfg Date          :  Sun Dec 31 19:00:00 1995

Board Mfg               :  Supermicro

Board Product           :  IPMI 2.0

Board Serial            :  OM13BS013020

Board Part Number       :  X9DBS-F(-2U)

Product Manufacturer    :  Supermicro

Product Name            :   IPMI 2.0

Product Part Number     :  SSG-2027B-DE2R24L

Product Version         :

Product Serial          :  S13592923B11809

PCI Riser Card:

81:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS2308 PCI-Express 

Fusion-MPT SAS-2 (rev 05)

83:00.0 Ethernet controller: Chelsio Communications Inc T420-BT Unified Wire Ethernet 

Controller

83:00.1 Ethernet controller: Chelsio Communications Inc T420-BT Unified Wire Ethernet 

Controller

83:00.2 Ethernet controller: Chelsio Communications Inc T420-BT Unified Wire Ethernet 

Controller

83:00.3 Ethernet controller: Chelsio Communications Inc T420-BT Unified Wire Ethernet 

Controller

83:00.4 Ethernet controller: Chelsio Communications Inc T420-BT Unified Wire Ethernet 

Controller

83:00.5 SCSI storage controller: Chelsio Communications Inc T420-BT Unified Wire Storage 
Controller
83:00.6 Fibre Channel: Chelsio Communications Inc T420-BT Unified Wire Storage Controller
83:00.7 Ethernet controller: Chelsio Communications Inc Device 0000

-Hardware information

+Ethernet configuration
Chelsio T420-BT Card
version:   2.8.0.0
firmware-version: 1.9.23.0

plxnic0    00:10:b5:87:b0:01
eth3       00:07:43:15:fb:68
eth2       00:07:43:15:fb:60
eth1       00:25:90:8c:3a:23
eth0       00:25:90:8c:3a:22
bmc1       00:25:90:8c:15:2d    10.40.32.36
-Ethernet configuration

+Firmware Versions
, you are running a release image
This sdi release build was done by build on Sun Jul 13 2014 16:51:58
From /slave/jenkins/workspace/sdi_rls/dgcode
With git rev: e7fc81503edb567205e284cadef35f6bc5d0b7e6

    import re
    import warnings

    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        import sys

    sys.path.append("/home/build/sars")

    # Expected processor string, used both in the check and in the error message.
    EXPECTED_CPU = "Intel(R) Xeon(R) CPU E5-2420 v2 @ 2.20GHz (24 cores/threads)"


    def rescanips(path):
        """Read 'key : value' lines from the inventory log and validate known fields."""
        data = {}

        # Split each line at its first colon; the with block closes the file.
        with open(path, 'r') as file_in:
            for line in file_in:
                if ':' in line:
                    pos = line.index(':')
                    data[line[:pos].strip()] = line[pos + 1:].strip()

        for key, value in data.items():
            print(key, ':', value)

            if key == "Processor":
                if value != EXPECTED_CPU:
                    sys.stderr.write("Should be " + EXPECTED_CPU + " but it is " + value + "\n")

            if key == "Memory":
                if value != "81877MB":
                    sys.stderr.write("Error in memory: should be 81877MB it is currently " + value + "\n")

            if key == "Controller Slot":
                # `value != "0" or "1"` is always true; test membership instead.
                if value not in ("0", "1"):
                    sys.stderr.write("Invalid controller slot either should be 0 or 1 it is " + value + "\n")

            if key == "BIOS":
                if value != "3.0b 5/6/14 3.1":
                    sys.stderr.write("The BIOS must be updated to 3.0b 5/6/14 3.1 it is currently " + value + "\n")

            if key == "Canister Firmware":
                if value != "3.5.0.20":
                    sys.stderr.write("The Canister Firmware must be updated to 3.5.0.20 it is currently " + value + "\n")


    if __name__ == "__main__":
        rescanips(sys.argv[1])
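
For reference, the re.findall() with re.MULTILINE approach asked about above might look roughly like the sketch below. It is only an illustration: the names SAMPLE_LOG, FIELD_RE, NUMBER_RE and numbers_by_field are made up for this example, the sample text is a few lines copied from the log above, and "number" is assumed to mean a run of digits with an optional decimal part.

```python
import re

# A few lines copied from the sample inventory log above.
SAMPLE_LOG = """\
Processor               : Intel(R) Xeon(R) CPU E5-2420 0 @ 1.90GHz (24 cores/threads)
Memory                  : 65493MB
Controller Slot         : 0
IPMI FW rev             : 2.29
"""

# With re.MULTILINE, ^ and $ match at the start and end of every line,
# so findall() returns one (key, value) tuple per "key : value" line.
FIELD_RE = re.compile(r"^([^:\n]+?)[ \t]*:[ \t]*(.*)$", re.MULTILINE)

# Assumption: a "number" is an integer or a decimal such as 65493 or 1.90.
NUMBER_RE = re.compile(r"\d+(?:\.\d+)?")


def numbers_by_field(text):
    """Map each field name to the numeric strings found in its value."""
    return {key: NUMBER_RE.findall(value) for key, value in FIELD_RE.findall(text)}


fields = numbers_by_field(SAMPLE_LOG)
print(fields["Memory"])       # numeric strings found on the Memory line
print(fields["IPMI FW rev"])  # numeric strings found on the IPMI line
```

Note that this pulls out every digit run, including the 5 in "E5-2420", so the results still need field-specific interpretation.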

1 Answer:

Answer 0: (score: 0)

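Don't use regular expressions for parsing. Besides needing PCRE-style extensions to handle context, they are also slow when used like this. Have you considered using Ply? You are working toward hand-rolling your own state machine, and therein lies pain.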

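I have a sample ply parser you can use to get started. See how it uses simple regular expressions in the lexer.py file to define tokens, which are then passed to the parser.py file. The parser defines the valid orderings of tokens and what to do with each set of tokens.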
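
The sample parser itself is not reproduced here. To illustrate only the lexer half of the idea, here is a minimal sketch that uses the standard library's re module rather than Ply, so it does not show Ply's API. The token names (NUMBER, WORD, COLON and so on) and the tokenize function are assumptions made for this example.

```python
import re

# Hypothetical token set: each token is defined by one simple regular expression.
TOKEN_SPEC = [
    ("NUMBER",  r"\d+(?:\.\d+)?"),  # 65493, 1.90
    ("WORD",    r"[A-Za-z_]\w*"),   # Memory, MB, Intel
    ("COLON",   r":"),
    ("NEWLINE", r"\n"),
    ("SKIP",    r"[ \t]+"),         # whitespace between tokens
    ("OTHER",   r"."),              # any other single character
]
TOKEN_RE = re.compile("|".join("(?P<%s>%s)" % pair for pair in TOKEN_SPEC))


def tokenize(text):
    """Yield (token_type, text) pairs, dropping whitespace."""
    for match in TOKEN_RE.finditer(text):
        kind = match.lastgroup  # name of the alternative that matched
        if kind != "SKIP":
            yield kind, match.group()


tokens = list(tokenize("Memory                  : 65493MB\n"))
for token in tokens:
    print(token)
```

The lexer knows nothing about what a number means. It only labels pieces of text, and deciding which label sequences are valid is left to the parser.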

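This lets you run code whenever the parser finds a match, with the lexer supplying the tokens that establish context. For example, you find a number in the lexer and pass it to the parser. The parser sees that the number token comes after a Processor token and records it as the related data.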
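
The "number token after a Processor token" idea can be sketched without Ply as a small hand-written loop over tokens. This is a standard-library illustration under stated assumptions, not the Ply grammar from the sample parser: the token names PROCESSOR, MEMORY, NUMBER and NEWLINE and the parse function are hypothetical, and a field's data is assumed to end at the end of its line.

```python
import re

# Hypothetical tokens: two field keywords, numbers, and line ends.
TOKEN_RE = re.compile(
    r"(?P<PROCESSOR>^Processor\b)"
    r"|(?P<MEMORY>^Memory\b)"
    r"|(?P<NUMBER>\d+(?:\.\d+)?)"
    r"|(?P<NEWLINE>\n)",
    re.MULTILINE,
)


def parse(text):
    """Record each NUMBER under the field keyword that opened its line."""
    record = {}
    context = None  # field token seen on the current line, if any
    for match in TOKEN_RE.finditer(text):
        kind = match.lastgroup
        if kind in ("PROCESSOR", "MEMORY"):
            context = kind
            record[context] = []
        elif kind == "NEWLINE":
            context = None  # assumption: a field's data ends with its line
        elif kind == "NUMBER" and context is not None:
            record[context].append(match.group())
    return record


log = (
    "Processor               : Intel(R) Xeon(R) CPU E5-2420 0 @ 1.90GHz (24 cores/threads)\n"
    "Memory                  : 65493MB\n"
    "Controller Slot         : 0\n"
)
result = parse(log)
print(result)
```

Numbers on lines that do not start with a known field keyword, such as the Controller Slot line here, are ignored because no context is active when they appear.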