Extract a string and assign it to a variable using regex in Python

Time: 2014-11-08 12:26:12

Tags: python regex python-3.x

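I am trying to write a script that goes through a text file, checks for specific content, and assigns it to a variable.

For example:

Text file contents: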

eth0      Link encap:Ethernet  HWaddr 08:ee:27:ff:b3:d7  
          inet addr:10.0.2.45  Bcast:10.3.2.255  Mask:255.255.255.0
          inet6 addr: fe80::a00:27ff:fe00:b3d7/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:16178 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8559 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:14045795 (14.0 MB)  TX bytes:1355632 (1.3 MB)

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:666 errors:0 dropped:0 overruns:0 frame:0
          TX packets:666 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:72748 (72.7 KB)  TX bytes:72748 (72.7 KB)

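I want to check the value of 'RX packets' on interface eth0 and assign the value '16178' to a variable. I need to be able to extract this value from the specific 'eth0' block.

Could you advise where to start?

Thanks.

3 answers:

Answer 0 (score: 1)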

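As shown below, this can be done easily with a regex: eth0.*? specifies that the data extracted must belong to eth0, RX packets: specifies that the number following RX packets: is the one to extract, and the group (\d+) captures that number. The re.DOTALL flag lets . match newlines, so .*? can span the lines between eth0 and RX packets:.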

>>> import re
>>> a="""eth0      Link encap:Ethernet  HWaddr 08:ee:27:ff:b3:d7  
...           inet addr:10.0.2.45  Bcast:10.3.2.255  Mask:255.255.255.0
...           inet6 addr: fe80::a00:27ff:fe00:b3d7/64 Scope:Link
...           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
...           RX packets:16178 errors:0 dropped:0 overruns:0 frame:0
...           TX packets:8559 errors:0 dropped:0 overruns:0 carrier:0
...           collisions:0 txqueuelen:1000 
...           RX bytes:14045795 (14.0 MB)  TX bytes:1355632 (1.3 MB)
... 
... lo        Link encap:Local Loopback  
...           inet addr:127.0.0.1  Mask:255.0.0.0
...           inet6 addr: ::1/128 Scope:Host
...           UP LOOPBACK RUNNING  MTU:65536  Metric:1
...           RX packets:666 errors:0 dropped:0 overruns:0 frame:0
...           TX packets:666 errors:0 dropped:0 overruns:0 carrier:0
...           collisions:0 txqueuelen:0 
...           RX bytes:72748 (72.7 KB)  TX bytes:72748 (72.7 KB)"""
>>> re.search(r'eth0.*?RX packets:(\d+)',a,re.DOTALL).group(1)
'16178'
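
To get from the one-liner above to a variable you can use, it may help to guard against a missing match and convert the captured text to an integer. The following is a minimal sketch, assuming the text has the same ifconfig layout as in the question; the helper name rx_packets and the shortened sample text are made up for illustration:

```python
import re

def rx_packets(text, iface):
    """Return the 'RX packets' count for iface as an int, or None if not found."""
    # re.escape guards against regex metacharacters in the interface name;
    # ^ with re.MULTILINE anchors the name at the start of a line.
    pattern = r'^' + re.escape(iface) + r'\b.*?RX packets:(\d+)'
    m = re.search(pattern, text, re.DOTALL | re.MULTILINE)
    return int(m.group(1)) if m else None

# Shortened sample in the same layout as the question's text file.
sample = """eth0      Link encap:Ethernet  HWaddr 08:ee:27:ff:b3:d7
          RX packets:16178 errors:0 dropped:0 overruns:0 frame:0

lo        Link encap:Local Loopback
          RX packets:666 errors:0 dropped:0 overruns:0 frame:0
"""

rx = rx_packets(sample, 'eth0')
print(rx)
```

For a real file, you would read its contents into a string first (for example with open(...).read()) and pass that string as text. Note that if an interface block had no RX packets line, the lazy .*? would run on into the next block, so this sketch relies on every block containing that line.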

Answer 1 (score: 0)

You can use a regular expression to extract the value.

Try a pattern like this:

m = re.match(r"\W*RX packets[^:]*:(\d+)", line)

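In a regular expression, \d means a digit and + means one or more, so \d+ matches one or more digits in the text. The parentheses capture the digits that were found, and those digits must come right after the specific text RX packets:.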
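
As a small illustration of how the capture group behaves on a single line, here is a sketch that applies the pattern to one line copied from the question's sample; the variable names pattern and line are only for this example:

```python
import re

# \W* skips the leading whitespace, [^:]* allows for any text before the colon,
# and (\d+) captures the run of digits that follows it.
pattern = r"\W*RX packets[^:]*:(\d+)"

line = "          RX packets:16178 errors:0 dropped:0 overruns:0 frame:0"

m = re.match(pattern, line)
if m:
    # group(1) is the text captured by the first pair of parentheses.
    print(m.group(1))
else:
    print("no match")
```

Because re.match only matches at the start of the string, a line such as the RX bytes one does not match this pattern and m would be None.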

You can find more details about regular expressions in the official doc.

Your code would look like this:

data= """
eth0      Link encap:Ethernet  HWaddr 08:ee:27:ff:b3:d7  
          inet addr:10.0.2.45  Bcast:10.3.2.255  Mask:255.255.255.0
          inet6 addr: fe80::a00:27ff:fe00:b3d7/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:16178 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8559 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:14045795 (14.0 MB)  TX bytes:1355632 (1.3 MB)

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:666 errors:0 dropped:0 overruns:0 frame:0
          TX packets:666 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:72748 (72.7 KB)  TX bytes:72748 (72.7 KB)"""

import re

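def findSeq(block, data):
    # Becomes True once a line containing the block name has been seen.
    isInRightBlock = False
    for line in data.splitlines():
        if block in line:
            isInRightBlock = True
        # Raw string so that \W and \d reach the regex engine unchanged.
        m = re.match(r"\W*RX packets[^:]*:(\d+)", line)
        if m and isInRightBlock:
            return m.group(1)
    # No matching line was found for this block.
    return None


res = findSeq("eth0", data)
print(res)  # your value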

Output:

16178

Benchmark

from datetime import datetime
start_time_1 = datetime.now()
res= findSeq("eth0",data)
print('Duration: {}'.format(datetime.now() - start_time_1))

from datetime import datetime
start_time_2 = datetime.now()
re.search(r'eth0.*?RX packets:(\d+)',data,re.DOTALL).group(1)
print('Duration: {}'.format(datetime.now() - start_time_2))

Output:

Duration: 0:00:00.000547
Duration: 0:00:00.000344
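
A single timed run like the one above can be noisy, so the numbers may vary a lot between runs. As a rough alternative, here is a sketch that repeats each approach with the standard timeit module; the names find_seq and search_once, the shortened data, and the run count n are choices made for this example only:

```python
import re
import timeit

data = """eth0      Link encap:Ethernet  HWaddr 08:ee:27:ff:b3:d7
          RX packets:16178 errors:0 dropped:0 overruns:0 frame:0

lo        Link encap:Local Loopback
          RX packets:666 errors:0 dropped:0 overruns:0 frame:0
"""

def find_seq(block, data):
    # Line-by-line approach, same logic as findSeq in this answer.
    in_block = False
    for line in data.splitlines():
        if block in line:
            in_block = True
        m = re.match(r"\W*RX packets[^:]*:(\d+)", line)
        if m and in_block:
            return m.group(1)
    return None

def search_once(data):
    # Single re.search over the whole text, as in answer 0.
    return re.search(r'eth0.*?RX packets:(\d+)', data, re.DOTALL).group(1)

n = 1000  # number of repetitions per approach
t_loop = timeit.timeit(lambda: find_seq("eth0", data), number=n)
t_search = timeit.timeit(lambda: search_once(data), number=n)
print('line-by-line: {:.6f} s for {} runs'.format(t_loop, n))
print('re.search:    {:.6f} s for {} runs'.format(t_search, n))
```

The absolute timings depend on the machine and the input size, so treat them as a relative comparison only.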

Note: you could optimize the way the correct block is detected.
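
One possible way to make the block check stricter is to split the text into interface blocks first and look inside each block separately. This is only a sketch, assuming blocks are separated by blank lines and each block starts with the interface name, as in the question's sample; parse_rx_packets is a hypothetical helper name:

```python
import re

def parse_rx_packets(text):
    """Map each interface name to its 'RX packets' count."""
    counts = {}
    # Interface blocks are assumed to be separated by blank lines.
    for block in re.split(r'\n\s*\n', text.strip()):
        if not block.strip():
            continue  # skip empty input
        name = block.split(None, 1)[0]  # first token of the header line
        m = re.search(r'RX packets:(\d+)', block)
        if m:
            counts[name] = int(m.group(1))
    return counts

sample = """eth0      Link encap:Ethernet  HWaddr 08:ee:27:ff:b3:d7
          RX packets:16178 errors:0 dropped:0 overruns:0 frame:0

lo        Link encap:Local Loopback
          RX packets:666 errors:0 dropped:0 overruns:0 frame:0
"""

counts = parse_rx_packets(sample)
print(counts.get('eth0'))
```

Because each search is confined to one block, a name that merely appears as a substring of another line (for example "lo" inside "collisions") cannot switch blocks by accident.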

Answer 2 (score: 0)

>>> re.findall(r'eth0.*?RX packets:(\d+)',x,re.DOTALL)
['16178']