Extracting content from a file in Python

Time: 2012-09-19 14:43:20

Tags: python regex

I have a file with the following contents:

2012-09-19_12:26:01
UPTIME report
 12:26:12 up 2 days, 22:53,  5 users,  load average: 0.13, 0.10, 0.03

FREE report
             total       used       free     shared    buffers     cached
Mem:       1914244     366692    1547552          0      85136     160928
-/+ buffers/cache:     120628    1793616
Swap:      4192956      35928    4157028

VMSTAT report
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----    
r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0  35928 1547552  85136 160928    3    4     4    79   16   40  1  6 92  0          0       

IOSTAT report
Linux 2.6.32-279.el6.x86_64 (progserver)        09/19/12        _x86_64_        (4 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.17    0.00    6.22    0.10    0.07   92.44

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
xvdc              1.92        23.72        30.26    6052098    7720864
xvda             11.25         6.00       600.34    1530740  153196208



Top 10 cpu using processes
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      9422  1.0  0.0  13492  1060 ?        R    12:26   0:00 ps auxww --sort=-pcpu
root       6520  0.2  0.2 143800  4308 pts/3    S+   12:25   0:00 vim LogicApp/Logic.py
root     28406  0.2  0.0  15024  1272 pts/4    S+   12:23   0:00 top
root         1  0.0  0.0  19228   292 ?        Ss   Sep16   0:01 /sbin/init
root         2  0.0  0.0      0     0 ?        S    Sep16   0:00 [kthreadd]
root         3  0.0  0.0      0     0 ?        S    Sep16   1:06 [migration/0]
root         4  0.0  0.0      0     0 ?        S    Sep16   0:02 [ksoftirqd/0]
root         5  0.0  0.0      0     0 ?        S    Sep16   0:00 [migration/0]

Top 10 memory using processes
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      6520  0.2  0.2 143800  4308 pts/3    S+   12:25   0:00 vim LogicApp/Logic.py
postfix  12145  0.0  0.1  78752  3204 ?        S    11:36   0:00 pickup -l -t fifo -u
root       928  0.0  0.1 251576  2956 ?        Sl   Sep16   0:08 /sbin/rsyslogd -i     /var/run/syslogd.pid -c 5
root      6521  0.0  0.0 140096  1336 ?        S    12:26   0:00 CROND
root     28406  0.2  0.0  15024  1272 pts/4    S+   12:23   0:00 top
root     31822  0.0  0.0 108428  1084 pts/6    Ss+  Sep18   0:00 -bash
root      9424  1.0  0.0  13492  1064 ?        R    12:26   0:00 ps auxww --sort=-rss
root      4936  0.0  0.0 108428   936 pts/3    Ss   Sep18   0:00 -bash

What I want to do is print one part of this file to the screen. For example, I want to take the following section of the file and print it from Python:

Top 10 cpu using processes
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      9422  1.0  0.0  13492  1060 ?        R    12:26   0:00 ps auxww --sort=-pcpu
root       6520  0.2  0.2 143800  4308 pts/3    S+   12:25   0:00 vim LogicApp/Logic.py
root     28406  0.2  0.0  15024  1272 pts/4    S+   12:23   0:00 top
root         1  0.0  0.0  19228   292 ?        Ss   Sep16   0:01 /sbin/init
root         2  0.0  0.0      0     0 ?        S    Sep16   0:00 [kthreadd]
root         3  0.0  0.0      0     0 ?        S    Sep16   1:06 [migration/0]
root         4  0.0  0.0      0     0 ?        S    Sep16   0:02 [ksoftirqd/0]
root         5  0.0  0.0      0     0 ?        S    Sep16   0:00 [migration/0]

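I believe this calls for a regular expression. I want to match everything from the line that says "cpu" up to the line that says "memory".

How can I achieve this?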

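1 Answer:

Answer 0: (score: 4)

A regular expression is overkill here; you basically want to extract the content between two lines. Here is an example of how to do that: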

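cpu = []
with open('filename') as f:
    in_cpu = False  # True once the start marker has been seen
    for line in f:
        line = line.strip()
        if line == 'Top 10 cpu using processes':
            in_cpu = True
        elif line == 'Top 10 memory using processes':
            break  # end marker reached; stop reading
        elif in_cpu and line:
            cpu.append(line)  # keep non-empty lines inside the section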

If reading the whole file into memory is not a problem, you can do it a bit more concisely:

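# Build a real list: in Python 3, map() returns an iterator with no .index()
with open('filename') as f:
    data = [line.rstrip() for line in f]
start_index = data.index('Top 10 cpu using processes')
end_index = data.index('Top 10 memory using processes')
cpu = data[start_index+1:end_index-1]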

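Note that .index() raises an exception (ValueError) if the element is not found. You need a try..except block to handle that case.

The +1 on the start index excludes the "Top 10 ..." line itself; the -1 on the end index excludes the blank line just before the end marker.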
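To illustrate the note about .index() and try..except, here is a minimal sketch of how the index-based approach might be wrapped in a function. The name extract_section and the sample list are hypothetical and used only for illustration. The sketch also filters out blank lines instead of relying on the fixed -1 offset, which is an assumption about what you want when the spacing before the end marker varies.

```python
def extract_section(lines, start_marker, end_marker):
    """Return the non-empty lines strictly between two marker lines.

    Hypothetical helper: returns an empty list if either marker is missing.
    """
    data = [line.rstrip() for line in lines]
    try:
        start_index = data.index(start_marker)
        # Look for the end marker only after the start marker.
        end_index = data.index(end_marker, start_index + 1)
    except ValueError:
        # list.index() raises ValueError when the value is not in the list.
        return []
    # Drop blank lines rather than assuming exactly one before the end marker.
    return [line for line in data[start_index + 1:end_index] if line]


# Made-up sample shaped like the report in the question.
sample = [
    "IOSTAT report\n",
    "\n",
    "Top 10 cpu using processes\n",
    "USER       PID %CPU %MEM\n",
    "root      9422  1.0  0.0\n",
    "\n",
    "Top 10 memory using processes\n",
    "USER       PID %CPU %MEM\n",
]
cpu = extract_section(sample,
                      "Top 10 cpu using processes",
                      "Top 10 memory using processes")
print("\n".join(cpu))
```

With a real file you could pass the open file object directly, since iterating over it yields lines: extract_section(f, ...) inside a with open('filename') as f: block.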