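Processing large amounts of data with a small Python script

Date: 2013-07-18 23:40:17

Tags: python

Hi, I have a small Python script whose job is to read data from a TXT file, sort it into a specific format, remove duplicates and discard meaningless data, and then write the result into another TXT file in this format: MAC IP number device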

import re
f = open('frame.txt', 'r')
d = open('Result1.txt', 'w')
mac=""
ip=""
phoneName=""
phoneTel=""
lmac=""
lip=""
lphoneName=""
lphoneTel=""
lines=f.readlines()
s=0
p=0
for line in lines:
    matchObj = re.search(r'(?<=Src: )[0-9a-z]{2}:[0-9a-z]{2}:[0-9a-z]{2}:[0-9a-z]{2}:[0-9a-z]{2}:[0-9a-z]{2}', line, re.M|re.I)
    if(matchObj):
        mac=matchObj.group(0)+"\t"
    matchObj = re.search(r'(?<=Src: )([0-9]+)(?:\.[0-9]+){3}', line, re.M|re.I)
    if(matchObj):
        ip=matchObj.group(0)+"\t"
    if(s==1):
        s=0
        matchObj = re.search(r'(?<=Value: )\d+', line, re.M|re.I)
        if(matchObj):
            phoneName=matchObj.group(0)+"\t"
    if(p==1):
        p=0
        matchObj = re.search(r'(?<=Value: ).+', line, re.M|re.I)
        if(matchObj):
            phoneTel=matchObj.group(0)+"\t"
    matchObj = re.search(r'(?<=Key: user \(218)', line, re.M|re.I)
    if(matchObj):
        s=1
    matchObj = re.search(r'(?<=Key: resource \(165)', line, re.M|re.I)
    if(matchObj):
        p=1
    if(mac!="" and ip!="" and phoneName!="" and phoneTel!="" and mac!=lmac and ip!=lip and phoneName!=lphoneName and phoneTel!=lphoneTel):
        d.write(mac+" " +ip+" "+ phoneName+" "+ phoneTel)
        lmac=mac
        lip=ip
        lphoneName=phoneName
        lphoneTel=phoneTel
        d.write("\n")
    matchObj = re.search(r'Frame \d+', line, re.M|re.I)
    if(matchObj):
        mac=""
        ip=""
        phoneName=""
        phoneTel=""
d.close()
f.close()

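The problem is that this code needs to process a large amount of data, possibly up to 100 GB, and when I run it on that much data the program freezes and then terminates on its own. Any ideas how to solve this? Thank you very much!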

2 answers:

Answer 0 (score: 5)

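You read the entire file at the start; if the file is large, loading all of it into memory will be a problem. Try iterating over the lines instead. In general, you'd write: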

with open(filename) as f:
    for line in f:
        # This will iterate over the lines in the file rather than read them all at once
        pass  # process each line here
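To illustrate that iterating over a file object pulls in one line at a time instead of building a list of every line, here is a minimal sketch. The function name `longest_line_length` is hypothetical, and `io.StringIO` stands in for a real file opened with `open()`; memory use stays bounded by the longest single line, not the file size.

```python
import io
from typing import TextIO


def longest_line_length(f: TextIO) -> int:
    """Return the length of the longest line, reading one line at a time."""
    longest = 0
    for line in f:  # the file object yields lines lazily; no list is built
        longest = max(longest, len(line.rstrip("\n")))
    return longest


sample = io.StringIO("short\na much longer line\nmid\n")
print(longest_line_length(sample))  # 18
```

With a real file the call is the same: `with open(filename) as f: longest_line_length(f)`.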

So in your case, change your loop to:

for line in f:

and remove:

lines=f.readlines()
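Applied to the question's script, the whole loop can run directly over the open file object. The following is a minimal sketch, not the answerer's code, under these assumptions: the regular expressions are copied from the question but compiled once up front (a small change of my own), `re.M` is dropped because none of the patterns use `^` or `$`, the names `extract_records` and `main` are hypothetical, and records are written tab-separated, which differs slightly from the question's mixed tab/space output. The deduplication rule (every field must differ from the previous record) is kept exactly as in the question.

```python
import re
from typing import Iterable, Iterator, Tuple

# Patterns taken from the question's script, compiled once instead of per line.
MAC_RE = re.compile(r'(?<=Src: )[0-9a-z]{2}(?::[0-9a-z]{2}){5}', re.I)
IP_RE = re.compile(r'(?<=Src: )([0-9]+)(?:\.[0-9]+){3}', re.I)
USER_KEY_RE = re.compile(r'(?<=Key: user \(218)', re.I)
RESOURCE_KEY_RE = re.compile(r'(?<=Key: resource \(165)', re.I)
NAME_VALUE_RE = re.compile(r'(?<=Value: )\d+', re.I)
TEL_VALUE_RE = re.compile(r'(?<=Value: ).+', re.I)
FRAME_RE = re.compile(r'Frame \d+', re.I)

Record = Tuple[str, str, str, str]


def extract_records(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (mac, ip, phone_name, phone_tel) tuples, consuming one line at a time."""
    mac = ip = phone_name = phone_tel = ""
    last: Record = ("", "", "", "")
    want_name = want_tel = False  # the question's s and p flags
    for line in lines:
        m = MAC_RE.search(line)
        if m:
            mac = m.group(0)
        m = IP_RE.search(line)
        if m:
            ip = m.group(0)
        if want_name:  # the line after "Key: user (218" holds the number
            want_name = False
            m = NAME_VALUE_RE.search(line)
            if m:
                phone_name = m.group(0)
        if want_tel:  # the line after "Key: resource (165" holds the device
            want_tel = False
            m = TEL_VALUE_RE.search(line)
            if m:
                phone_tel = m.group(0)
        if USER_KEY_RE.search(line):
            want_name = True
        if RESOURCE_KEY_RE.search(line):
            want_tel = True
        record = (mac, ip, phone_name, phone_tel)
        # Same rule as the question: all fields set, each one differs from the last record.
        if all(record) and all(a != b for a, b in zip(record, last)):
            yield record
            last = record
        if FRAME_RE.search(line):  # a new frame starts: clear the fields
            mac = ip = phone_name = phone_tel = ""


def main(src: str = "frame.txt", dst: str = "Result1.txt") -> None:
    # Both files are streamed; only the current line and record are in memory.
    with open(src) as f, open(dst, "w") as d:
        for record in extract_records(f):
            d.write("\t".join(record) + "\n")
```

Because `extract_records` accepts any iterable of strings, it can be tested with a plain list and run on a 100 GB capture by calling `main()`, without changing the parsing logic.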

Answer 1 (score: 1)

Use readline() instead of readlines().

readlines() reads the entire file into memory at once, whereas readline() reads a single line per call.
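A readline()-based loop looks like the sketch below. The function name `count_nonblank_lines` is hypothetical and `io.StringIO` stands in for a real file. The key detail is the stop condition: readline() returns an empty string only at end of file, while a blank line in the middle comes back as `"\n"`, which is truthy.

```python
import io
from typing import TextIO


def count_nonblank_lines(f: TextIO) -> int:
    """Count non-blank lines, reading exactly one line per readline() call."""
    count = 0
    while True:
        line = f.readline()
        if not line:  # '' means end of file; a blank line is '\n'
            break
        if line.strip():
            count += 1
    return count


print(count_nonblank_lines(io.StringIO("a\n\nb\n")))  # 2
```

In practice `for line in f:` does the same streaming with less code; the explicit readline() form is mainly useful when you need to control exactly when the next line is read.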