How can I insert into a memory-mapped file with Python without overwriting existing data?

Time: 2016-07-29 03:07:39

Tags: python memory insert mmap

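I have been trying to figure out how to use Python's mmap module to insert data at a specific position, rather than overwrite the data that is already mapped.

I have worked out how to write data into the memory-mapped file, and even at the position where I want it inserted. The data being inserted is hostname data extracted with a regular expression from a large 1 GB file, converted to lowercase with a domain suffix appended, and then inserted into a region of the memory-mapped file. Right now it writes to the correct position, but it overwrites my existing data.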

Data in the file I want to parse (the real file will be 800+ MB to 1 GB):

<ReportHost name="lnm02a0001"><HostProperties>
<tag name="host-fqdn">localhost</tag>
<ReportHost name="lnm02a0002"><HostProperties>
<tag name="host-fqdn">localhost</tag>
<ReportHost name="lnm02a0003"><HostProperties>
<tag name="host-fqdn">localhost</tag>
<ReportHost name="lnm02a0004"><HostProperties>
<tag name="host-fqdn">localhost</tag>
<ReportHost name="lnm02a0005"><HostProperties>
<tag name="host-fqdn">localhost</tag>
<ReportHost name="lnm02a0006"><HostProperties>
<tag name="host-fqdn">localhost</tag>
<ReportHost name="lnm02a0007"><HostProperties>
<tag name="host-fqdn">localhost</tag>

The Python code I am trying to use for the insertion:

import sys
import mmap
import re

# Regex to pull out the hostname between the ReportHost and HostProperties tags
myregex = r'<ReportHost\sname="(.*?)"><HostProperties>'
# Used later with mm.find to locate the first occurrence in the mapped file, since I need to
# rewrite one tag value at a time per hostname. It is a 1-to-1 match: each <ReportHost is
# followed by one host-fqdn tag.
fqdntag = "<tag name=\"host-fqdn\">localhost"

# List used to store the carved-out hostnames with the domain suffix appended
FQDNH = []

def main(mm):

    # Find and carve out host names
    dm = re.findall(myregex, mm)

    for a in dm:
        print a
        host = str.lower(a+".ad.something.com")
        # Store hosts with suffix into list
        FQDNH.append(host)

    # filelenb and hostnum were not defined in the posted code; assumed here to be
    # the mapped file length in bytes and the number of hosts found
    filelenb = len(mm)
    hostnum = len(FQDNH)

    # Attempt to resize the memory-mapped file so there is room for insertion
    newsize = filelenb + hostnum * 1000000
    mm.resize(newsize)

    for host in FQDNH:
        print "New File Size ", filelenb
        print host
        # Calculate size of memory-mapped file
        size = len(mm)
        # Calculate length of hostname
        length = len(host)
        print length
        # Find first occurrence of the host-fqdn tag
        fqdnindx = mm.find(fqdntag)
        # Index of the first occurrence plus the hostname length
        awindx = fqdnindx + length
        print "Current Position: ", mm.tell()
        # Move data to allow for insertion? dest, src, count? Not really sure how this works for insertion
        mm.move(awindx, fqdnindx, 27)
        # Seek to the index location plus 22 bytes to insert the fully qualified domain name (FQDN)
        mm.seek(fqdnindx+22)
        # Insert FQDN into memory-mapped file
        mm.write(host)
    # Flush changes from the memory-mapped file back to disk
    mm.flush()
    mm.seek(0)
    mm.close()

if __name__ == "__main__":
    f=sys.argv[1]
    fi = open(f, 'r+')
    mm = mmap.mmap(fi.fileno(), 0)
    fi.close()
    main(mm)
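
A note on the `mm.move(dest, src, count)` question raised in the code comment: `move` copies exactly `count` bytes from `src` to `dest` and leaves the rest of the mapping where it was, so moving only 27 bytes does not open a gap for the new text. To make room, the whole tail from the insertion point to the end of the data has to be shifted by the length of the inserted bytes. Below is a minimal sketch of that idea, assuming Python 3 and a hypothetical helper named `insert_bytes` (it is not part of the `mmap` module). It grows the file with `truncate` before mapping instead of calling `mm.resize`, because `resize` is not available on every platform.

```python
import mmap


def insert_bytes(path, offset, payload):
    """Insert payload (bytes) at byte offset of the file at path.

    Hypothetical helper for illustration: grows the file, shifts the
    tail to the right, then writes the payload into the gap.
    """
    with open(path, "r+b") as f:
        f.seek(0, 2)                       # jump to end to learn the size
        old_size = f.tell()
        f.truncate(old_size + len(payload))  # grow the file first
        with mmap.mmap(f.fileno(), 0) as mm:
            # Shift everything from offset to the old end of file.
            mm.move(offset + len(payload), offset, old_size - offset)
            # Fill the gap that the move opened up.
            mm[offset:offset + len(payload)] = payload
            mm.flush()
```

Each call shifts the entire tail of the file, so repeating it for every host in a 1 GB file copies a lot of data, and any offsets found before an insertion are stale afterwards.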

The result I get, which appears to overwrite the data:

<ReportHost name="lnm02a0001"><HostProperties>
<tag name="host-fqdn">lnm02a0001.ad.something.comlocal="lnm02a0002"><HostProperties>
<tag name="host-fqdn">lnm02a0002.ad.something.comlocal="lnm02a0003"><HostProperties>
<tag name="host-fqdn">lnm02a0003.ad.something.comlocal="lnm02a0004"><HostProperties>
<tag name="host-fqdn">lnm02a0004.ad.something.comlocal="lnm02a0005"><HostProperties

What I actually want:

<ReportHost name="lnm02a0001"><HostProperties>
<tag name="host-fqdn">lnm02a0001.ad.something.comlocalhost</tag>
<ReportHost name="lnm02a0002"><HostProperties>
<tag name="host-fqdn">lnm02a0002.ad.something.comlocalhost</tag>
<ReportHost name="lnm02a0003"><HostProperties>
<tag name="host-fqdn">lnm02a0003.ad.something.comlocalhost</tag>
<ReportHost name="lnm02a0004"><HostProperties>
<tag name="host-fqdn">lnm02a0004.ad.something.comlocalhost</tag>
<ReportHost name="lnm02a0005"><HostProperties>
<tag name="host-fqdn">lnm02a0005.ad.something.comlocalhost</tag>
<ReportHost name="lnm02a0006"><HostProperties>
<tag name="host-fqdn">lnm02a0006.ad.something.comlocalhost</tag>
<ReportHost name="lnm02a0007"><HostProperties>
<tag name="host-fqdn">lnm02a0007.ad.something.comlocalhost</tag>
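
One way to get output of this shape without shifting mapped data at all is to stream the input into a new file and add the hostname while copying. The sketch below assumes Python 3, assumes each `ReportHost` element and its `host-fqdn` tag sit on separate lines as in the sample data, and uses a hypothetical function name `add_fqdns`; it is a different technique from in-place mmap insertion, not a fix of the code above.

```python
import re

# Same pattern as in the question, compiled for bytes input.
HOST_RE = re.compile(rb'<ReportHost\sname="(.*?)"><HostProperties>')
FQDN_TAG = b'<tag name="host-fqdn">'
SUFFIX = b".ad.something.com"


def add_fqdns(src_path, dst_path):
    """Copy src_path to dst_path, inserting the lowercased hostname plus
    suffix right after each host-fqdn opening tag.

    Returns the number of tags rewritten.
    """
    rewritten = 0
    current_host = None
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for line in src:
            match = HOST_RE.search(line)
            if match:
                # Remember the host until its host-fqdn tag shows up.
                current_host = match.group(1).lower() + SUFFIX
            elif current_host is not None and FQDN_TAG in line:
                line = line.replace(FQDN_TAG, FQDN_TAG + current_host, 1)
                current_host = None
                rewritten += 1
            dst.write(line)
    return rewritten
```

This reads and writes the file once, line by line, so memory use stays small regardless of file size; the trade-off is that it needs disk space for a second copy of the file.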

0 answers:

No answers