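I have been trying to work out how to use Python's mmap module to insert data at a specific position, rather than overwrite the data that is already mapped.
I have figured out how to write data into the memory-mapped file, and even at the position where I want it inserted. The data to insert are host names extracted by regex from a large 1 GB file; each one is lowercased, given a domain suffix, and then written into a region of the memory-mapped file. Right now it lands at the correct position, but it overwrites my existing data.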
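To make the symptom concrete, here is a minimal sketch of the overwrite behaviour on a tiny temporary file. It is written for Python 3 (the code in this post is Python 2), and the helper name `overwrite_demo` and the sample bytes are made up for illustration only.

```python
import mmap
import os
import tempfile

def overwrite_demo():
    """Write into a mapped file and return the resulting file contents."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "r+b") as f:
            f.write(b"<tag>localhost</tag>\n")
            f.flush()                      # the file must be non-empty before mapping
            mm = mmap.mmap(f.fileno(), 0)  # length 0 maps the whole file
            mm.seek(5)                     # position just after "<tag>"
            mm.write(b"host1.")            # replaces 6 existing bytes in place
            mm.flush()
            mm.close()
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)

result = overwrite_demo()
print(result)
```

The file keeps its original length: `mm.write` replaces bytes at the current position and never shifts the bytes that follow.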
The data in the file I am parsing (the real file will be 800+ MB to 1 GB):
<ReportHost name="lnm02a0001"><HostProperties>
<tag name="host-fqdn">localhost</tag>
<ReportHost name="lnm02a0002"><HostProperties>
<tag name="host-fqdn">localhost</tag>
<ReportHost name="lnm02a0003"><HostProperties>
<tag name="host-fqdn">localhost</tag>
<ReportHost name="lnm02a0004"><HostProperties>
<tag name="host-fqdn">localhost</tag>
<ReportHost name="lnm02a0005"><HostProperties>
<tag name="host-fqdn">localhost</tag>
<ReportHost name="lnm02a0006"><HostProperties>
<tag name="host-fqdn">localhost</tag>
<ReportHost name="lnm02a0007"><HostProperties>
<tag name="host-fqdn">localhost</tag>
The Python code I am using to attempt the insertion:
import sys
import mmap
import re

# Regex to pull out the hostname between the ReportHost and HostProperties tags
myregex = r'<ReportHost\sname="(.*?)"><HostProperties>'
# Used later with mm.find to locate the first occurrence in the mapped file,
# because I need to rename one tag value at a time per hostname. There is a
# 1-to-1 match: each <ReportHost is followed by one host-fqdn tag.
fqdntag = "<tag name=\"host-fqdn\">localhost"
# List used to store the extracted host names with the suffix appended
FQDNH = []

def main(mm):
    # Find and carve out host names
    dm = re.findall(myregex, mm)
    for a in dm:
        print a
        host = str.lower(a+".ad.something.com")
        # Store hosts with suffix into list
        FQDNH.append(host)
    # Attempt to resize the memory-mapped file so there is room for insertion
    newsize = filelenb + hostnum * 1000000
    mm.resize(newsize)
    for host in FQDNH:
        print "New File Size ", filelenb
        print host
        # calculate size of memory mapped file
        size = len(mm)
        # calculate size of hostname
        length = len(host)
        print length
        # find first occurrence of the host-fqdn tag
        fqdnindx = mm.find(fqdntag)
        # index of the first occurrence plus the hostname length
        awindx = fqdnindx + length
        print "Current Position: ", mm.tell()
        # move data to allow for insertion? dest, src, count? not really sure how this works for insertion
        mm.move(awindx, fqdnindx, 27)
        # seek to the tag and skip 22 characters to insert the fully qualified domain name (FQDN)
        mm.seek(fqdnindx+22)
        # insert fqdn into memory mapped file
        mm.write(host)
    # flush changes from the memory-mapped file back to disk
    mm.flush()
    mm.seek(0)
    mm.close()

if __name__ == "__main__":
    f=sys.argv[1]
    fi = open(f, 'r+')
    mm = mmap.mmap(fi.fileno(), 0)
    fi.close()
    main(mm)
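For comparison, a minimal sketch of what an insertion through mmap seems to require: the file has to grow by the size of the new data, and every byte from the insertion point to the old end of the file has to be shifted right before the gap is filled. This is Python 3 and an illustration under assumptions, not a tested fix for the code above; the helper name `insert_into_file` and the one-line demo file are hypothetical. It grows the file with `truncate` before mapping, since `mm.resize` is not available on every platform.

```python
import mmap
import os
import tempfile

def insert_into_file(path, offset, data):
    """Insert `data` at byte `offset` of the file, shifting the tail right."""
    with open(path, "r+b") as f:
        old_size = os.fstat(f.fileno()).st_size
        f.truncate(old_size + len(data))        # grow the file first
        mm = mmap.mmap(f.fileno(), 0)           # map the enlarged file
        try:
            # move(dest, src, count): shift everything from `offset` up to the
            # old end of the file right by len(data) bytes
            mm.move(offset + len(data), offset, old_size - offset)
            mm[offset:offset + len(data)] = data  # fill the gap that opened up
            mm.flush()
        finally:
            mm.close()

# Demo on a one-line temporary file
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b'<tag name="host-fqdn">localhost</tag>\n')
tag = b'<tag name="host-fqdn">'                 # 22 bytes long
insert_into_file(demo_path, len(tag), b"lnm02a0001.ad.something.com")
with open(demo_path, "rb") as f:
    inserted = f.read()
os.remove(demo_path)
print(inserted)
```

Note that the count passed to `move` is the whole remaining tail, not a fixed number such as 27. Every insertion therefore copies everything after the insertion point, which could get expensive on a file close to 1 GB with many hosts.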
The result I get, which appears to overwrite data:
<ReportHost name="lnm02a0001"><HostProperties>
<tag name="host-fqdn">lnm02a0001.ad.something.comlocal="lnm02a0002"><HostProperties>
<tag name="host-fqdn">lnm02a0002.ad.something.comlocal="lnm02a0003"><HostProperties>
<tag name="host-fqdn">lnm02a0003.ad.something.comlocal="lnm02a0004"><HostProperties>
<tag name="host-fqdn">lnm02a0004.ad.something.comlocal="lnm02a0005"><HostProperties
What I actually want:
<ReportHost name="lnm02a0001"><HostProperties>
<tag name="host-fqdn">lnm02a0001.ad.something.comlocalhost</tag>
<ReportHost name="lnm02a0002"><HostProperties>
<tag name="host-fqdn">lnm02a0002.ad.something.comlocalhost</tag>
<ReportHost name="lnm02a0003"><HostProperties>
<tag name="host-fqdn">lnm02a0003.ad.something.comlocalhost</tag>
<ReportHost name="lnm02a0004"><HostProperties>
<tag name="host-fqdn">lnm02a0004.ad.something.comlocalhost</tag>
<ReportHost name="lnm02a0005"><HostProperties>
<tag name="host-fqdn">lnm02a0005.ad.something.comlocalhost</tag>
<ReportHost name="lnm02a0006"><HostProperties>
<tag name="host-fqdn">lnm02a0006.ad.something.comlocalhost</tag>
<ReportHost name="lnm02a0007"><HostProperties>
<tag name="host-fqdn">lnm02a0007.ad.something.comlocalhost</tag>
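One way to get this output without mmap at all is to stream the input line by line into a new file. The sketch below is a different technique from the one attempted above, shown only to pin down the intended transformation. It is Python 3, the names `add_fqdns`, `HOST_RE`, `FQDN_TAG` and `SUFFIX` are hypothetical, and it assumes each `<ReportHost>` line and its `host-fqdn` tag sit on separate lines, as in the sample data.

```python
import io
import re

HOST_RE = re.compile(rb'<ReportHost\sname="(.*?)"><HostProperties>')
FQDN_TAG = b'<tag name="host-fqdn">'
SUFFIX = b".ad.something.com"

def add_fqdns(src, dst):
    """Copy src to dst, inserting the last seen host name after each host-fqdn tag."""
    host = None
    for line in src:
        m = HOST_RE.search(line)
        if m:
            # Remember the lowercased host name for the tag that follows
            host = m.group(1).lower() + SUFFIX
        elif host is not None and FQDN_TAG in line:
            # Insert the name right after the opening tag, keeping the old value
            line = line.replace(FQDN_TAG, FQDN_TAG + host, 1)
            host = None
        dst.write(line)

# Demo on in-memory data; with real files, pass objects opened in binary mode
sample = (b'<ReportHost name="lnm02a0001"><HostProperties>\n'
          b'<tag name="host-fqdn">localhost</tag>\n'
          b'<ReportHost name="LNM02A0002"><HostProperties>\n'
          b'<tag name="host-fqdn">localhost</tag>\n')
out = io.BytesIO()
add_fqdns(io.BytesIO(sample), out)
rewritten = out.getvalue()
print(rewritten.decode())
```

Because only one line is held in memory at a time, the size of the input should not matter much, at the cost of writing a second file.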