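Extracting the outlook.msg body to text as shown below. I want to search for a pattern in the lines of the text file I read back, but the characters are interleaved with '\x00'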

Time: 2015-02-18 18:46:12

Tags: win32com mapi python-unicode unicode-escapes

Extracting the outlook.msg body to a text file:

def getEmailBodyFromMsg():

    mapi.MAPIInitialize((mapi.MAPI_INIT_VERSION, 0))
    storage_flags = win32com.storagecon.STGM_DIRECT | win32com.storagecon.STGM_READ | win32com.storagecon.STGM_SHARE_EXCLUSIVE
    filepathList = glob.glob('*.msg')

    for filepath in filepathList:

        txtFilepath = os.path.splitext(ntpath.basename(filepath))[0]
        resultFile = txtFilepath + datetime.now().strftime('%Y-%m-%d %H_%M_%S') + ".txt"

        # get body of email and save as txt
        storage = pythoncom.StgOpenStorage(filepath, None, storage_flags, None, 0)
        mapi_session = mapi.OpenIMsgSession()
        message = mapi.OpenIMsgOnIStg(mapi_session, None, storage, None, 0, mapi.MAPI_UNICODE)

        # write to txt file
        CHUNK_SIZE = 10000
        stream = message.OpenProperty(win32com.mapi.mapitags.PR_BODY, pythoncom.IID_IStream, 0, 0)
        text = u""
        while True:
            bytes = stream.read(CHUNK_SIZE)
            if bytes:
                text += bytes
            else:
                break
        with codecs.open(resultFile, mode='w', encoding='utf-8') as a_file:
            a_file.write(text)
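One possible explanation, assuming the stream opened under mapi.MAPI_UNICODE hands back UTF-16-LE bytes (the 'R\x00e\x00' pattern suggests this, but the post does not confirm it): in Python 2, `text += bytes` decodes the byte string implicitly as ASCII, so every NUL byte survives as its own character. A minimal sketch of decoding explicitly instead is below. `read_stream_text` and `fake_stream` are hypothetical names, and an `io.BytesIO` object stands in for the MAPI stream so the snippet runs without Outlook.

```python
import io

def read_stream_text(stream, chunk_size=10000, encoding="utf-16-le"):
    """Read all bytes from a stream-like object, then decode them once."""
    chunks = []
    while True:
        data = stream.read(chunk_size)
        if not data:
            break
        chunks.append(data)
    # Decode after joining, so a 2-byte code unit split across two
    # chunks is still decoded correctly.
    return b"".join(chunks).decode(encoding)

# Stand-in for the MAPI stream (assumption): the body encoded as UTF-16-LE.
fake_stream = io.BytesIO(u"Results from\r\n".encode("utf-16-le"))
text = read_stream_text(fake_stream, chunk_size=5)
print(repr(text))
```

The decoded `text` can then be written with `codecs.open(..., encoding='utf-8')` exactly as in the function above.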

Opening the file written above to search its lines of text:

    with  codecs.open(absFilepath, 'rb', encoding='utf-8') as inFile :
        for index, line in enumerate(inFile) :
            mymatch =  re.search(csResultEmailPattern, line, re.UNICODE)

# line = 'R\x00e\x00s\x00u\x00l\x00t\x00s\x00 \x00f\x00r\x00o\x00m\x00\n'    # OR line = u'R\x00e\x00s\x00u\x00l\x00t\x00s\x00 \x00f\x00r\x00o\x00m\x00\r'
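A quick way to check whether such a line is really UTF-16-LE text that was never decoded is to encode a known string and compare. The sketch below is only a diagnostic under that assumption; `sample`, `raw` and `first_line` are illustrative names, and the latin-1 decode simply maps each byte to one character, which mimics the effect seen in the file.

```python
sample = u"Results from\n"

# UTF-16-LE stores each ASCII character as the character byte plus a NUL byte.
raw = sample.encode("utf-16-le")

# Mapping every byte to its own character keeps the NULs in the text.
mis_decoded = raw.decode("latin-1")

# Splitting on the newline leaves the newline's own NUL for the next line.
first_line = mis_decoded.splitlines(True)[0]
print(repr(first_line))

# Decoding with the matching codec recovers the original text.
print(repr(raw.decode("utf-16-le")))
```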

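I would like to know whether there is an efficient way to specify a regular expression, such as resultEmailPattern = ur'Results', that matches the 'R\x00e...' line above, or whether there is a better way to encode the txt file.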
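For text files that already contain the interleaved NULs, two workarounds seem plausible: strip the NUL characters from each line before matching, or build a pattern that tolerates an optional NUL after every character. Both are sketched below; `search_ignoring_nuls` and `nul_tolerant` are hypothetical helpers, not part of `re`. Stripping is usually simpler because the original pattern stays unchanged.

```python
import re

def search_ignoring_nuls(pattern, line):
    """Remove NUL characters from the line, then apply the pattern as-is."""
    cleaned = line.replace(u"\x00", u"")
    return re.search(pattern, cleaned, re.UNICODE)

def nul_tolerant(literal):
    """Build a pattern matching `literal` with an optional NUL after each character."""
    return u"".join(re.escape(ch) + u"\\x00?" for ch in literal)

line = u"R\x00e\x00s\x00u\x00l\x00t\x00s\x00 \x00f\x00r\x00o\x00m\x00\n"

# Option 1: clean the line, keep the simple pattern.
match_a = search_ignoring_nuls(u"Results", line)

# Option 2: keep the line, widen the pattern.
match_b = re.search(nul_tolerant(u"Results"), line, re.UNICODE)

print(match_a.group(0))
print(repr(match_b.group(0)))
```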

0 answers:

No answers