我有以下代码,用于根据发送日期和电子邮件主题条件下载电子邮件附件:
hive.exec.dynamic.partition = true;
hive.exec.dynamic.partition.mode = nonstrict;
例如,是否可以指定仅下载.csv附件的条件?
此外,此代码以前曾在公用文件夹上使用-现在这些文件夹已更新为共享文件夹。自更新以来,我不得不将“最大”值从500增加到2500,以便查找指定的电子邮件。有什么办法可以加快速度吗?
谢谢
答案 0 :(得分:2)
下面是一种指定所需文件类型的方法。
请在“ attachments_of_interest”列表中输入文件结尾。
from datetime import date, timedelta
import os
import win32com.client
path = os.path.expanduser("C:\\Users\\xxxx\\Documents\\Projects\\VBA Projects\\VLOOKUP Automation\\Vlookup File Location")
today = date.today()
outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
inbox = outlook.Folders("xxx").Folders.Item("Inbox")
messages = inbox.Items
subject = "xxx"
dateHigh = date.today() - timedelta(days=1)
dateLow = date.today() - timedelta(days=-1)
max_n = 2500
attachments_of_interest = ['.csv']
for count, message in enumerate(messages):
if count > max_n:
break
if subject in message.subject and message.senton.date() > dateLow and message.senton.date() < dateHigh:
attachments = message.Attachments
num_attach = len([x for x in attachments])
for x in range(1, num_attach+1):
attachment = attachments.Item(x)
attachment_fname = str(attachment)
file_ending = attachment_fname.split('.')[-1]
if not attachments_of_interest or file_ending in attachments_of_interest:
attachment.SaveASFile(path + '\\' + attachment_fname)
关于加速,您可以使用一个池:
from multiprocessing.pool import ThreadPool as Pool
from datetime import date, timedelta
import os
import win32com.client
path = os.path.expanduser("C:\\Users\\xxxx\\Documents\\Projects\\VBA Projects\\VLOOKUP Automation\\Vlookup File Location")
today = date.today()
outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
inbox = outlook.Folders("xxx").Folders.Item("Inbox")
messages = inbox.Items
subject = "xxx"
max_n = 2500
attachments_of_interest = ['.csv']
pool_size = 5
# define worker function before a Pool is instantiated
def worker(message):
dateHigh = date.today() - timedelta(days=1)
dateLow = date.today() - timedelta(days=-1)
if subject in message.subject and message.senton.date() > dateLow and message.senton.date() < dateHigh:
attachments = message.Attachments
num_attach = len([x for x in attachments])
for x in range(1, num_attach+1):
attachment = attachments.Item(x)
attachment_fname = str(attachment)
file_ending = attachment_fname.split('.')[-1]
if not attachments_of_interest or file_ending in attachments_of_interest:
attachment.SaveASFile(path + '\\' + attachment_fname)
pool = Pool(pool_size)
for count, message in enumerate(messages):
if count > max_n:
break
pool.apply_async(worker, (message,))
pool.close()
pool.join()
答案 1 :(得分:0)
我认为这是仅下载csv的要求的一部分。 此Outlook组件具有一些可以利用的方法。 而不是消息= inbox.Items 尝试 消息= inbox.Items.GetFirst() 并获得第一条消息,然后使用
消息= inbox.Items.oItems.GetNext() 因此,这样一来,您在内存中始终只有一条消息,并且可以使循环时间更长。
确保您具有Outlook Microsoft Outlook 16.0对象库或高于10的对象库,以便此方法存在。 GetFirst() 我使用的C#代码
Outlook.MailItem oMsg = (Outlook.MailItem)oItems.GetFirst();
//Output some common properties.
Console.WriteLine(oMsg.Subject);
Console.WriteLine(oMsg.SenderName);
Console.WriteLine(oMsg.ReceivedTime);
Console.WriteLine(oMsg.Body);
//Check for attachments.
int AttachCnt = oMsg.Attachments.Count;
Console.WriteLine("Attachments: " + AttachCnt.ToString());
Outlook.MailItem oMsg1 = (Outlook.MailItem)oItems.GetNext();