Python: parsing a variety of different text-file emails

Date: 2016-12-08 15:38:50

Tags: python regex email text-files

Here are 3 snippets from 3 different emails:

1)
Subject: NEFS 11 and 12 fish for lease

Greetings,

NEFS 11 has the following fish for lease:
up to 4,000 lbs live wt GOM cod @ 1.40 lbs
NEFS 12 has the following fish for lease:
2,000 lbs American plaice @ .45 lbs

Please let me know if you're interested in either,


2)
Subject: NEFS 11 fish for lease

2,000 lbs Grey sole @ 1.20 or best offer
1,000 lbs dabs @ .55 or best offer

thanks,


3)
Subject: NEFS 11 fish for lease

-GOM Cod up to 5,000 lbs (live wt) @ 1.40 lbs
-American Plaice 2,000 lbs      .60 lbs or best offer

My question is: what is the most efficient way to parse the sector (NEFS 11, 12), species (GOM cod, grey sole), pounds (4,000 lbs, 2,000 lbs) and price (1.40/lb, .55/lb) information from these emails?

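My first thought was to use RegEx, but I am not sure that is the best approach, because my code currently captures too much information. For example, when I try to grab the weight data I also end up grabbing the price data, because both are adjacent to "lbs". When I try to capture the sector data, I capture the entire subject line.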
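To illustrate the overlap described above, here is a minimal sketch of patterns that keep the three fields apart. It rests on assumptions drawn only from the sample emails: a weight is a whole number (optionally with thousands separators) directly before "lbs", a price always contains a decimal point, and sector numbers follow the word "NEFS". The names `WEIGHT_RE`, `PRICE_RE`, `SECTOR_RE` and the helper functions are hypothetical, not part of any library.

```python
import re

# Assumption: weights are whole numbers such as "954" or "4,000" right before "lbs".
# The lookbehind rejects the fractional part of a price like "1.40 lbs".
WEIGHT_RE = re.compile(r"(?<![\d.,$])(\d{1,3}(?:,\d{3})+|\d+)\s*lbs\b", re.IGNORECASE)

# Assumption: prices always contain a decimal point (".45", "1.40", "$0.015").
PRICE_RE = re.compile(r"\$?(\d*\.\d+)")

# Sector numbers after "NEFS", e.g. "NEFS 11" or "NEFS 11 and 12".
SECTOR_RE = re.compile(r"NEFS\s+(\d+(?:\s*(?:,|and|&)\s*\d+)*)", re.IGNORECASE)


def find_weights(line):
    """Return every weight on the line as an int number of pounds."""
    return [int(w.replace(",", "")) for w in WEIGHT_RE.findall(line)]


def find_prices(line):
    """Return every decimal price on the line as a float."""
    return [float(p) for p in PRICE_RE.findall(line)]


def find_sectors(line):
    """Return the sector numbers that follow "NEFS" on the line."""
    sectors = []
    for group in SECTOR_RE.findall(line):
        sectors.extend(int(n) for n in re.findall(r"\d+", group))
    return sectors
```

Because each pattern uses a capture group, only the number is returned, not the surrounding "lbs" or the rest of the subject line. A price written without a decimal point (for example "@ 2 lbs") would be misread as a weight under these assumptions.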

Here is the part of my code that parses the Species data from the emails:

import os
import re

# `path` is the directory that holds the saved email text files
for filename in os.listdir(path):
    file_path = os.path.join(path, filename)
    if os.path.isfile(file_path):
        with open(file_path, 'r') as f:
            sector_result = []
            pattern = re.compile("Available Quota | CC Yellowtail Flounder | GOM Yellowtail Flounder | GB Cod East | GB Cod West | GB Haddock East | GB Haddock West | GB Winter Flounder | GB Yellowtail Flounder | GOM Cod | GOM Haddock | GOM Winter Flounder | Plaice | Pollock | Redfish | SNE Winter Flounder | ME Winter Flounder | SNE Yellowtail Flounder | ME Yellowtail Flounder | White Hake | Witch Flounder", re.IGNORECASE)
            for linenum, line in enumerate(f):
                if pattern.search(line) != None:
                    sector_result.append((linenum, line.rstrip('\n')))
                    for linenum, line in sector_result:
                        print ("Fish Species:", line)

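I search for every possible species that can appear in an email. Ideally (for example 3) I would produce: "Fish Species: GOM Cod, American Plaice", but what is produced is: Fish Species: -American Plaice 2,000 lbs .60 lbs or best offer

I am not an expert with RegEx, so I would appreciate help modifying my RegEx code, or suggestions for other methods I should use to parse these and many more emails. Thank you.

Additional email: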

NEFS 5 has the following fish available for lease/trade:

GB EAST cod: 954 lbs @ $0.83
GB EAST cod: 1,046 lbs to trade for 1,830 lbs GB WEST cod
GB blackback: 30,000 lbs @ $0.07
GOM blackback: 800 lbs @ $0.03
white hake: 6,322 lbs @ $0.13
pollock: 22,000 lbs @ $0.015
redfish: 14,000 lbs @ $0.015
GB yt: 1,873 lbs @ $1.13
GB yt: 5,127 lbs to trade for 10,254 lbs SNE yt

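1 answer:

Answer 0 (score: 2)

To get the different fish species, remove the spaces around each `|` in the pattern (they are treated as part of the text to match) and call findall on the whole email text, so that only the matched names are returned rather than the full line: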

import re

# file_path is the path to one email file, as in the question's loop
with open(file_path, 'r') as f:
    pattern = re.compile(r"Available Quota|CC Yellowtail Flounder|GOM Yellowtail Flounder|GB Cod East|GB Cod West|GB Haddock East|GB Haddock West|GB Winter Flounder|GB Yellowtail Flounder|GOM Cod|GOM Haddock|GOM Winter Flounder|Plaice|Pollock|Redfish|SNE Winter Flounder|ME Winter Flounder|SNE Yellowtail Flounder|ME Yellowtail Flounder|White Hake|Witch Flounder", re.IGNORECASE)
    email = f.read()
    # findall returns only the matched species names, not the whole line
    fish_types = pattern.findall(email)
    if fish_types:
        print("Fish Species:", " ".join(fish_types))
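The snippet above only lists species. As a rough sketch of how the same idea might be extended to the other fields in the question, the code below walks the email line by line, remembers the most recent "NEFS" sector numbers, and emits one record per line that names a species and a weight. Everything here is an assumption based on the sample emails: `parse_offers`, `SPECIES`, `CANONICAL` and the three number patterns are hypothetical names, the species list is a short subset of the one in the question, weights are taken to be whole numbers before "lbs", and prices are taken to always contain a decimal point.

```python
import re

# Subset of the species names from the question; extend as needed.
SPECIES = ["GOM Cod", "GOM Haddock", "Plaice", "Pollock", "Redfish", "White Hake"]
CANONICAL = {name.lower(): name for name in SPECIES}

# Longest names first so a longer name wins over any shorter one it contains.
SPECIES_RE = re.compile(
    "|".join(re.escape(name) for name in sorted(SPECIES, key=len, reverse=True)),
    re.IGNORECASE,
)
SECTOR_RE = re.compile(r"NEFS\s+(\d+(?:\s*(?:,|and|&)\s*\d+)*)", re.IGNORECASE)
# Assumption: a weight is a whole number before "lbs"; the lookbehind skips "1.40 lbs".
WEIGHT_RE = re.compile(r"(?<![\d.,$])(\d{1,3}(?:,\d{3})+|\d+)\s*lbs\b", re.IGNORECASE)
# Assumption: a price always has a decimal point, with an optional leading "$".
PRICE_RE = re.compile(r"\$?(\d*\.\d+)")


def parse_offers(email_text):
    """Return a list of dicts with sectors, species, lbs and price per offer line."""
    offers = []
    sectors = []  # most recent sector numbers seen in the email
    for line in email_text.splitlines():
        sector_match = SECTOR_RE.search(line)
        if sector_match:
            sectors = [int(n) for n in re.findall(r"\d+", sector_match.group(1))]

        species = SPECIES_RE.search(line)
        weight = WEIGHT_RE.search(line)
        if not (species and weight):
            continue  # not an offer line

        price = PRICE_RE.search(line)
        offers.append({
            "sectors": list(sectors),
            "species": CANONICAL[species.group(0).lower()],
            "lbs": int(weight.group(1).replace(",", "")),
            "price": float(price.group(1)) if price else None,
        })
    return offers
```

Calling `parse_offers(email)` with the text returned by `f.read()` gives one dictionary per offer line. Lines that use names missing from `SPECIES` (such as "dabs", "yt" or "Grey sole") are skipped, and "to trade for" lines are not handled at all; both would need extra rules.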