These are three snippets from three different emails:
1)
Subject: NEFS 11 and 12 fish for lease
Greetings,
NEFS 11 has the following fish for lease:
up to 4,000 lbs live wt GOM cod @ 1.40 lbs
NEFS 12 has the following fish for lease:
2,000 lbs American plaice @ .45 lbs
Please let me know if you're interested in either,
2)
Subject: NEFS 11 fish for lease
2,000 lbs Grey sole @ 1.20 or best offer
1,000 lbs dabs @ .55 or best offer
thanks,
3)
Subject: NEFS 11 fish for lease
-GOM Cod up to 5,000 lbs (live wt) @ 1.40 lbs
-American Plaice 2,000 lbs .60 lbs or best offer
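My question is: what is the most efficient way to parse the sector (NEFS 11, 12), species (GOM cod, grey sole), pounds (4,000 lbs, 2,000 lbs), and price (1.40/lb, 0.55/lb) information from these emails?
My first thought was to use regular expressions. But I am not sure that is the best approach, because my code currently captures too much; for example, when I try to grab the weight, I end up grabbing the price as well, because both are adjacent to "lbs". When I try to capture the sector, I capture the entire subject line.
Here is the piece of my code that parses the species data from the emails: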
for filename in os.listdir(path):
    file_path = os.path.join(path, filename)
    if os.path.isfile(file_path):
        with open(file_path, 'r') as f:
            sector_result = []
            pattern = re.compile("Available Quota | CC Yellowtail Flounder | GOM Yellowtail Flounder | GB Cod East | GB Cod West | GB Haddock East | GB Haddock West | GB Winter Flounder | GB Yellowtail Flounder | GOM Cod | GOM Haddock | GOM Winter Flounder | Plaice | Pollock | Redfish | SNE Winter Flounder | ME Winter Flounder | SNE Yellowtail Flounder | ME Yellowtail Flounder | White Hake | Witch Flounder", re.IGNORECASE)
            for linenum, line in enumerate(f):
                if pattern.search(line) != None:
                    sector_result.append((linenum, line.rstrip('\n')))
            for linenum, line in sector_result:
                print("Fish Species:", line)
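I search for every species that could appear in an email. Ideally, for example 3, I would produce "Fish Species: GOM Cod, American Plaice", but what I actually get is "Fish Species: -American Plaice 2,000 lbs .60 lbs or best offer".
I am not an expert with regular expressions, so I would appreciate help fixing my regex code, or suggestions for another approach to parsing these and many more emails. Thanks.
Additional email: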
NEFS 5 has the following fish available for lease/trade:
GB EAST cod: 954 lbs @ $0.83
GB EAST cod: 1,046 lbs to trade for 1,830 lbs GB WEST cod
GB blackback: 30,000 lbs @ $0.07
GOM blackback: 800 lbs @ $0.03
white hake: 6,322 lbs @ $0.13
pollock: 22,000 lbs @ $0.015
redfish: 14,000 lbs @ $0.015
GB yt: 1,873 lbs @ $1.13
GB yt: 5,127 lbs to trade for 10,254 lbs SNE yt
Answer 0 (score: 2)
To get the different fish species:
import re

# file_path is the path built in the question's loop over os.listdir(path).
with open(file_path, 'r') as f:
    # No spaces around "|": in the original pattern they were part of each alternative.
    pattern = re.compile(r"Available Quota|CC Yellowtail Flounder|GOM Yellowtail Flounder|GB Cod East|GB Cod West|GB Haddock East|GB Haddock West|GB Winter Flounder|GB Yellowtail Flounder|GOM Cod|GOM Haddock|GOM Winter Flounder|Plaice|Pollock|Redfish|SNE Winter Flounder|ME Winter Flounder|SNE Yellowtail Flounder|ME Yellowtail Flounder|White Hake|Witch Flounder", re.IGNORECASE)
    email = f.read()
# findall returns only the matched names, not the whole lines.
fish_types = pattern.findall(email)
if fish_types:
    print("Fish Species:", " ".join(fish_types))
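The sector, weight, and price can be pulled out in a similar way, one line at a time. The following is only a minimal sketch, not a definitive parser. `SPECIES`, `Offer`, and `parse_offers` are hypothetical names introduced here, and the species list covers only the names that appear in the three sample emails. It also assumes that a weight is a whole number (commas allowed) directly followed by "lbs", and that the price is the only number on the line containing a decimal point.

```python
import re
from typing import NamedTuple, Optional

# Illustrative species names taken from the sample emails; extend as needed.
SPECIES = ["GOM cod", "American plaice", "grey sole", "dabs"]

# Sector: the number that follows "NEFS".
SECTOR_RE = re.compile(r"NEFS\s+(\d+)", re.IGNORECASE)

# Longest names first, so a longer name wins over a shorter one it contains.
SPECIES_RE = re.compile(
    r"\b("
    + "|".join(re.escape(s) for s in sorted(SPECIES, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)

# Weight: a whole number (commas allowed) directly before "lbs".
# The lookbehind rejects the tail of a decimal such as the "40" in "1.40 lbs".
WEIGHT_RE = re.compile(r"(?<![\d.,])(\d[\d,]*)\s*lbs", re.IGNORECASE)

# Price: assumed to be the only number on the line with a decimal point.
PRICE_RE = re.compile(r"\$?(\d*\.\d+)")


class Offer(NamedTuple):
    sector: Optional[str]
    species: str
    pounds: int
    price: Optional[float]


def parse_offers(email_text):
    """Return one Offer per line that names a known species and a weight."""
    offers = []
    sector = None  # most recent sector seen in the subject or body
    for line in email_text.splitlines():
        sector_match = SECTOR_RE.search(line)
        if sector_match:
            sector = sector_match.group(1)
        species = SPECIES_RE.search(line)
        weight = WEIGHT_RE.search(line)
        if not (species and weight):
            continue  # greeting, header, or a species not in SPECIES
        price = PRICE_RE.search(line)
        offers.append(Offer(
            sector=sector,
            species=species.group(1),
            pounds=int(weight.group(1).replace(",", "")),
            price=float(price.group(1)) if price else None,
        ))
    return offers
```

Calling `parse_offers` on the text of the third email should give two records for sector 11: GOM Cod at 5,000 lbs and American Plaice at 2,000 lbs. Lines with no decimal number, such as the "to trade for" offers in the additional email, would get a price of `None` under these assumptions.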