Parsing email information from a text file

Date: 2019-04-22 18:30:37

Tags: python parsing

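I have updated this code. It now extracts the first name, last name, and email from the email addresses in a text file. I just need to add a counter that counts the number of unique domain names! For example: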

taco.salad@tacos.com
burrito.fest@burrito.com
asmith@tacos.com

should produce this:

taco.salad@tacos.com
first name: taco
last name: salad
domain: tacos.com

burrito.fest@burrito.com
first name: burrito
last name: fest
domain: burrito.com

asmith@tacos.com
first name: a
last name: smith
domain: tacos.com

number of emails found:
3
number of unique domains found:
2

Here is what I have so far:


import re

count = 0
fname = input('Enter a filename: ')

# "with" closes the file automatically when the loop finishes
with open(fname, "rt") as afile:
    for line in afile:
        email = line.strip()  # drop the trailing newline from each line
        if re.match(r'[\w\.-]+@[\w\.-]+', email):
            print("Found email:" + email)
            count += 1
            split_email = email.split('@')
            name = split_email[0]

            if "." in name:
                # taco.salad -> first "taco", last "salad"
                splitname = name.split('.')
                first, last = splitname[0], splitname[1]
            else:
                # asmith -> first "a", last "smith"
                first, last = name[0], name[1:]

            print("First name:" + first)
            print("Last name:" + last)
            print("Domain:" + split_email[1])
            print("\n")

print("Number of emails found: ")
print(count)
input('Press ENTER key to continue: ')
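One way to get the missing "number of unique domains" figure is to collect every domain in a Python set, which keeps each value only once, and print its length at the end. This is a minimal sketch, not code from the thread: it assumes one address per line, works on a list of lines instead of prompting for a filename, and the helper name count_emails_and_domains is hypothetical. Lowercasing the domain is a design choice (domain names are case-insensitive), so Tacos.com and tacos.com count once.

```python
import re

# Same pattern as in the question; re.match only anchors at the start of the line
EMAIL_RE = re.compile(r'[\w\.-]+@[\w\.-]+')


def count_emails_and_domains(lines):
    """Hypothetical helper: return (number of emails, number of unique domains).

    Assumes one address per line.
    """
    count = 0
    domains = set()  # a set stores each domain only once
    for line in lines:
        email = line.strip()
        if EMAIL_RE.match(email):
            count += 1
            domain = email.split('@')[1]
            domains.add(domain.lower())  # treat Tacos.com and tacos.com as the same domain
    return count, len(domains)


if __name__ == "__main__":
    sample = [
        "taco.salad@tacos.com\n",
        "burrito.fest@burrito.com\n",
        "asmith@tacos.com\n",
    ]
    emails, unique_domains = count_emails_and_domains(sample)
    print("number of emails found:")
    print(emails)          # 3
    print("number of unique domains found:")
    print(unique_domains)  # 2
```

In the question's own loop, the equivalent change would be to create domains = set() before the for loop, call domains.add(split_email[1].strip()) next to the count increment, and print len(domains) after the email count.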

1 answer:

Answer 0 (score: 1)

import re

# You can switch this with your file data
example_emails = ['testUwuw@gmail.com', 'FirstLast@email.com', 'FLast@email.com']

for email in example_emails:
  if re.match(r'[\w\.-]+@[\w\.-]+',  email):
    print("Found email:" + email)
    # Split string on char @
    # Example input:
    # testUwu@gmail.com
    # Output:
    # ['testUwu', 'gmail.com']
    split_email = email.split('@')
    # Split string on uppercase letters
    credentials = re.findall('[a-zA-Z][^A-Z]*', split_email[0])
    print("First name:" + credentials[0])
    print("Last name:" + credentials[1])
    print ("Domain:" + split_email[1])
    # Newline for nicer output formatting
    print("\n")

Example output:

Found email:FirstLast@email.com
First name:First
Last name:Last
Domain:email.com


Found email:FLast@email.com
First name:F
Last name:Last
Domain:email.com
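For reference, here is what the answer's pattern [a-zA-Z][^A-Z]* returns on its own for several local parts (the text before the @). Each match starts at a letter and runs up to, but not including, the next uppercase letter. FirstMiddleLast is a made-up input; the others come from this thread.

```python
import re

# The answer's pattern: a letter followed by any run of non-uppercase characters
CAMEL_PART = re.compile(r'[a-zA-Z][^A-Z]*')

for local_part in ['FirstLast', 'FLast', 'testUwuw', 'Test',
                   'FirstMiddleLast', 'taco.salad', 'asmith']:
    print(local_part, '->', CAMEL_PART.findall(local_part))

# FirstLast -> ['First', 'Last']
# FLast -> ['F', 'Last']
# testUwuw -> ['test', 'Uwuw']
# Test -> ['Test']
# FirstMiddleLast -> ['First', 'Middle', 'Last']
# taco.salad -> ['taco.salad']
# asmith -> ['asmith']
```

Any input that yields fewer than two pieces makes credentials[1] raise IndexError; that includes the dot-separated and initial-plus-surname addresses from the question.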

This example code only handles two email formats, such as FirstLast@email.com and FLast@email.com.

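Note that you should probably add some exception handling in case other formats cause errors. For example, Test@gmail.com will raise an IndexError, because the program expects two capitalized words. Similarly, if the part before the @ contains more than two capitalized words, the code ignores everything after the second one.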

These are some caveats to keep in mind, but if you are sure you only have these two formats, it should work fine.
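The exception handling mentioned above could take many forms; the following is a minimal sketch, not the answerer's code. The helper name split_name is hypothetical. It splits on "." when the local part contains one (the question's format) and otherwise on capitalized words (the answer's format), and it uses a length check, which does the same job as catching IndexError, to return None for inputs such as Test that yield only one piece. Taking the final piece as the last name is an assumption for inputs like FirstMiddleLast.

```python
import re

# Pattern from the answer: a letter followed by any run of non-uppercase characters
CAMEL_PART = re.compile(r'[a-zA-Z][^A-Z]*')


def split_name(local_part):
    """Hypothetical helper: return (first, last) for the text before '@', or None."""
    if '.' in local_part:
        # question's format, e.g. taco.salad
        pieces = local_part.split('.')
    else:
        # answer's format, e.g. FirstLast or FLast
        pieces = CAMEL_PART.findall(local_part)
    if len(pieces) < 2:
        # e.g. Test: only one piece, so pieces[1] would raise IndexError
        return None
    # assumption: treat the final piece as the last name (FirstMiddleLast -> First, Last)
    return pieces[0], pieces[-1]


if __name__ == "__main__":
    for email in ['FirstLast@email.com', 'FLast@email.com', 'taco.salad@tacos.com',
                  'Test@gmail.com', 'FirstMiddleLast@email.com']:
        local_part, domain = email.split('@', 1)
        names = split_name(local_part)
        if names is None:
            print("Could not split a name out of: " + email)
            continue
        print("First name:" + names[0])
        print("Last name:" + names[1])
        print("Domain:" + domain)
        print()
```

Returning None keeps the decision about what to print for unrecognized addresses in the calling loop rather than inside the helper.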