Question

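I have a list of email addresses. Some come from the relevant domain, while others come from spam or otherwise irrelevant domains. I want to capture both, but in separate lists. I know where the relevant mail comes from (always the same domain, @gmail.com), but the spam comes from many different domains, and all of those need to be captured too.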

    # Extract all email ids from a JSON file
    import re
    import json

    with open("test.json", 'r') as fp:
        json_decode = json.loads(fp.read())

        line = str(json_decode)

        match = re.findall(r'[\w\.-]+@[\w.-]+', line)
        l = len(match)
        print(match)

        for i in match:
            domain = match.split('@')[i]


    OUTPUT: match = ['image001.png@01D36CD8.2A2219D0', 'arealjcl@countable.us', 'taylor.l.ingram@gmail.com']

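The first two are spam and the third is a legitimate email, so they must end up in different lists. Should I split at the @ to determine the domain, or exclude everything that is not @gmail.com and dump it into another list?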

Answer 1

I suggest using the endswith() string method. Here is how to use it:

legit = []
spam = []

# Iterate through the list of matches
for email in match:

    # endswith() returns True if the email ends with @gmail.com,
    # which means it is a good email. If it returns False,
    # the email is treated as spam.
    if email.endswith("@gmail.com"):
        legit.append(email)
    else:
        spam.append(email)

Edit: changed the code so that it correctly answers your question.
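As a compact variation on the same idea, here is a minimal sketch. It relies on the fact that str.endswith() also accepts a tuple of suffixes, and it lowercases each address first so that "@Gmail.com" is treated the same way. The name RELEVANT_SUFFIXES is a hypothetical addition, and the sample list is copied from the question's output.

```python
# Sample data copied from the question's output.
match = ['image001.png@01D36CD8.2A2219D0',
         'arealjcl@countable.us',
         'taylor.l.ingram@gmail.com']

# Hypothetical name: a tuple of accepted suffixes (extend as needed).
RELEVANT_SUFFIXES = ("@gmail.com",)

# str.endswith() accepts a tuple; lower() makes the check case-insensitive.
legit = [e for e in match if e.lower().endswith(RELEVANT_SUFFIXES)]
spam = [e for e in match if not e.lower().endswith(RELEVANT_SUFFIXES)]

print(legit)
print(spam)
```

Including the "@" in the suffix matters: without it, an address at a domain such as "notgmail.com" would also pass the check.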

Answer 2

When you split an email address on '@', you get a list of two items:

In [3]: 'image001.png@01D36CD8.2A2219D0'.split('@')
Out[3]: ['image001.png', '01D36CD8.2A2219D0']

To check the domain, index the second item of the result:

In [4]: q = 'image001.png@01D36CD8.2A2219D0'.split('@')

In [5]: q[1]
Out[5]: '01D36CD8.2A2219D0'

So your for loop would look more like this:

In [9]: for thing in match:
   ...:     domain = thing.split('@')[1]
   ...:     print(domain)
   ...:     
01D36CD8.2A2219D0
countable.us
gmail.com
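
Indexing with [1] raises an IndexError when a string contains no '@'. A slightly more defensive sketch, assuming you would rather get an empty string in that case, uses str.rpartition(). The helper name domain_of is hypothetical.

```python
def domain_of(address):
    """Return the lowercased text after the last '@', or '' if there is no '@'."""
    _local, sep, domain = address.rpartition('@')
    # rpartition returns an empty separator when '@' is not found.
    return domain.lower() if sep else ''


match = ['image001.png@01D36CD8.2A2219D0',
         'arealjcl@countable.us',
         'taylor.l.ingram@gmail.com']

domains = [domain_of(address) for address in match]
print(domains)
```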

Answer 3

You can split them into two lists based on a list of relevant domains that you define:

    # Extract all email ids from a JSON file
    import re
    import json

    # Domains are stored without the '@', because split('@')[1]
    # returns only the part after it. You can add more.
    relevant_domains = ['gmail.com']

    with open("test.json", 'r') as fp:
        json_decode = json.load(fp)

    line = str(json_decode)

    match = re.findall(r'[\w.-]+@[\w.-]+', line)
    print(match)

    relevant_emails = []
    spam_emails = []

    for email in match:
        domain = email.split('@')[1]

        if domain in relevant_domains:
            relevant_emails.append(email)
        else:
            spam_emails.append(email)
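
To try the approach without a file on disk, the following sketch runs the same steps on an in-memory JSON string. The sample text and the function name split_emails are made up for illustration; only the regular expression and the three addresses come from the question.

```python
import json
import re

# Hypothetical sample standing in for the contents of test.json.
sample = ('{"from": "taylor.l.ingram@gmail.com", '
          '"body": "cc arealjcl@countable.us, see image001.png@01D36CD8.2A2219D0"}')


def split_emails(text, relevant_domains):
    """Find email-like strings in text and split them by domain."""
    # A set gives fast membership tests; lowercasing makes them case-insensitive.
    relevant = {d.lower() for d in relevant_domains}
    relevant_emails, spam_emails = [], []
    for email in re.findall(r'[\w.-]+@[\w.-]+', text):
        domain = email.rpartition('@')[2].lower()
        if domain in relevant:
            relevant_emails.append(email)
        else:
            spam_emails.append(email)
    return relevant_emails, spam_emails


data = json.loads(sample)
relevant_emails, spam_emails = split_emails(str(data), ['gmail.com'])
print(relevant_emails)
print(spam_emails)
```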

How to exclude email addresses from a specific domain and extract the other addresses in Python

3 answers: