Question

I have a file containing the following input:

host1 192.168.100.24
user1@abc.com  host2 192.168.100.45 host7 192.168.100.40 host3 192.168.100.34 host4 192.168.100.20
user2@xyz.com  host8 192.168.100.48 host6 192.168.100.43 host10 192.168.100.37 
host5 192.168.100.24 host9 192.168.100.33

Expected output:

no_email: 
      host1 192.168.100.24
      host5 192.168.100.24 
      host9 192.168.100.33
user1@abc.com:  
            host2 192.168.100.45
            host7 192.168.100.40
            host3 192.168.100.34 
            host4 192.168.100.20
user2@xyz.com: 
            host8 192.168.100.48
            host6 192.168.100.43 
            host10 192.168.100.37

Code:

import re

def get_contacts(filename):
    emails = []
    hostname = []
    ip = []
    with open(filename, 'r') as contacts_file:
        for a_contact in contacts_file:
            match = re.match(r'^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})$', a_contact.split()[0])
            line_length = a_contact.count(' ')
            if match is None:
                emails.append('no_email')
                hostname.append(a_contact.split()[0])
                ip.append(a_contact.split()[1])
            elif line_length > 1:
                emails.append(a_contact.split()[0])
                hostname.append(a_contact.split()[1])
                ip.append(a_contact.split()[2])
            else:
                emails.append(a_contact.split()[0])
                hostname.append(a_contact.split()[1])
                ip.append(a_contact.split()[2])
    return emails, hostname, ip

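I just want to return the list of hostnames and IPs so that it can be sent to the corresponding email address returned in the list. Can anyone help me do this easily? Thanks.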

Answer 1

First install the validate_email module:

$ pip3 install validate_email

Then:

from validate_email import validate_email

result = {}
with open('file.txt') as f:
    for line in f:
        words = line.split()
        if validate_email(words[0]): # If first word of the line is a valid email, lets store data on the result dict using the email as key.
            email = words[0]
            words = words[1:]
        else:
            email = 'no_email'

        hosts_emails = [(words[i], words[i+1]) for i in range(0, len(words) - 1, 2)]
        (result.setdefault(email, [])).append(hosts_emails)

print(result)

Output:

{'no_email': [[('host1', '192.168.100.24')], [('host5', '192.168.100.24'), ('host9', '192.168.100.33')]], 'user1@abc.com': [[('host2', '192.168.100.45'), ('host7', '192.168.100.40'), ('host3', '192.168.100.34'), ('host4', '192.168.100.20')]], 'user2@xyz.com': [[('host8', '192.168.100.48'), ('host6', '192.168.100.43'), ('host10', '192.168.100.37')]]}
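Note that append produces one sublist per input line. If a single flat list of pairs per key is preferred, extend can be used instead. Below is a minimal sketch of that variant. It assumes the lines are already available as strings, and it uses a plain '@' test in place of validate_email so that it runs without the extra dependency; the name group_hosts is hypothetical.

```python
def group_hosts(lines):
    """Group (host, ip) pairs by the email at the start of each line."""
    result = {}
    for line in lines:
        words = line.split()
        if not words:
            continue  # skip blank lines
        # Assumption: a plain '@' test stands in for validate_email here.
        if "@" in words[0]:
            email, words = words[0], words[1:]
        else:
            email = "no_email"
        pairs = [(words[i], words[i + 1]) for i in range(0, len(words) - 1, 2)]
        # extend keeps one flat list per key instead of one sublist per line
        result.setdefault(email, []).extend(pairs)
    return result


sample = [
    "host1 192.168.100.24",
    "user1@abc.com  host2 192.168.100.45 host7 192.168.100.40",
    "host5 192.168.100.24 host9 192.168.100.33",
]
grouped = group_hosts(sample)
print(grouped["no_email"])
```

With this shape, every key maps directly to a list of (host, ip) tuples, which is easier to loop over when building one message per address.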

Answer 2

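Hope this helps. Using a dictionary is a sensible choice here, where the key is either no_email or the email ID (if the first word matches the email regex). On each iteration we first set the to_update variable to no_email and change it only when a matching email is found. The host_and_ip variable is set to hold only the host and IP part of each line, i.e. the email address is stripped off when a matching email is detected. If an email is detected, we check whether the same email already exists in our dictionary dicto. If it does, we simply add the hosts and IPs to it; otherwise we first initialize an empty list for that email (as a new key).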

import re
def get_contacts(filename):
    dicto={}
    dicto['no_email']=[]
    with open(filename,'r') as contacts_file:
        for a_contact in contacts_file:
            match = re.match(r'^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})$', a_contact.split()[0])
            to_update = 'no_email'      #by default to_update is set to no_email
            if match == None:
                host_and_ip = a_contact.split() #grab all as host and ip
            else:
                curr_email = a_contact.split()[0]
                if curr_email not in dicto.keys():
                    dicto[curr_email]=[]    #initialize for new email
                host_and_ip = a_contact.split()[1:]  #grab leaving one behind i.e. the email
                to_update = curr_email  #to be updated to the email
            for i in range(len(host_and_ip)//2):
                dicto[to_update]+=[[host_and_ip[2*i],host_and_ip[2*i+1]]]
    return dicto

print(get_contacts('test.txt'))

The function returns a dictionary like this:

{'no_email': [['host1', '192.168.100.24'], ['host4', '192.168.100.20'], ['host5', '192.168.100.24'], ['host9', '192.168.100.33']], 'user1@abc.com': [['host2', '192.168.100.45'], ['host7', '192.168.100.40'], ['host3', '192.168.100.34']], 'user2@xyz.com': [['host8', '192.168.100.48'], ['host6', '192.168.100.43'], ['host10', '192.168.100.37']]}

You can easily access the list of hosts and IPs for a particular email ID like this:

get_contacts('test.txt')['user1@abc.com'] will return the list of hosts and IPs.
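To get from such a dictionary to the layout asked for in the question, the keys and pairs can be walked in order. The following is only a sketch: format_contacts is a hypothetical helper, and the sample dictionary is hand-written in the same shape as the result above rather than read from a file.

```python
def format_contacts(contacts):
    """Render {key: [[host, ip], ...]} in the layout shown in the question."""
    lines = []
    for key, pairs in contacts.items():
        lines.append(f"{key}:")
        for host, ip in pairs:
            lines.append(f"    {host} {ip}")
    return "\n".join(lines)


# Hand-written sample in the same shape as the dictionary above.
sample = {
    "no_email": [["host1", "192.168.100.24"]],
    "user1@abc.com": [["host2", "192.168.100.45"], ["host7", "192.168.100.40"]],
}
report = format_contacts(sample)
print(report)
```

The same loop could build one text body per email key instead of a single report, if each address should only receive its own hosts.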

Answer 3

I use the third-party library more_itertools, which implements the grouper recipe from itertools. It can be installed with pip install more_itertools.

import collections as ct

import more_itertools as mit


# filename is the path to the input file
dd = ct.defaultdict(list)
with open(filename, "r") as f:
    for line in f:
        parts = line.split()
        if "@" not in parts[0]:
            # recent more_itertools releases take grouper(iterable, n)
            dd["no email"].extend(mit.grouper(parts, 2))
        else:
            name = parts[0]
            dd[name].extend(mit.grouper(parts[1:], 2))

dd

Output

defaultdict(list,
            {'no email': [
              ('host1', '192.168.100.24'),
              ('host5', '192.168.100.24'),
              ('host9', '192.168.100.33')],
             'user1@abc.com': [
              ('host2', '192.168.100.45'),
              ('host7', '192.168.100.40'),
              ('host3', '192.168.100.34'),
              ('host4', '192.168.100.20')],
             'user2@xyz.com': [
              ('host8', '192.168.100.48'),
              ('host6', '192.168.100.43'),
              ('host10', '192.168.100.37')]})

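The grouper recipe regroups the whitespace-separated items that follow on each line into (host, IP) pairs.

Optionally, you can implement this recipe yourself without installing more_itertools.

From the itertools recipes (in Python 3):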

from itertools import zip_longest 


def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)
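As a quick illustration of how the recipe behaves on one line of the input, here is a small sketch. The recipe is repeated so the snippet runs on its own, and the sample tokens are taken from the question's data.

```python
from itertools import zip_longest


def grouper(iterable, n, fillvalue=None):
    # Same recipe as above, repeated so this snippet runs on its own.
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)


parts = "user1@abc.com host2 192.168.100.45 host7 192.168.100.40".split()
pairs = list(grouper(parts[1:], 2))  # skip the email, pair up the rest
print(pairs)

# With an odd number of tokens the last pair is padded with fillvalue.
odd = list(grouper(["host1", "192.168.100.24", "host5"], 2))
print(odd)
```

The padding matters for malformed lines: a host without an IP ends up paired with None instead of being dropped silently.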

Answer 4

One approach is to split each line and check whether the first entry contains an @ character. Then use slicing to extract the remaining entries:

def get_contacts(filename):
    no_email = []
    users = []

    with open(filename) as f_contacts:
        for row in f_contacts:
            entries = row.split()

            if '@' in entries[0]:
                pairs = [entries[i:i+2] for i in range(1, len(entries), 2)]
                users.append([entries[0], pairs])
            else:
                for i in range(0, len(entries), 2):
                    no_email.append(entries[i:i+2])

    return no_email, users

no_email, users = get_contacts('contacts.txt')            

print("no_email:")
for host, ip in no_email:
    print("    {} {}".format(host, ip))

for user_entry in users:
    print("{}:".format(user_entry[0]))
    for host, ip in user_entry[1]:
        print("    {} {}".format(host, ip))

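This would display:

no_email:
    host1 192.168.100.24
    host5 192.168.100.24
    host9 192.168.100.33
user1@abc.com:
    host2 192.168.100.45
    host7 192.168.100.40
    host3 192.168.100.34
    host4 192.168.100.20
user2@xyz.com:
    host8 192.168.100.48
    host6 192.168.100.43
    host10 192.168.100.37

Entries are stored in users in the form ["username", [["host1", "ip1"], ["host2", "ip2"]]].

If your file has multiple lines for the same user, you would need to store all entries for the same user in the same place, for example in a dictionary keyed by the user.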
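One way to keep all entries for the same user in one place is a collections.defaultdict keyed by the email address. The sketch below is an assumption about how that could look, not part of the answer's code: group_by_user is a hypothetical name, and the rows are passed in as strings instead of being read from a file.

```python
from collections import defaultdict


def group_by_user(rows):
    """Collect [host, ip] pairs per user, merging repeated users."""
    users = defaultdict(list)
    for row in rows:
        entries = row.split()
        if not entries:
            continue  # skip blank lines
        if "@" in entries[0]:
            key, rest = entries[0], entries[1:]
        else:
            key, rest = "no_email", entries
        # slice the remaining tokens two at a time into [host, ip] pairs
        users[key].extend(rest[i:i + 2] for i in range(0, len(rest), 2))
    return users


rows = [
    "user1@abc.com host2 192.168.100.45",
    "host1 192.168.100.24",
    "user1@abc.com host7 192.168.100.40",
]
merged = group_by_user(rows)
print(merged["user1@abc.com"])
```

Here both lines for user1@abc.com land in the same list, while lines without an email are collected under the no_email key.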

How to append variable-length values to a list in Python
