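Extracting IPs after a specific delimiter

Time: 2014-02-19 20:49:56

Tags: python python-2.7

I am trying to extract only the IPs from a file, sort them numerically, and write the result to another file.

The data looks like this: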

The Spammer (and all his/her info): 
Username: user 
User ID Number: 0 
User Registration IP Address: 77.123.134.132 
User IP Address for Selected Post: 177.43.168.35 
User Email: email@address.com

Here is my code, which does not sort the IPs correctly (it lists 177.43.168.35 before 77.123.134.132):

import re

spammers = open('spammers.txt', "r")
ips = []
for text in spammers.readlines():
    text = text.rstrip()
    print text
    regex = re.findall(r'(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})$',text)
    if regex is not None and regex not in ips:
        ips.append(regex)

for ip in ips:
    OrganizedIPs = open("Organized IPs.txt", "a")
    addy = "".join(ip)
    if addy is not '':
        print "IP: %s" % (addy)
        OrganizedIPs.write(addy)
        OrganizedIPs.write("\n")
        spammers.close()
        OrganizedIPs.close()

organize = open("Organized IPs.txt", "r")
ips = organize.readlines();
ips = list(set(ips))
print ips
for i in range(len(ips)):
    ips[i] = ips[i].replace('\n', '')
print ips
ips.sort()
finish = open('organized IPs.txt', 'w')
finish.write('\n'.join(ips))
finish.close()
clean = open('spammers.txt', 'w')
clean.close()

I tried using this IP sorter code, but it expects a string, whereas the regex returns a list.

3 answers:

Answer 0: (score: 3)

Or this (it saves the cost of string formatting):

def ipsort(ip):
    # Turn '62.1.22.4' into (62, 1, 22, 4) so octets compare as integers.
    return tuple(int(t) for t in ip.split('.'))

ips = ['1.2.3.4', '100.2.3.4', '62.1.2.3', '62.1.22.4']
print(sorted(ips, key=ipsort))
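
To make the difference concrete, here is a small sketch that compares the default string sort with a tuple key, using the two addresses from the question. The name `ip_key` is only an illustrative stand-in for the `ipsort` function above.

```python
def ip_key(ip):
    # Convert "77.123.134.132" into (77, 123, 134, 132) so octets compare as numbers.
    return tuple(int(part) for part in ip.split('.'))

ips = ['177.43.168.35', '77.123.134.132']

# Default sort compares character by character, so '1' < '7' puts 177.* first.
lexicographic = sorted(ips)

# Sorting on integer tuples compares 77 < 177 numerically.
numeric = sorted(ips, key=ip_key)

print(lexicographic)
print(numeric)
```

This is the behaviour the question describes: a plain `ips.sort()` on strings orders them as text, not as numbers.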

Answer 1: (score: 0)

Try this:

sorted_ips = sorted(ips, key=lambda x: '.'.join(["{:>03}".format(octet) for octet in x.split(".")]))
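
As a rough illustration of what a zero-padding key does, the sketch below pads each octet to three digits so that plain string comparison agrees with numeric order. The helper name `padded_key` is hypothetical, and the format spec is written as `{:0>3}` (fill `0`, right-align, width 3), which should pad the same way as the spec in the one-liner above.

```python
def padded_key(ip):
    # Right-align each octet in a 3-character field filled with zeros.
    return '.'.join('{:0>3}'.format(octet) for octet in ip.split('.'))

ips = ['177.43.168.35', '77.123.134.132']

# The keys are fixed-width strings, so text order matches numeric order.
padded = [padded_key(ip) for ip in ips]

sorted_ips = sorted(ips, key=padded_key)
print(padded)
print(sorted_ips)
```

The original strings are what end up in `sorted_ips`; the padded form is used only for comparison.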

Answer 2: (score: 0)

import re

LOG    = "spammers.txt"
IPV4   = re.compile(r"(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})")
RESULT = "organized_ips.txt"

def get_ips(fname):
    with open(fname) as inf:
        return IPV4.findall(inf.read())

def numeric_ip(ip):
    return [int(i) for i in ip.split(".")]

def write_to(fname, iterable, fmt):
    with open(fname, "w") as outf:
        for i in iterable:
            outf.write(fmt.format(i))

def main():
    ips = get_ips(LOG)
    ips = list(set(ips))      # uniquify
    ips.sort(key=numeric_ip)
    write_to(RESULT, ips, "IP: {}\n")

if __name__=="__main__":
    main()
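
For a quick check without touching any files, the same extract, deduplicate, and sort steps can be sketched against the sample record from the question held in a string. The `SAMPLE` text, the `LABELLED_IP` pattern (which only accepts a dotted quad that follows an "IP Address" label and a colon), and the `extract_ips` helper are assumptions made for this illustration and are not part of the answer above.

```python
import re

# Sample record copied from the question, kept in memory for the demo.
SAMPLE = """\
The Spammer (and all his/her info):
Username: user
User ID Number: 0
User Registration IP Address: 77.123.134.132
User IP Address for Selected Post: 177.43.168.35
User Email: email@address.com
"""

# Hypothetical pattern: an "IP Address" label, a colon, then a dotted quad at end of line.
LABELLED_IP = re.compile(
    r"IP Address[^:\n]*:[ \t]*(\d{1,3}(?:\.\d{1,3}){3})[ \t]*$",
    re.MULTILINE,
)

def extract_ips(text):
    # findall returns only the captured group, i.e. the address strings.
    return LABELLED_IP.findall(text)

def numeric_ip(ip):
    # Same idea as numeric_ip above: compare octets as integers.
    return [int(part) for part in ip.split(".")]

ips = sorted(set(extract_ips(SAMPLE)), key=numeric_ip)
print(ips)
```

Anchoring on the label and the colon keeps other dotted numbers in the file from being picked up, at the cost of depending on the exact wording of the log lines.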