I have a .txt file containing a list of IP addresses:
111.67.74.234:8080
111.67.75.89:8080
12.155.183.18:3128
128.208.04.198:2124
142.169.1.233:80
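And there are many more besides that :)
Anyway, I've imported it into a list using Python and I'm trying to get it to sort them, but I'm having trouble. Does anyone have any ideas?
EDIT: OK, since that was vague, this is what I have so far.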
f = open("/Users/jch5324/Python/Proxy/resources/data/list-proxy.txt", 'r+')
lines = [x.split() for x in f]
new_file = (sorted(lines, key=lambda x:x[:18]))
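Answer 0 (score: 5)
You are probably sorting them by ASCII string comparison ('.' < '5', etc.), when you would rather have them sorted numerically. Try converting each line to a tuple of ints, then sorting: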
def ipPortToTuple(string):
    """
    '12.34.5.678:910' -> (12,34,5,678,910)
    """
    ip,port = string.strip().split(':')
    # Convert the port to int as well, so ports also compare numerically.
    return tuple(int(i) for i in ip.split('.')) + (int(port),)

with open('myfile.txt') as f:
    nonemptyLines = (line for line in f if line.strip()!='')
    sortedLines = sorted(nonemptyLines, key=ipPortToTuple)
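To see the difference concretely, here is a small sketch comparing a plain string sort with a sort on a numeric key. The sample addresses are taken from the question; the helper name `numeric_key` is made up for this illustration.

```python
# Sample lines taken from the question's file.
lines = [
    "111.67.74.234:8080",
    "12.155.183.18:3128",
    "142.169.1.233:80",
]

def numeric_key(line):
    """Hypothetical helper: '12.34.5.67:89' -> (12, 34, 5, 67, 89)."""
    ip, port = line.strip().split(":")
    return tuple(int(part) for part in ip.split(".")) + (int(port),)

# A plain string sort compares character by character, so '111...' < '12...'.
by_string = sorted(lines)

# A numeric sort compares the first octets as integers: 12 < 111 < 142.
by_number = sorted(lines, key=numeric_key)

print(by_string)
print(by_number)
```

The string sort leaves `111.67.74.234` ahead of `12.155.183.18`, while the numeric key puts `12.155.183.18` first.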
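EDIT: The ValueError you are getting is because your text file is not entirely in #.#.#.#:# format. (There may be comments or blank lines, though in this case the error would hint that there is a line with more than one ':'.) You can use debugging techniques to home in on the problem, by catching the exception and emitting useful debugging data: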
def tryParseLines(lines):
    for line in lines:
        try:
            yield ipPortToTuple(line.strip())
        except Exception:
            if __debug__:
                print('line {} did not match #.#.#.#:# format'.format(repr(line)))

with open('myfile.txt') as f:
    sorted(tryParseLines(f))
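I was a bit sloppy in the above, in that it still lets some invalid IP addresses through (for example #.#.#.#.#, or 257.-1.#.#). Below is a more thorough solution, which lets you do things like compare IP addresses with the < operator, while also making sorting work naturally: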
#!/usr/bin/python3

import functools
import re

@functools.total_ordering
class Ipv4Port(object):
    regex = re.compile(r'(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3}):(\d{1,5})')

    def __init__(self, ipv4:(int,int,int,int), port:int):
        try:
            assert type(ipv4)==tuple and len(ipv4)==4, 'ipv4 not 4-length tuple'
            assert all(0<=x<256 for x in ipv4), 'ipv4 numbers not in valid range (0<=n<256)'
            assert type(port)==int, 'port must be integer'
        except AssertionError as ex:
            print('Invalid IPv4 input: ipv4={}, port={}'.format(repr(ipv4),repr(port)))
            raise ex

        self.ipv4 = ipv4
        self.port = port
        self._tuple = ipv4+(port,)

    @classmethod
    def fromString(cls, string:'12.34.5.678:910'):
        try:
            # fullmatch (rather than match) also rejects trailing junk such as '1.2.3.4:80abc'
            a,b,c,d,port = cls.regex.fullmatch(string.strip()).groups()
            ip = tuple(int(x) for x in (a,b,c,d))
            return cls(ip, int(port))
        except Exception as ex:
            # Append the offending string to the original error message.
            args = list(ex.args) if ex.args else ['']
            args[0] += "\n...indicating ipv4 string {} doesn't match #.#.#.#:# format\n\n".format(repr(string))
            ex.args = tuple(args)
            raise ex

    def __lt__(self, other):
        return self._tuple < other._tuple

    def __eq__(self, other):
        return self._tuple == other._tuple

    def __repr__(self):
        #return 'Ipv4Port(ipv4={ipv4}, port={port})'.format(**self.__dict__)
        return "Ipv4Port.fromString('{}.{}.{}.{}:{}')".format(*self._tuple)
And then:
def tryParseLines(lines):
    for line in lines:
        line = line.strip()
        if line != '':
            try:
                yield Ipv4Port.fromString(line)
            except AssertionError as ex:
                raise ex
            except Exception as ex:
                if __debug__:
                    print(ex)
                raise ex
Demo:
>>> lines = '222.111.22.44:214 \n222.1.1.1:234\n 23.1.35.6:199'.splitlines()
>>> sorted(tryParseLines(lines))
[Ipv4Port.fromString('23.1.35.6:199'), Ipv4Port.fromString('222.1.1.1:234'), Ipv4Port.fromString('222.111.22.44:214')]
Changing one of the values to, say, 264.… or ….-35.… will result in the appropriate error.
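As an alternative to hand-rolled validation, Python 3.3+ ships an `ipaddress` module whose `IPv4Address` class checks octet ranges and supports ordering. The sketch below uses that module instead of the class above; it is a different technique, not a drop-in replacement, and the function name `parse_ip_port` is an assumption made for illustration.

```python
import ipaddress

def parse_ip_port(line):
    """Hypothetical helper: 'a.b.c.d:port' -> (IPv4Address, int).

    Raises ValueError if the address or the port is malformed.
    """
    ip_text, sep, port_text = line.strip().rpartition(":")
    if not sep:
        raise ValueError("missing ':' in {!r}".format(line))
    port = int(port_text)  # raises ValueError if not an integer
    if not 0 <= port <= 65535:
        raise ValueError("port out of range in {!r}".format(line))
    # IPv4Address raises AddressValueError (a ValueError subclass) on bad input.
    return ipaddress.IPv4Address(ip_text), port

lines = ["222.111.22.44:214 ", "222.1.1.1:234", " 23.1.35.6:199"]
ordered = sorted(lines, key=parse_ip_port)
print([line.strip() for line in ordered])
```

One caveat: newer Python versions reject octets written with leading zeros, so a line like the `128.208.04.198:2124` from the question may need to be normalised first.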
Answer 1 (score: 0)
@Ninjagecko's solution is the best, but here is another way of doing it using re:
>>> import re
>>> with open('ips.txt') as f:
...     print sorted(f, key=lambda line: map(int, re.split(r'\.|:', line.strip())))
...
['12.155.183.18:3128\n', '111.67.74.234:8080\n', '111.67.75.89:8080\n',
'128.208.04.198:2124\n', '142.169.1.233:80 \n']
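The snippet above is Python 2. On Python 3, `print` is a function and `map` returns an iterator, which does not work as a sort key because iterators cannot be compared with each other. A minimal sketch of the same idea for Python 3, building the key as a list (the name `split_key` is hypothetical):

```python
import re

def split_key(line):
    """Hypothetical helper: split on '.' or ':' and convert each piece to int."""
    return [int(part) for part in re.split(r"[.:]", line.strip())]

# Lines as they come out of iterating over a file, newlines included.
lines = ["111.67.74.234:8080\n", "12.155.183.18:3128\n", "142.169.1.233:80 \n"]
print(sorted(lines, key=split_key))
```

Because the key is computed from a stripped copy, the sorted result still contains the original lines with their trailing whitespace and newlines.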
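Answer 2 (score: -1)
You can preprocess the list so that it can be sorted with the built-in comparison, and then process it back into a more normal format.
After padding, the strings are all the same length and can be sorted directly. Afterwards, we simply remove all the spaces.
You can google around and find other examples of this.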
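# Assumes ips is a list of bare dotted-quad strings (no ':port' suffix).
# Right-justify each octet to width 3 so that all strings have the same length.
address = ["%3s.%3s.%3s.%3s" % tuple(ip.split(".")) for ip in ips]
address.sort()
# Strip the padding again to get back to the normal format.
address = [a.replace(" ", "") for a in address]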
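A self-contained sketch of the same padding idea, here using zero-padding via `str.zfill` inside a sort key so that the original strings never have to be modified and restored. The function name `padded_key` and the handling of the `:port` suffix are assumptions, since the snippet above only deals with bare addresses.

```python
def padded_key(line):
    """Hypothetical helper: '12.155.183.18:3128' -> '012.155.183.018:03128'."""
    ip, _, port = line.strip().partition(":")
    octets = ".".join(part.zfill(3) for part in ip.split("."))
    # Pad the port to 5 digits (65535 is the largest valid port number).
    return octets + ":" + port.zfill(5)

ips = ["111.67.74.234:8080", "12.155.183.18:3128", "142.169.1.233:80"]
print(sorted(ips, key=padded_key))
```

With every field padded to a fixed width, ordinary string comparison gives the same order as numeric comparison.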
If you have a very large number of IP addresses, you will get better processing times with C++. It is more work up front, but it will run faster.