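I have a list of strings extracted from a text file. I need to read each line and pick out two specific parts. Here is a sample line from the text file (a firewall report):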
2011-04-13 08:52:55 Local4.Info 192.168.1.1 :Apr 13 08:52:55 PDT: %ASA-session-6-302014: Teardown TCP connection 41997800 for Workstations:192.168.2.85/1440 to Servers:192.168.1.6/43032 duration 0:00:00 bytes 2093 TCP FINs
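I need to save the IP address that follows "Workstations:" and know that it is a workstation IP, and I also need to save the server IP.
I think the best approach is to create two lists, one for workstation IPs and one for server IPs, then read each line and append the IPs to their respective lists.
But to do that I need to select them, which I would probably do like this: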
workstationIPs = []
serverIPs = []
for line in report:
    workstationIPs.append(line[a:b])
    serverIPs.append(line[c:d])
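'a' is where the workstation IP starts and 'b' is where it ends ('c' and 'd' play the same role for the server IP).
However, not all lines have the same length, so fixed slice positions don't work. Does anyone have an idea how to extract these two strings from each line?
PS: This is my first question, so please point out any mistakes and I can resubmit. Thanks!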
Answer 0 (score: 1)
Use a regular expression!
import re
workstationIPs = []
serverIPs = []
for line in report:
    workstationIPs.append(re.search(r'Workstations:((?:\d{1,3}\.){3}\d{1,3})', line).group(1))
    serverIPs.append(re.search(r'Servers:((?:\d{1,3}\.){3}\d{1,3})', line).group(1))
Example:
>>> s = '2011-04-13 08:52:55 Local4.Info 192.168.1.1 :Apr 13 08:52:55 PDT: %ASA-session-6-302014: Teardown TCP connection 41997800 for **Workstations:192.168.2.85/1440** to **Servers:192.168.1.6/43032** duration 0:00:00 bytes 2093 TCP FINs'
>>> re.search(r'Workstations:((?:\d{1,3}\.){3}\d{1,3})',s).group(1)
'192.168.2.85'
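One thing to watch with the loop above: re.search returns None when a line has no match, so calling .group(1) on it raises AttributeError. Here is a minimal sketch that skips such lines, assuming every line of interest has "Workstations:" before "Servers:". The names PAIR_RE and extract_ips and the group names are made up for illustration, not part of the answer.

```python
import re

# One compiled pattern that captures both addresses in a single pass.
# Assumption: "Workstations:" always appears before "Servers:" on the line.
PAIR_RE = re.compile(
    r'Workstations:(?P<workstation>(?:\d{1,3}\.){3}\d{1,3})'
    r'.*?Servers:(?P<server>(?:\d{1,3}\.){3}\d{1,3})'
)

def extract_ips(report):
    """Return (workstation_ips, server_ips), skipping lines without both labels."""
    workstation_ips, server_ips = [], []
    for line in report:
        match = PAIR_RE.search(line)
        if match is None:
            continue  # no Workstations/Servers pair on this line
        workstation_ips.append(match.group('workstation'))
        server_ips.append(match.group('server'))
    return workstation_ips, server_ips

report = [
    '2011-04-13 08:52:55 Local4.Info 192.168.1.1 :Apr 13 08:52:55 PDT: '
    '%ASA-session-6-302014: Teardown TCP connection 41997800 for '
    'Workstations:192.168.2.85/1440 to Servers:192.168.1.6/43032 '
    'duration 0:00:00 bytes 2093 TCP FINs',
    'a line with no addresses of interest',
]
print(extract_ips(report))  # (['192.168.2.85'], ['192.168.1.6'])
```

Matching both addresses with one pattern also keeps the two lists aligned, since a line contributes either to both lists or to neither.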
Answer 1 (score: 1)
You can use str.partition to split the string and take the part you need:
workstation_ip = line.partition('Workstations:')[2].partition('/')[0]
server_ip = line.partition('Servers:')[2].partition('/')[0]
To avoid repetition, write a function:
def between(line, preceding, following):
    return line.partition(preceding)[2].partition(following)[0]
...
workstation_ip = between(line, 'Workstations:', '/')
server_ip = between(line, 'Servers:', '/')
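Note that str.partition never raises: if the separator is missing, the chained call simply returns an empty string. If you would rather tell "label not found" apart from "empty value", a variant along these lines might help. This is only a sketch, and between_or_none is a hypothetical name.

```python
def between_or_none(line, preceding, following):
    """Text after `preceding` and before the next `following`, or None."""
    _, found, rest = line.partition(preceding)
    if not found:
        return None  # label absent; the chained partition would return ''
    return rest.partition(following)[0]

line = 'for Workstations:192.168.2.85/1440 to Servers:192.168.1.6/43032 duration 0:00:00'
print(between_or_none(line, 'Workstations:', '/'))        # 192.168.2.85
print(between_or_none(line, 'Servers:', '/'))             # 192.168.1.6
print(between_or_none('no labels here', 'Servers:', '/'))  # None
```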
Answer 2 (score: 0)
Here is one way using split and a list comprehension:
# named `line` rather than `str` so the built-in str type is not shadowed
line = "2011-04-13 08:52:55 Local4.Info 192.168.1.1 :Apr 13 08:52:55 PDT: %ASA-session-6-302014: Teardown TCP connection 41997800 for **Workstations:192.168.2.85/1440** to **Servers:192.168.1.6/43032** duration 0:00:00 bytes 2093 TCP FINs"
workstationIPs = [item.split(':')[1].replace("**", "").split("/")[0] for item in line.split(' ') if "**Workstations:" in item]
serverIPs = [item.split(':')[1].replace("**", "").split("/")[0] for item in line.split(' ') if "**Servers:" in item]
print(workstationIPs)
print(serverIPs)
Or using a regular expression and a list comprehension:
import re
# named `line` rather than `str` so the built-in str type is not shadowed
line = "2011-04-13 08:52:55 Local4.Info 192.168.1.1 :Apr 13 08:52:55 PDT: %ASA-session-6-302014: Teardown TCP connection 41997800 for **Workstations:192.168.2.85/1440** to **Servers:192.168.1.6/43032** duration 0:00:00 bytes 2093 TCP FINs"
workstationIPs = [re.findall(r'[0-9]+(?:\.[0-9]+){3}', item)[0] for item in line.split(' ') if "**Workstations:" in item]
serverIPs = [re.findall(r'[0-9]+(?:\.[0-9]+){3}', item)[0] for item in line.split(' ') if "**Servers:" in item]
print(workstationIPs)
print(serverIPs)
Both yield:
['192.168.2.85']
['192.168.1.6']
Answer 3 (score: 0)
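If the number of spaces is consistent, you can try this: it splits on whitespace, strips the asterisks, and takes what sits between the first colon and the slash.
workstationIPs = []
serverIPs = []
for line in report:
    items = line.split()
    # items[14] and items[16] look like 'Workstations:192.168.2.85/1440'
    workstationIPs.append(items[14].strip('*').split(':')[1].split('/')[0])
    serverIPs.append(items[16].strip('*').split(':')[1].split('/')[0])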
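Since the indices 14 and 16 only hold for lines shaped exactly like the sample, it may be worth checking the tokens before trusting them. A rough sketch, assuming the labels are literally "Workstations:" and "Servers:"; the helper name ips_by_position and its parameters are hypothetical.

```python
def ips_by_position(line, ws_index=14, srv_index=16):
    """Return (workstation_ip, server_ip), or None if the layout differs."""
    items = line.split()
    if len(items) <= max(ws_index, srv_index):
        return None  # too few fields for the expected layout
    ws_token = items[ws_index].strip('*')
    srv_token = items[srv_index].strip('*')
    if not (ws_token.startswith('Workstations:') and srv_token.startswith('Servers:')):
        return None  # the fields at these positions are something else
    # Keep what sits between the first colon and the slash (drops the port).
    ws_ip = ws_token.split(':', 1)[1].split('/')[0]
    srv_ip = srv_token.split(':', 1)[1].split('/')[0]
    return ws_ip, srv_ip

sample = ('2011-04-13 08:52:55 Local4.Info 192.168.1.1 :Apr 13 08:52:55 PDT: '
          '%ASA-session-6-302014: Teardown TCP connection 41997800 for '
          '**Workstations:192.168.2.85/1440** to **Servers:192.168.1.6/43032** '
          'duration 0:00:00 bytes 2093 TCP FINs')
print(ips_by_position(sample))                # ('192.168.2.85', '192.168.1.6')
print(ips_by_position('too short to match'))  # None
```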