Python: Selecting part of a string

Posted: 2015-10-28 23:25:23

Tags: python list

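I have a list of strings extracted from a text file. I need to read each line and "select" two specific parts. Here is a sample line from the text file (a firewall report):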

2011-04-13 08:52:55 Local4.Info 192.168.1.1 :Apr 13 08:52:55 PDT: %ASA-session-6-302014: Teardown TCP connection 41997800 for Workstations:192.168.2.85/1440 to Servers:192.168.1.6/43032 duration 0:00:00 bytes 2093 TCP FINs

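I need to save the IP address that follows "Workstations:" and keep track of the fact that it is a "workstation IP". I also need to save the server IP.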

I think the best approach is to create two lists, one for workstation IPs and one for server IPs, then read each line and append each IP to its list.

But to do that I need to select them, which I would probably do like this:

workstationIPs = []
serverIPs = []
for line in report:
    workstationIPs.append(line[a:b])
    serverIPs.append(line[c:d])

Here 'a' is the start of the workstation IP and 'b' is its end ('c' and 'd' play the same role for the server IP).

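However, not all lines have the same length, so fixed slice positions don't work. Does anyone have an idea of how to extract these two strings from each line?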
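The slicing idea itself can still work if the bounds a, b (and c, d) are computed for each line instead of being hard-coded. A minimal sketch, assuming every line contains the label followed by an IP and a "/port"; the helper ip_after is hypothetical, and str.index raises ValueError if a marker is missing:

```python
def ip_after(line, label):
    """Hypothetical helper: return the text between `label` and the next '/'.

    The slice bounds are computed per line, so varying line lengths are fine.
    Raises ValueError if `label` (or a following '/') is not in the line.
    """
    a = line.index(label) + len(label)  # start of the IP, just past the label
    b = line.index('/', a)              # the '/' that separates IP and port
    return line[a:b]


report = [
    "2011-04-13 08:52:55 Local4.Info 192.168.1.1 :Apr 13 08:52:55 PDT: "
    "%ASA-session-6-302014: Teardown TCP connection 41997800 for "
    "Workstations:192.168.2.85/1440 to Servers:192.168.1.6/43032 "
    "duration 0:00:00 bytes 2093 TCP FINs",
]

workstationIPs = []
serverIPs = []
for line in report:
    workstationIPs.append(ip_after(line, 'Workstations:'))
    serverIPs.append(ip_after(line, 'Servers:'))

print(workstationIPs)  # ['192.168.2.85']
print(serverIPs)       # ['192.168.1.6']
```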

PS: This is my first question, so please point out any mistakes and I can resubmit. Thanks! :)

4 answers:

Answer 0 (score: 1)

Use regular expressions!

import re
workstationIPs = []
serverIPs = []
for line in report:
    # Capture the dotted-quad IP that follows each label (the /port is not captured)
    workstationIPs.append(re.search(r'Workstations:((?:\d{1,3}\.){3}\d{1,3})', line).group(1))
    serverIPs.append(re.search(r'Servers:((?:\d{1,3}\.){3}\d{1,3})', line).group(1))

Example:

>>> s = '2011-04-13 08:52:55 Local4.Info 192.168.1.1 :Apr 13 08:52:55 PDT: %ASA-session-6-302014: Teardown TCP connection 41997800 for Workstations:192.168.2.85/1440 to Servers:192.168.1.6/43032 duration 0:00:00 bytes 2093 TCP FINs'
>>> re.search(r'Workstations:((?:\d{1,3}\.){3}\d{1,3})',s).group(1)
'192.168.2.85'
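One caveat with the loop above: re.search returns None when a line has no match, so .group(1) raises AttributeError on any line that is not a connection entry. A minimal sketch that compiles a single pattern capturing both addresses and skips non-matching lines; it assumes "Workstations:" appears before "Servers:" as in the sample line, and the group names ws/srv, the function extract_ips, and the second report line are all made up for illustration:

```python
import re

# One pass per line; assumes "Workstations:" precedes "Servers:" on the line.
PATTERN = re.compile(
    r'Workstations:(?P<ws>(?:\d{1,3}\.){3}\d{1,3})'
    r'.*?'
    r'Servers:(?P<srv>(?:\d{1,3}\.){3}\d{1,3})'
)


def extract_ips(report):
    workstationIPs = []
    serverIPs = []
    for line in report:
        m = PATTERN.search(line)
        if m is None:  # no workstation/server pair on this line: skip it
            continue
        workstationIPs.append(m.group('ws'))
        serverIPs.append(m.group('srv'))
    return workstationIPs, serverIPs


report = [
    "2011-04-13 08:52:55 Local4.Info 192.168.1.1 :Apr 13 08:52:55 PDT: "
    "%ASA-session-6-302014: Teardown TCP connection 41997800 for "
    "Workstations:192.168.2.85/1440 to Servers:192.168.1.6/43032 "
    "duration 0:00:00 bytes 2093 TCP FINs",
    # hypothetical line with no connection details
    "2011-04-13 08:53:01 Local4.Info 192.168.1.1 :some other message",
]

print(extract_ips(report))  # (['192.168.2.85'], ['192.168.1.6'])
```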

Answer 1 (score: 1)

You can use str.partition to split the string and take the parts you need:

workstation_ip = line.partition('Workstations:')[2].partition('/')[0]
server_ip = line.partition('Servers:')[2].partition('/')[0]

To avoid repetition, wrap it in a function:

def between(line, preceding, following):
    return line.partition(preceding)[2].partition(following)[0]
...
workstation_ip = between(line, 'Workstations:', '/')
server_ip = between(line, 'Servers:', '/')
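A detail worth knowing about str.partition: if the separator is absent, it returns the whole string followed by two empty strings, so between returns '' when the preceding marker is missing. A sketch that uses this to skip lines without a workstation/server pair; the loop and the second sample line are illustrative, not part of the original answer:

```python
def between(line, preceding, following):
    # Text after the first `preceding` and before the next `following`.
    # Returns '' if `preceding` does not occur in `line`.
    return line.partition(preceding)[2].partition(following)[0]


report = [
    "2011-04-13 08:52:55 Local4.Info 192.168.1.1 :Apr 13 08:52:55 PDT: "
    "%ASA-session-6-302014: Teardown TCP connection 41997800 for "
    "Workstations:192.168.2.85/1440 to Servers:192.168.1.6/43032 "
    "duration 0:00:00 bytes 2093 TCP FINs",
    "hypothetical line with no connection details",
]

workstationIPs = []
serverIPs = []
for line in report:
    workstation_ip = between(line, 'Workstations:', '/')
    server_ip = between(line, 'Servers:', '/')
    if workstation_ip and server_ip:  # skip lines where either value came back empty
        workstationIPs.append(workstation_ip)
        serverIPs.append(server_ip)

print(workstationIPs)  # ['192.168.2.85']
print(serverIPs)       # ['192.168.1.6']
```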

Answer 2 (score: 0)

Here is one way using split and a list comprehension:

str = "2011-04-13 08:52:55 Local4.Info 192.168.1.1 :Apr 13 08:52:55 PDT: %ASA-session-6-302014: Teardown TCP connection 41997800 for **Workstations:192.168.2.85/1440** to **Servers:192.168.1.6/43032** duration 0:00:00 bytes 2093 TCP FINs"
workstationIPs = [item.split(':')[1].replace("**", "").split("/")[0] for item in str.split(' ') if "**Workstations:" in item]
serverIPs = [item.split(':')[1].replace("**", "").split("/")[0] for item in str.split(' ') if "**Servers:" in item]
print workstationIPs
print serverIPs

Or using a regular expression and a list comprehension:

import re
line = "2011-04-13 08:52:55 Local4.Info 192.168.1.1 :Apr 13 08:52:55 PDT: %ASA-session-6-302014: Teardown TCP connection 41997800 for Workstations:192.168.2.85/1440 to Servers:192.168.1.6/43032 duration 0:00:00 bytes 2093 TCP FINs"
# Within the labelled token, pull out the first dotted-quad number
workstationIPs = [re.findall(r'[0-9]+(?:\.[0-9]+){3}', item)[0] for item in line.split(' ') if item.startswith('Workstations:')]
serverIPs = [re.findall(r'[0-9]+(?:\.[0-9]+){3}', item)[0] for item in line.split(' ') if item.startswith('Servers:')]
print(workstationIPs)
print(serverIPs)

Both produce:

['192.168.2.85']
['192.168.1.6']

Answer 3 (score: 0)

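If the number of spaces is consistent, you can try this: it splits on whitespace, strips any asterisks, takes what comes after the first colon, and drops the port after the slash.

workstationIPs = []
serverIPs = []
for line in report:
    items = line.split()
    # Token 14 is "Workstations:IP/port" and token 16 is "Servers:IP/port" in the sample line
    workstationIPs.append(items[14].strip('*').split(':')[1].split('/')[0])
    serverIPs.append(items[16].strip('*').split(':')[1].split('/')[0])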
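The fixed positions 14 and 16 only hold while every line has the same number of words before the addresses. A minimal sketch that still splits on whitespace but finds the tokens by their prefix instead of by index; the helper token_value and the second, shifted report line are hypothetical:

```python
def token_value(tokens, prefix):
    """Hypothetical helper: IP from the first token starting with `prefix`, else None.

    "Workstations:192.168.2.85/1440" -> "192.168.2.85"
    """
    for tok in tokens:
        tok = tok.strip('*')  # tolerate asterisks around the token
        if tok.startswith(prefix):
            return tok[len(prefix):].split('/')[0]
    return None


report = [
    "2011-04-13 08:52:55 Local4.Info 192.168.1.1 :Apr 13 08:52:55 PDT: "
    "%ASA-session-6-302014: Teardown TCP connection 41997800 for "
    "Workstations:192.168.2.85/1440 to Servers:192.168.1.6/43032 "
    "duration 0:00:00 bytes 2093 TCP FINs",
    # hypothetical line with a different word count before the addresses
    "2011-04-13 08:52:56 Local4.Info 192.168.1.1 :Apr 13 08:52:56 PDT: "
    "made-up message with more words for "
    "Workstations:192.168.2.86/1441 to Servers:192.168.1.7/443",
]

workstationIPs = []
serverIPs = []
for line in report:
    items = line.split()
    ws = token_value(items, 'Workstations:')
    srv = token_value(items, 'Servers:')
    if ws is not None and srv is not None:  # only keep lines that have both
        workstationIPs.append(ws)
        serverIPs.append(srv)

print(workstationIPs)  # ['192.168.2.85', '192.168.2.86']
print(serverIPs)       # ['192.168.1.6', '192.168.1.7']
```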