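I am trying to tokenize the entries in a file. However, because the number of spaces between fields varies, I can't use line.split(" "). Here are a few lines copied from the file: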
"08-09-2010 21:21:46 00:22:7f:a6:9b:69 -79"
"08-09-2010 21:21:46 04:4f:aa:b4:49:49 -79"
"08-09-2010 21:21:46 04:4f:aa:31:4e:59 tikona 18002090044 -83"
"08-09-2010 21:21:46 00:22:7f:26:9b:69 tikona 18002090044 -74"
"08-09-2010 21:21:46 04:4f:aa:34:0d:c9 tikona 18002090044 -82"
"08-09-2010 21:21:46 04:4f:aa:71:4e:59 -85"
"08-09-2010 21:21:46 04:4f:aa:34:21:89 tikona 18002090044 -75"
"08-09-2010 21:21:46 04:4f:aa:34:49:49 tikona 18002090044 -77"
"08-09-2010 21:21:46 04:4f:aa:74:0d:c9 -85"
"08-09-2010 21:22:47 18 APs were seen
"
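I need to access the first column (as a datetime object), the second column (00:22...) and the last column (-79 etc.). I can access the first and second columns easily, but not the last one. When I do info=line.split(" "), I can't determine the token number, because the third column may or may not have an entry.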
How can I access the 4th column? Is there a way to use something like info[i].contains(" -")?
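Python's str type has no .contains() method; a substring test is written with the in operator. Also, tokens produced by split() contain no spaces, so " -" in info[i] would never be true. A minimal sketch that instead looks for the token starting with "-", assuming the signal value is the only field that begins with a minus sign (find_signal is a hypothetical helper, not from the question):

```python
def find_signal(line):
    """Return the first whitespace-separated token that starts with '-', or None."""
    for token in line.split():
        # startswith, not `"-" in token`: the date token '08-09-2010' also contains '-'
        if token.startswith("-"):
            return token
    return None


print("-" in "-79")  # the `in` operator is Python's substring test
print(find_signal("08-09-2010 21:21:46 04:4f:aa:31:4e:59 tikona 18002090044 -83"))
```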
Answer 0 (score: 7)
The columns appear to be fixed width, in which case you can use string slicing, followed where needed by .strip() to remove the padding spaces:
>>> for line in data.split('\n'):
...     print (line[1:25].strip(), line[26:45].strip(), line[46:69].strip(), line[70:-1].strip())
...
('08-09-2010 21:21:46', '00:22:7f:a6:9b:69', '', '-79')
('08-09-2010 21:21:46', '04:4f:aa:b4:49:49', '', '-79')
('08-09-2010 21:21:46', '04:4f:aa:31:4e:59', 'tikona 18002090044', '-83')
('08-09-2010 21:21:46', '00:22:7f:26:9b:69', 'tikona 18002090044', '-74')
('08-09-2010 21:21:46', '04:4f:aa:34:0d:c9', 'tikona 18002090044', '-82')
('08-09-2010 21:21:46', '04:4f:aa:71:4e:59', '', '-85')
('08-09-2010 21:21:46', '04:4f:aa:34:21:89', 'tikona 18002090044', '-75')
('08-09-2010 21:21:46', '04:4f:aa:34:49:49', 'tikona 18002090044', '-77')
('08-09-2010 21:21:46', '04:4f:aa:74:0d:c9', '', '-85')
('08-09-2010 21:22:47', '18 APs were seen', '', '')
('', '', '', '')
('', '', '', '')
The empty tuples at the end come from the final input line, ".
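For reference, the same slicing idea as a Python 3 helper. This is only a sketch: the pasted sample has lost its padding, so the column boundaries in COLUMNS (and the helper name slice_fixed) are hypothetical and would have to be measured from the real file.

```python
# Hypothetical column boundaries as (start, end) slice indices; the real
# widths must be measured from the actual file.
COLUMNS = [(0, 19), (20, 37), (38, 58), (58, None)]


def slice_fixed(line, columns=COLUMNS):
    """Cut one fixed-width line into fields, stripping the padding."""
    return [line[start:end].strip() for start, end in columns]


# Build a sample line with the assumed widths (20 + 18 + 20 characters).
sample = f"{'08-09-2010 21:21:46':<20}{'00:22:7f:a6:9b:69':<18}{'':<20}-79"
print(slice_fixed(sample))  # ['08-09-2010 21:21:46', '00:22:7f:a6:9b:69', '', '-79']
```

Slicing never depends on how many tokens a line has, which is why an empty third column is not a problem here.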
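If the columns are not fixed width, you can still use .split() and take the last column with index -1. Be careful with .split(), though, because doing it "properly" gets a bit messy. I suggest using a double space as the separator to handle the 18 APs were seen case, but note that this changes the index of the second column.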
>>> for line in data.split('\n'):
...     fields = line.split('  ')
...     print (fields[0], fields[3], fields[-1])
...
('"08-09-2010 21:21:46', '00:22:7f:a6:9b:69', ' -79"')
('"08-09-2010 21:21:46', '04:4f:aa:b4:49:49', ' -79"')
('"08-09-2010 21:21:46', '04:4f:aa:31:4e:59', '-83"')
('"08-09-2010 21:21:46', '00:22:7f:26:9b:69', '-74"')
('"08-09-2010 21:21:46', '04:4f:aa:34:0d:c9', '-82"')
('"08-09-2010 21:21:46', '04:4f:aa:71:4e:59', ' -85"')
('"08-09-2010 21:21:46', '04:4f:aa:34:21:89', '-75"')
('"08-09-2010 21:21:46', '04:4f:aa:34:49:49', '-77"')
('"08-09-2010 21:21:46', '04:4f:aa:74:0d:c9', ' -85"')
('"08-09-2010 21:22:47', '18 APs were seen', '18 APs were seen')
Traceback (most recent call last):
File "<input>", line 3, in <module>
IndexError: list index out of range
The IndexError is caused by your last input line. If that is real input, you should catch this error.
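A Python 3 sketch combining both suggestions: split() with no argument treats any run of whitespace as a single separator, index -1 picks the last column, and the IndexError from a short line is caught and the line skipped. It assumes the real file has no surrounding quote characters and that dates are day-month-year (08-09-2010 is ambiguous); parse_lines is a hypothetical name.

```python
from datetime import datetime


def parse_lines(lines):
    """Yield (timestamp, mac, last_field) per line, skipping lines that are too short."""
    for line in lines:
        fields = line.split()  # no argument: any run of whitespace is one separator
        try:
            date, time, mac, last = fields[0], fields[1], fields[2], fields[-1]
        except IndexError:
            continue  # e.g. a stray '"' line or a blank line
        # Assumption: dates are day-month-year; use "%m-%d-%Y" if they are month-first.
        stamp = datetime.strptime(date + " " + time, "%d-%m-%Y %H:%M:%S")
        yield stamp, mac, last


for record in parse_lines(["08-09-2010 21:21:46 04:4f:aa:31:4e:59 tikona 18002090044 -83", '"']):
    print(record)
```

The 18 APs were seen summary line still has enough fields to pass this check and would come out as a bogus record, so filtering it needs a separate test, for example on the MAC field.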
Answer 1 (score: 1)
You can split it with a regular expression:
#!/usr/bin/env python3
import re

mac_data_re = re.compile(
    r'^(?P<date>[\d-]+)\s+'
    r'(?P<time>[\d:]+)\s+'
    r'(?P<mac>[\da-f:]+)\s+'
    r'(?:(?P<host>\w+)\s+)?'      # optional host name, e.g. "tikona"
    r'(?:(?P<host_id>\d+)\s+)?'   # optional host id, e.g. "18002090044"
    r'(?P<final_number>-?\d+)$')  # signal value, e.g. "-79"

with open('list') as f:
    for line in (l.strip() for l in f):
        match = mac_data_re.match(line)
        if match:
            # groups that did not participate in the match are None
            print("date={date}, time={time}, mac={mac}, host={host}, "
                  "host_id={host_id} final_number={final_number}".format(**match.groupdict()))
        else:
            print("Line not matched: '%s'" % line)
Here is the output:
aid@bullet:~/tmp$ ./parse_list.py
date=08-09-2010, time=21:21:46, mac=00:22:7f:a6:9b:69, host=None, host_id=None final_number=-79
date=08-09-2010, time=21:21:46, mac=04:4f:aa:b4:49:49, host=None, host_id=None final_number=-79
date=08-09-2010, time=21:21:46, mac=04:4f:aa:31:4e:59, host=tikona, host_id=18002090044 final_number=-83
date=08-09-2010, time=21:21:46, mac=00:22:7f:26:9b:69, host=tikona, host_id=18002090044 final_number=-74
date=08-09-2010, time=21:21:46, mac=04:4f:aa:34:0d:c9, host=tikona, host_id=18002090044 final_number=-82
date=08-09-2010, time=21:21:46, mac=04:4f:aa:71:4e:59, host=None, host_id=None final_number=-85
date=08-09-2010, time=21:21:46, mac=04:4f:aa:34:21:89, host=tikona, host_id=18002090044 final_number=-75
date=08-09-2010, time=21:21:46, mac=04:4f:aa:34:49:49, host=tikona, host_id=18002090044 final_number=-77
date=08-09-2010, time=21:21:46, mac=04:4f:aa:74:0d:c9, host=None, host_id=None final_number=-85
Line not matched: '08-09-2010 21:22:47 18 APs were seen'
Answer 2 (score: 0)
You can use rsplit to get the last value, e.g. line.rsplit(" ", 1)
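A minimal sketch of the rsplit approach: passing None as the separator splits on any run of whitespace, and maxsplit=1 stops after the first split from the right, so the variable-width middle of the line is never tokenized (last_field is a hypothetical helper name):

```python
def last_field(line):
    """Return the last whitespace-separated field of a line."""
    # rsplit(None, 1) splits once, from the right, on any run of whitespace
    return line.rsplit(None, 1)[-1]


print(last_field("08-09-2010 21:21:46 00:22:7f:a6:9b:69      -79"))  # -79
```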
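Answer 3 (score: 0)
Do you have control over the code that writes this file? If so, you could change it to separate fields with tabs and then split on tabs. That keeps the field separation consistent.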
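A minimal sketch of the tab-separated idea using the standard csv module, assuming you can change the writer; the in-memory io.StringIO buffer stands in for the real file, and the four-column layout is an assumption based on the sample lines:

```python
import csv
import io

# Hypothetical records in the question's four-column layout.
rows = [
    ["08-09-2010 21:21:46", "00:22:7f:a6:9b:69", "", "-79"],
    ["08-09-2010 21:21:46", "04:4f:aa:31:4e:59", "tikona 18002090044", "-83"],
]

# Writer side: exactly one tab between fields, even when a field is empty.
buf = io.StringIO()  # stands in for the real output file
csv.writer(buf, delimiter="\t", lineterminator="\n").writerows(rows)

# Reader side: every row comes back with the same number of fields,
# so the last column is always index 3 (or -1).
buf.seek(0)
parsed = list(csv.reader(buf, delimiter="\t"))
for date, mac, host, signal in parsed:
    print(date, mac, signal)
```

Because an empty field is written as two adjacent tabs, the reader never has to guess which column is missing.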