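I have a log file of net-flow data that I am trying to sort by IP address and timestamp while adding up the bytes. The result needs to list identical IP addresses together, in descending order of byte count.
A line of the file looks like this: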
Min Source IP Bytes
./R2snd/2014/02/02/02/25.flows:100.000.000.000|101.101.101.101|0|4|3|2|96|1391336665|1391336668|3361|445|2|6|0|0|0|0|0
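For some reason I can only get it to show the minute, but I need the whole time and date formatted. The minute is the last number after a / in the line I pasted above. I then need to take every IP address in the file and sort by IP, so that repeated IPs appear together, and add up the bytes sent for each IP. I tried to do this with a dictionary below, but I can't seem to get it to work. After that I need to sort the dictionary by bytes in descending order: since the bytes are accumulated for each IP entry, the top entry for each IP would be the total number of bytes that IP sent.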
import operator
with open('/home/username/Documents/log') as f:
    for line in f:
        #save the data into an array
        firstsplitforminute = line.split('/')
        secondsplitforminute = firstsplitforminute[6].split('.')
        firstsplitforsourceip = line.split('|')
        secondsplitforsourceip = firstsplitforsourceip[0].split(':')
        minute = secondsplitforminute[0]
        sourceip = secondsplitforsourceip[1]
        bytes = line.split('|')[6]
        protocol = line.split('|')[12]
        if protocol == '6':
            entries = {'IP':sourceip, 'BYTES':bytes, 'MIN':minute}
            sum(item['BYTES'] for item in entries)
            def sortbykey():
                sortedbykeydict = sorted(entries.items(), key = lambda t: t[1])
                print sortedbykeydict
            sortbykey()
        else:
            pass
But when I run this code I get the following error:
File "/home/grant/.eclipse/org.eclipse.platform_3.8_155965261/plugins/org.python.pydev_3.4.1.201403181715/pysrc/pydevd.py", line 1844, in <module>
debugger.run(setup['file'], None, None)
File "/home/grant/.eclipse/org.eclipse.platform_3.8_155965261/plugins/org.python.pydev_3.4.1.201403181715/pysrc/pydevd.py", line 1372, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "/home/grant/workspace/Learning/LogfileExtractor.py", line 16, in <module>
sum(item['BYTES'] for item in entries)
File "/home/grant/workspace/Learning/LogfileExtractor.py", line 16, in <genexpr>
sum(item['BYTES'] for item in entries)
TypeError: string indices must be integers, not str
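For the date-and-time part of the question, one possible sketch is to read the whole timestamp out of the file path rather than only the last component. It assumes the directories are laid out as year/month/day/hour with the minute as the file name, which is a guess based on the sample line; `flow_file_time` is a hypothetical helper name.

```python
from datetime import datetime

def flow_file_time(line):
    """Return the datetime encoded in the path prefix of a flow line.

    Assumption: the path ends in .../YYYY/MM/DD/HH/MM.flows, as the
    sample line in the question suggests.
    """
    path = line.split(':', 1)[0]           # './R2snd/2014/02/02/02/25.flows'
    parts = path.split('/')
    year, month, day, hour = parts[-5:-1]  # the four directories before the file
    minute = parts[-1].split('.')[0]       # '25.flows' -> '25'
    return datetime(int(year), int(month), int(day), int(hour), int(minute))

sample = ('./R2snd/2014/02/02/02/25.flows:100.000.000.000|101.101.101.101'
          '|0|4|3|2|96|1391336665|1391336668|3361|445|2|6|0|0|0|0|0')
stamp = flow_file_time(sample)
print(stamp.strftime('%Y-%m-%d %H:%M'))    # prints 2014-02-02 02:25
```

Indexing from the end of the path (`parts[-5:-1]`) avoids depending on how many directories come before the year.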
Answer 0 (score: 0)
Try parsing the value as an integer: 'BYTES': int(bytes)
(As far as I understand your code, it should work.)
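To illustrate why the conversion matters: `split('|')` returns strings, and adding strings concatenates them instead of summing. Below is a rough sketch of per-IP totals that converts each byte count with `int()` first. The sample lines are made up in the question's layout, the field positions (6 for bytes, 12 for protocol) are taken from the question's code, and `total_bytes_per_ip` is a hypothetical name.

```python
from collections import defaultdict

# Made-up sample lines in the same pipe-delimited layout as the question.
sample_lines = [
    './R2snd/2014/02/02/02/25.flows:10.0.0.1|10.0.0.9|0|4|3|2|96|1391336665|1391336668|3361|445|2|6|0|0|0|0|0',
    './R2snd/2014/02/02/02/25.flows:10.0.0.2|10.0.0.9|0|4|3|2|500|1391336665|1391336668|3362|445|2|6|0|0|0|0|0',
    './R2snd/2014/02/02/02/26.flows:10.0.0.1|10.0.0.9|0|4|3|2|200|1391336725|1391336728|3363|445|2|6|0|0|0|0|0',
    './R2snd/2014/02/02/02/26.flows:10.0.0.3|10.0.0.9|0|4|3|2|999|1391336725|1391336728|3364|53|2|17|0|0|0|0|0',
]

def total_bytes_per_ip(lines):
    """Sum the bytes of protocol-6 flows per source IP, largest total first."""
    totals = defaultdict(int)
    for line in lines:
        fields = line.rstrip('\n').split('|')
        source_ip = fields[0].split(':')[1]      # text after 'path:'
        if fields[12] == '6':                    # same protocol filter as the question
            totals[source_ip] += int(fields[6])  # int() so that + adds numbers
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

# Without int(), '96' + '200' would give '96200' rather than 296.
result = total_bytes_per_ip(sample_lines)
for ip, total in result:
    print('%s %d' % (ip, total))
```

With these sample lines the loop prints `10.0.0.2 500` followed by `10.0.0.1 296`; the protocol-17 flow is skipped by the filter.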
Answer 1 (score: 0)
@BartoszKP is correct. Python loops over the keys of entries, and those keys are strings, so item['BYTES'] ends up indexing a string:
entries = {'IP':sourceip, 'BYTES':bytes, 'MIN':minute}
sum(item['BYTES'] for item in entries)
Instead, you should iterate over the dictionary's items:
sum(v for k,v in entries.items())
This means that during the first iteration 'IP' is stored in k and sourceip is stored in v; during the second, 'BYTES' is stored in k and bytes is stored in v; and so on.
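A small sketch of the difference, using made-up values for a single record. Iterating over the dictionary directly yields its keys, which reproduces the TypeError from the question, while `.items()` yields key/value pairs. Since the values of one record are a mix of strings and a number, adding all of them together is probably not what is wanted; for a single record, reading `entries['BYTES']` directly may be enough.

```python
# Made-up values for one record, with the byte count already converted to int.
entries = {'IP': '100.000.000.000', 'BYTES': 96, 'MIN': '25'}

# Iterating over a dict yields its keys, which here are strings.
keys_seen = sorted(item for item in entries)

# So item['BYTES'] tries to index a string with a string and raises TypeError.
try:
    sum(item['BYTES'] for item in entries)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# .items() yields (key, value) pairs instead.
pairs_seen = sorted(entries.items())

# For a single record, the byte count can simply be looked up by key.
byte_count = entries['BYTES']

print(keys_seen)
print(error_message)
print(byte_count)
```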