How to sum a dictionary and sort it by key value in Python

Time: 2014-05-20 23:54:18

Tags: python loops

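I have a log file of net-flow data that I am trying to sort by IP address and timestamp while adding up the bytes. The output needs to list identical IP addresses together, in descending order of byte count.

The output from the file looks like this: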

                 Min       Source IP                     Bytes

./R2snd/2014/02/02/02/25.flows:100.000.000.000|101.101.101.101|0|4|3|2|96|1391336665|1391336668|3361|445|2|6|0|0|0|0|0

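For some reason I can only get it to show the minute, but I need the full date and time formatted. The minute is the last number after a / in the path shown above. I then need to take every IP address in the file and sort by IP, so that repeated IPs appear together, and add up the bytes sent for each IP. I tried to do this with a dictionary below, but I can't seem to get it to work. Finally, I need to sort the dictionary by bytes in descending order: because the bytes are accumulated for each IP entry, the top entry for each IP will be the total number of bytes that IP sent.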

import operator
with open('/home/username/Documents/log') as f:
    for line in f:
        #save the data into an array
        firstsplitforminute = line.split('/')
        secondsplitforminute = firstsplitforminute[6].split('.')
        firstsplitforsourceip = line.split('|')
        secondsplitforsourceip = firstsplitforsourceip[0].split(':')
        minute = secondsplitforminute[0]
        sourceip = secondsplitforsourceip[1]
        bytes = line.split('|')[6]
        protocol = line.split('|')[12]

        if protocol == '6':
            entries = {'IP':sourceip, 'BYTES':bytes, 'MIN':minute}
            sum(item['BYTES'] for item in entries)
            def sortbykey():
                sortedbykeydict = sorted(entries.items(), key = lambda t: t[1])
                print sortedbykeydict
            sortbykey()
        else:
            pass

But when I run this code I get the following error:

File "/home/grant/.eclipse/org.eclipse.platform_3.8_155965261/plugins/org.python.pydev_3.4.1.201403181715/pysrc/pydevd.py", line 1844, in <module>
    debugger.run(setup['file'], None, None)
  File "/home/grant/.eclipse/org.eclipse.platform_3.8_155965261/plugins/org.python.pydev_3.4.1.201403181715/pysrc/pydevd.py", line 1372, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/home/grant/workspace/Learning/LogfileExtractor.py", line 16, in <module>
    sum(item['BYTES'] for item in entries)
  File "/home/grant/workspace/Learning/LogfileExtractor.py", line 16, in <genexpr>
    sum(item['BYTES'] for item in entries)
TypeError: string indices must be integers, not str

2 Answers:

Answer 0 (score: 0)

Try parsing the value as an integer: 'BYTES': int(bytes)

(As far as I understand your code, that should work.)
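As a minimal sketch of why the conversion matters (Python 3 syntax, using a shortened made-up record rather than real log data): str.split always returns strings, so the byte field has to go through int() before any arithmetic is done on it.

```python
# A shortened, made-up record in the same pipe-separated layout (illustrative only).
line = "a|b|0|4|3|2|96|x"

raw_bytes = line.split('|')[6]   # split() always returns strings
print(type(raw_bytes).__name__)  # str

byte_count = int(raw_bytes)      # convert before doing arithmetic
total = byte_count + 4
print(total)                     # 100
```

Note that the question's code names the variable bytes, which shadows the built-in type of the same name; the sketch uses byte_count instead.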

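Answer 1 (score: 0)

@BartoszKP is right. Python iterates over entries, which yields its keys, and those are strings, so item['BYTES'] tries to index a string: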

entries = {'IP':sourceip, 'BYTES':bytes, 'MIN':minute}
sum(item['BYTES'] for item in entries)
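A small sketch of what that loop actually sees (Python 3 syntax, placeholder values standing in for the parsed fields): iterating over a dict yields its keys, so item is a string such as 'IP', and indexing it with 'BYTES' raises the TypeError from the traceback.

```python
# Placeholder values; in the question these come from the parsed log line.
entries = {'IP': '100.000.000.000', 'BYTES': '96', 'MIN': '25'}

# Iterating over a dict yields its keys, not nested dicts.
keys_seen = [item for item in entries]
print(keys_seen)  # ['IP', 'BYTES', 'MIN']

error_message = None
try:
    sum(item['BYTES'] for item in entries)
except TypeError as exc:
    # 'IP'['BYTES'] is a string indexed by a string, which is not allowed.
    error_message = str(exc)
    print(error_message)
```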

Instead, you should iterate over the dictionary's items:

sum(v for k,v in entries.items())

This means that during the first iteration 'IP' is stored in k and sourceip in v; during the second, 'BYTES' is stored in k and bytes in v; and so on.
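Summing every value of a single record's dictionary would also mix the IP and minute strings into the total, so it may be simpler to keep one running total per IP across all lines. The sketch below is one possible way to do that, not a confirmed solution from the thread. It uses Python 3 syntax, and several things in it are assumptions: the field positions are taken from the question's code, the helper names parse_line and total_bytes_by_ip are made up for illustration, field 7 is assumed to be a Unix start timestamp, and only the first sample record comes from the question (the others are invented variations).

```python
from collections import defaultdict
from datetime import datetime, timezone


def parse_line(line):
    """Return (source_ip, byte_count, protocol, start_time) for one record.

    Field positions follow the question's code; treating field 7 as a
    Unix timestamp is an assumption.
    """
    fields = line.strip().split('|')
    source_ip = fields[0].split(':')[1]
    byte_count = int(fields[6])
    protocol = fields[12]
    start_time = datetime.fromtimestamp(int(fields[7]), tz=timezone.utc)
    return source_ip, byte_count, protocol, start_time


def total_bytes_by_ip(lines, protocol='6'):
    """Sum bytes per source IP for one protocol, largest total first."""
    totals = defaultdict(int)
    for line in lines:
        source_ip, byte_count, proto, _ = parse_line(line)
        if proto == protocol:
            totals[source_ip] += byte_count
    return sorted(totals.items(), key=lambda pair: pair[1], reverse=True)


# First record is from the question; the rest are invented variations.
prefix = "./R2snd/2014/02/02/02/25.flows:"
sample = [
    prefix + "100.000.000.000|101.101.101.101|0|4|3|2|96|1391336665|1391336668|3361|445|2|6|0|0|0|0|0",
    prefix + "100.000.000.001|101.101.101.101|0|4|3|2|500|1391336670|1391336672|3361|445|2|6|0|0|0|0|0",
    prefix + "100.000.000.000|101.101.101.101|0|4|3|2|40|1391336680|1391336681|3361|445|2|6|0|0|0|0|0",
    prefix + "100.000.000.000|101.101.101.101|0|4|3|2|999|1391336690|1391336691|3361|53|2|17|0|0|0|0|0",
]

ranked = total_bytes_by_ip(sample)
for ip, total in ranked:
    print(ip, total)

# Full date and time (UTC) instead of only the minute from the file path.
start = parse_line(sample[0])[3]
formatted_start = start.strftime('%Y-%m-%d %H:%M:%S')
print(formatted_start)
```

To run this against the real log, the sample list could be replaced by the open file object, since total_bytes_by_ip only needs an iterable of lines. The last sample record uses protocol 17, so it is left out of the totals.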