Python to sort and count unique names in a file

Time: 2017-11-28 08:12:54

Tags: python python-3.x counting

I am trying to read lines with a particular string pattern, shown below, from /var/log/messages on Linux. From each such line I look at the user's email address, for example rajeshm@noi-rajeshm.fox.com, and use the str.partition() method to split the line into two parts. I then split the first part into a list so that I can take the last element, which is the user ID. That works fine.

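I am able to get the user list and the total count, but I need to count the occurrences of each user and print them as user_name: Count, that is, as key and value.

Nov 28 09:00:08 foxopt210 rshd[6157]: pam_rhosts(rsh:auth): allowed access to rajeshm@noi-rajeshm.fox.com as rajeshm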
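To make the extraction step concrete, here is a minimal sketch of how str.partition() and str.split() act on one line. The sample line is an English reconstruction of the log excerpt above, which is an assumption. The names before_at, sep and after_at are illustrative only.

```python
# Sample line, reconstructed in English from the log excerpt (an assumption).
line = ("Nov 28 09:00:08 foxopt210 rshd[6157]: pam_rhosts(rsh:auth): "
        "allowed access to rajeshm@noi-rajeshm.fox.com as rajeshm")

# partition() splits at the first '@' and returns a 3-tuple:
# (text before the separator, the separator itself, text after it).
before_at, sep, after_at = line.partition('@')

# The user ID is the last whitespace-separated word before the '@'.
user_id = before_at.split()[-1]

print(user_id)
```

Note that partition() returns a tuple rather than a list, so indexing it with [0] gives everything before the first '@'.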

The current code is below:

#!/usr/bin/python3
f= open("/var/log/messages")
count = 0
for line in f:
  if "allowed access"  in line:
    count+=1
    user_id = line.partition('@')[0]
    user_id = user_id.split()[-1]
    print(user_id)
f.close()
print("--------------------")
print("Total Count :" ,count)

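The output of this code is as desired:

bash-4.1$ ./log.py | tail
navit
akaul
akaul
pankaja
vishalm
vishalm
rajeshm
rajeshm
--------------------
Total Count : 790

While searching on Google, I came across the idea of using a dictionary for this purpose, and it works as expected: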

#!/usr/bin/python3
from collections import Counter
f= open("/var/log/messages")
count = 0
dictionary = {}
for line in f:
  if "allowed access"  in line:
    user_id = line.partition('@')[0]
    user_count = user_id.split()[-1]
    if user_count in dictionary:
        dictionary[user_count] += 1
    else:
       dictionary[user_count] = 1
for user_count, occurences in dictionary.items():
    print(user_count, ':', occurences)

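I just want to see whether there is a better way to do this exercise.

2 answers:

Answer 0: (score: 4)

When counting things, it is easier to use the collections.Counter() class. I would parse the lines in a generator: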

def users_accessed(fileobj):
    for line in fileobj:
        if 'allowed access' in line:
            yield line.partition('@')[0].rsplit(None, 1)[-1]

and pass that to a Counter() object:

from collections import Counter

with open("/var/log/messages") as f:
    access_counts = Counter(users_accessed(f))

for userid, count in access_counts.most_common():
    print(userid, count, sep=':')

This uses the Counter.most_common() method to produce sorted output (most common first, least common last).
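As a self-contained sketch of the approach above, the following runs the same generator against a small in-memory log instead of /var/log/messages. The host names, addresses and user IDs in sample_log are made up for illustration. Since the question title also asks about sorting, it shows sorted() on the Counter items as one way to get alphabetical order.

```python
import io
from collections import Counter


def users_accessed(fileobj):
    """Yield the user ID from every line that records an allowed access."""
    for line in fileobj:
        if 'allowed access' in line:
            # Last word before the '@' is the user ID.
            yield line.partition('@')[0].rsplit(None, 1)[-1]


# Hypothetical log content, standing in for the real file.
sample_log = io.StringIO(
    "Nov 28 09:00:08 host rshd[1]: pam_rhosts(rsh:auth): allowed access to akaul@a.example.com as akaul\n"
    "Nov 28 09:00:09 host rshd[2]: pam_rhosts(rsh:auth): allowed access to rajeshm@b.example.com as rajeshm\n"
    "Nov 28 09:00:10 host kernel: link is up\n"
    "Nov 28 09:00:11 host rshd[3]: pam_rhosts(rsh:auth): allowed access to rajeshm@b.example.com as rajeshm\n"
    "Nov 28 09:00:12 host rshd[4]: pam_rhosts(rsh:auth): allowed access to navit@c.example.com as navit\n"
    "Nov 28 09:00:13 host rshd[5]: pam_rhosts(rsh:auth): allowed access to akaul@a.example.com as akaul\n"
    "Nov 28 09:00:14 host rshd[6]: pam_rhosts(rsh:auth): allowed access to rajeshm@b.example.com as rajeshm\n"
)

access_counts = Counter(users_accessed(sample_log))

# Ordered by count, highest first.
by_count = access_counts.most_common()

# Ordered alphabetically by user ID.
by_name = sorted(access_counts.items())

for userid, count in by_name:
    print(userid, count, sep=':')
```

A Counter is a dict subclass, so access_counts.items() behaves like the plain dictionary in the question and can be sorted in whichever way suits the report.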

Answer 1: (score: 1)

You could also try a regular expression, like this:

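import re

# Lookbehind: match the text that follows "as " (the user ID at the end of the line)
pattern = r'(?<=as\s)\w.+'
occurrence = {}

with open("/var/log/messages") as f:
    for line in f:
        match = re.search(pattern, line)
        if match is None:
            # re.search() returns None for lines without the pattern; skip them
            continue
        search = match.group()
        # dict.get() with a default of 0 covers both the first and later sightings
        occurrence[search] = occurrence.get(search, 0) + 1

print(occurrence)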
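One thing worth checking with this pattern is what it matches on unrelated lines, since /var/log/messages contains much more than rsh entries. The sketch below uses three made-up sample lines and a hypothetical helper, count_users, to show that the lookbehind also fires on a line such as "device has started". Combining it with the "allowed access" filter from the question is one possible way to avoid that.

```python
import re

# Pattern from the answer: the text that follows "as " up to the end of the line.
pattern = re.compile(r'(?<=as\s)\w.+')

# Hypothetical sample lines, not taken from a real log.
sample_lines = [
    "Nov 28 09:00:08 foxopt210 rshd[6157]: pam_rhosts(rsh:auth): "
    "allowed access to rajeshm@noi-rajeshm.fox.com as rajeshm",
    "Nov 28 09:00:09 foxopt210 kernel: device has started",
    "Nov 28 09:00:10 foxopt210 kernel: link is up",
]


def count_users(lines):
    """Count user IDs, looking only at lines that record an allowed access."""
    counts = {}
    for line in lines:
        if "allowed access" not in line:
            continue
        match = pattern.search(line)
        if match is None:
            continue
        user = match.group()
        counts[user] = counts.get(user, 0) + 1
    return counts


# Without the filter: the second line matches too ("has started"),
# and the third line gives None.
unfiltered = [pattern.search(line) for line in sample_lines]

# With the filter: only the rsh line is counted.
result = count_users(sample_lines)
print(result)
```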
  

Just for fun, the same logic as a one-liner:

import re
pattern=r'(?<=as\s)\w.+'
new={}
[new.__setitem__(re.search(pattern, line).group(), 1) if re.search(pattern, line).group() not in new  else new.__setitem__(re.search(pattern, line).group(), new.get(re.search(pattern, line).group()) + 1) for line in open('legend.txt','r')]

print(new)
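For comparison, the same counting can be written compactly with collections.Counter and a generator expression, which calls re.search() once per line and skips lines that do not match. This is a sketch, not part of the original answer. The helper name count_matches and the sample lines are assumptions for illustration.

```python
import re
from collections import Counter

pattern = re.compile(r'(?<=as\s)\w.+')


def count_matches(lines):
    """Count the text after 'as ' on every line where the pattern matches."""
    matches = (pattern.search(line) for line in lines)
    # "if m" drops the None results from lines that do not match.
    return Counter(m.group() for m in matches if m)


# Hypothetical lines, as they would come from iterating over a file.
lines = [
    "allowed access to rajeshm@noi-rajeshm.fox.com as rajeshm\n",
    "allowed access to akaul@a.example.com as akaul\n",
    "link is up\n",
    "allowed access to rajeshm@noi-rajeshm.fox.com as rajeshm\n",
]

new = count_matches(lines)
print(new)
```

With a real file, the list can be replaced by the file object inside a with open(...) block, so the file is closed once counting is done.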