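I'm using Python 3.
I'm reading a CSV file with csv.DictReader and trying to find out which country occurs most often.
Note that I'm using DictReader, not reader. I think this is necessary because I'm using Counter.
I'm running into trouble because some lines in my CSV file contain null bytes (specifically in the password field), and my script fails with an error because the csv reader does not accept null bytes. An example is the last sample line in the comment below. I've seen some people strip the null bytes with a line like the one in my code: readerobject(x.replace('\0', '') for x in csvfile)
But I can't seem to use it, because I have already read csvfile into readerobject on the previous line.
Here is my code: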
'''
sample csv lines
Brazil,200.145.23.13,pi,raspberry,failed,None,None,None
Brazil,200.145.23.13,pi,raspberryraspberry993311,failed,None,None,None
China,121.201.83.134,root,123456,succeeded,None,None,None
United Kingdom,185.38.148.238,root,123456,succeeded,None,None,None
Croatia,5.188.10.141,root,admin,succeeded,None,None,None
France,195.154.44.31,squid,123456,failed,None,None,None
France,195.154.44.31,squid,123456,failed,None,None,None
Croatia,5.188.10.141,root,123456,succeeded,None,None,None
Croatia,5.188.10.141,root,admin,succeeded,None,None,None
Croatia,5.188.10.141,root,123456,succeeded,None,None,None
Netherlands,109.236.91.85,root,admin,succeeded,None,None,None
France,51.255.160.205,root,admin,succeeded,None,None,None
United States,207.138.132.44,root,seiko2005,failed,None,None,None
France,212.83.150.189,support," ",failed,None,None,None <-- these are null bytes inside the ""
'''
import codecs
from pprint import pprint
from collections import Counter
import csv
linecount = 0
import time
country_counter = Counter()
print("parsing CSV log file")
with open('C:/Users/Home/Documents/kippo stuff/final lab/kippo/oldkippo4final.csv', newline='') as csvfile:
    readerobject = csv.DictReader(csvfile, delimiter=',', fieldnames=['Country', 'IP Address', 'Username', 'Password', 'Status', 'name', 'intention', 'OS'])
    readerobject(x.replace('\0', '') for x in csvfile)
    for row in readerobject:
        print(row, "\n\n")
        linecount += 1
        country_counter[row['Country']] += 1
        print(linecount)
print(country_counter.most_common(3))
print("the total linecount was: ", linecount)
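
For reference, here is a minimal sketch of how the generator approach might be wired in. It relies on the fact that csv.DictReader accepts any iterable of strings, not only a file object, so the cleaned lines can be handed to it when it is created rather than by calling the reader afterwards. The names strip_nulls, count_countries and FIELDNAMES, as well as the in-memory sample that stands in for the real log file, are assumptions made for illustration and are not part of the script above.

```python
import csv
import io
from collections import Counter

# Same column names as in the script above.
FIELDNAMES = ['Country', 'IP Address', 'Username', 'Password',
              'Status', 'name', 'intention', 'OS']


def strip_nulls(lines):
    """Yield each line with any NUL characters removed (hypothetical helper)."""
    for line in lines:
        yield line.replace('\0', '')


def count_countries(csvfile):
    """Count the 'Country' column of an open text-mode CSV file."""
    # The cleaned line iterator is passed to DictReader itself,
    # instead of calling the reader object after it has been built.
    reader = csv.DictReader(strip_nulls(csvfile), fieldnames=FIELDNAMES)
    counter = Counter()
    linecount = 0
    for row in reader:
        linecount += 1
        counter[row['Country']] += 1
    return counter, linecount


# Hypothetical in-memory sample; the last-but-one row has NUL bytes
# inside the quoted password field, like the problem row in the log.
sample = io.StringIO(
    'France,195.154.44.31,squid,123456,failed,None,None,None\n'
    'France,212.83.150.189,support,"\0\0",failed,None,None,None\n'
    'China,121.201.83.134,root,123456,succeeded,None,None,None\n',
    newline='',
)
counter, linecount = count_countries(sample)
print(counter.most_common(3))
print("the total linecount was: ", linecount)
```

With the real log, the same function could presumably be called inside the existing with open(..., newline='') as csvfile: block, passing csvfile in place of the sample. Because the generator cleans one line at a time, the file is never loaded into memory in full.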