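I wrote some code to go through a text file (a web server log file). It mostly works, but I have two questions. The first username that appears in the log file is neither printed nor counted. Does anyone know why?
My second question is about count_unique. What do I need to do to count only the unique usernames?
My code: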
count_tot = 0
count_unique = 0
file = File.new("text.txt", "r")
line = file.gets
while (line = file.gets)
  substrings = line.split("&")
  substrings.each do |sub|
    if sub.include? 'username'
      puts sub
      count_tot += 1
    else
    end
  end
end
file.close
puts ""
puts "Total found input values:"
puts count_tot
puts count_unique
Sample input (2 lines):
[11/Mar/2014:00:15:02 +0100] "GET /web/show/id=568296 HTTP/1.1" 200 8499 "https://www.site.com/csc/default.aspx?sid=ertett4353452445.orker2&username=username1&timestamp=20140311001443&hashkey=847823786547385243678&" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.74.9 (KHTML, like Gecko) Version/7.0.2 Safari/537.74.9" 52345 1FD323C0D681D2F10AE789F8A6C0900D.wm9worker5
[11/Mar/2014:00:35:50 +0100] "GET /web/show/id=568296 HTTP/1.1" 200 8499 "https://www.site.com/csc/default.aspx?sid=gfdgdfdgfgdfdfg._worker1&username=username2&timestamp=20140311003517&hashkey=fdsfsdffsffds&" "Mozilla/5.0 (iPad; CPU OS 7_0_6 like Mac OS X) AppleWebKit/537.51.1 (KHTML, like Gecko) CriOS/33.0.1750.14 Mobile/11B651 Safari/9537.53" 62415 5852920B165D2E39559241BA8B5FB36A.wm9worker6
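Answer 0 (score: 1)
"The first username that appears in the log file is neither printed nor counted. Does anyone know why?"
To fix that, change the following: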
line = file.gets # remove this.
while (line = file.gets) # keep only this.
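The line read by the initial line = file.gets (before the while loop) is never processed. The while condition immediately overwrites it with the next line, so the first line's data is lost before the loop body ever runs.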
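To make the effect visible, here is a minimal sketch that runs both variants of the loop over the same input. It uses a StringIO with three made-up lines in place of the log file; the variable names skipping and complete are only for illustration.

```ruby
require 'stringio'

# Hypothetical three-line input standing in for the log file.
log = StringIO.new("first\nsecond\nthird\n")

skipping = []
line = log.gets            # reads "first" ...
while (line = log.gets)    # ... which is overwritten here before it is used
  skipping << line.chomp
end

log.rewind                 # start again from the beginning of the input

complete = []
while (line = log.gets)    # the loop condition alone reads every line
  complete << line.chomp
end

p skipping  # => ["second", "third"]
p complete  # => ["first", "second", "third"]
```

The only difference between the two loops is the extra gets call before the first one.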
Update:
string = <<_
[11/Mar/2014:00:15:02 +0100] "GET /web/show/id=568296 HTTP/1.1" 200 8499 "https://www.site.com/csc/default.aspx?sid=ertett4353452445.orker2&username=username1&timestamp=20140311001443&hashkey=847823786547385243678&" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.74.9 (KHTML, like Gecko) Version/7.0.2 Safari/537.74.9" 52345 1FD323C0D681D2F10AE789F8A6C0900D.wm9worker5
[11/Mar/2014:00:35:50 +0100] "GET /web/show/id=568296 HTTP/1.1" 200 8499 "https://www.site.com/csc/default.aspx?sid=gfdgdfdgfgdfdfg._worker1&username=username2&timestamp=20140311003517&hashkey=fdsfsdffsffds&" "Mozilla/5.0 (iPad; CPU OS 7_0_6 like Mac OS X) AppleWebKit/537.51.1 (KHTML, like Gecko) CriOS/33.0.1750.14 Mobile/11B651 Safari/9537.53" 62415 5852920B165D2E39559241BA8B5FB36A.wm9worker6
[11/Mar/2014:00:35:50 +0100] "GET /web/show/id=568296 HTTP/1.1" 200 8499 "https://www.site.com/csc/default.aspx?sid=gfdgdfdgfgdfdfg._worker1&username=username2&timestamp=20140311003517&hashkey=fdsfsdffsffds&" "Mozilla/5.0 (iPad; CPU OS 7_0_6 like Mac OS X) AppleWebKit/537.51.1 (KHTML, like Gecko) CriOS/33.0.1750.14 Mobile/11B651 Safari/9537.53" 62415 5852920B165D2E39559241BA8B5FB36A.wm9worker6
_
File.write('f1',string)
@usernames = []
File.foreach('f1') do |line|
  # collect all the usernames
  @usernames << line[/username=(\w+)/, 1]
  # do other tasks with *line*
end
@usernames # => ["username1", "username2", "username2"]
# to get the uniq usernames
@usernames.uniq # => ["username1", "username2"]
# to see how many times each username occurs, you could do something
# like this
Hash[@usernames.group_by { |s| s }.map { |k,v| [k,v.size]}]
# => {"username1"=>1, "username2"=>2}
See the IO::foreach method to understand why I used it. Also have a look at the Array#uniq and group_by methods. Their documentation is clear.
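Tying this back to the two counters in the question, the following is a rough sketch rather than a drop-in replacement. It assumes count_tot should be the number of username values found and count_unique the number of distinct ones. The sample lines are made up and shortened, and a heredoc stands in for the file so the snippet runs on its own; with a real file, File.foreach would take the place of log.each_line.

```ruby
# Hypothetical, shortened sample lines; only the username parameter matters here.
log = <<~LOG
  GET /a?sid=1&username=username1&timestamp=1&
  GET /b?sid=2&username=username2&timestamp=2&
  GET /c?sid=3&username=username2&timestamp=3&
  GET /d?sid=4&timestamp=4&
LOG

# Extract the username from each line; compact drops the nil produced by
# lines that contain no username parameter.
usernames = log.each_line.map { |line| line[/username=(\w+)/, 1] }.compact

count_tot    = usernames.size       # every occurrence
count_unique = usernames.uniq.size  # distinct usernames only

puts "Total found input values: #{count_tot}"
puts "Unique usernames: #{count_unique}"
```

With these four sample lines the totals are 3 and 2, since the last line has no username and username2 appears twice.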
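Answer 1 (score: 1)
First of all, the IO class, and by extension File, has an each method that yields each line to a block. There is also a foreach class method that makes this more concise.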
File.foreach('text.txt') do |line|
  # Count stuff ...
end
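As for your first question: you read the first line into a variable and then immediately overwrite that variable in the while loop's condition. This effectively skips the first line. The example above gets rid of that problem.
It is hard to answer the second question without seeing the input we are dealing with.
A simple solution based on String#scan might be enough: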
line.scan(/[?&]username=([^&]*)/) do |user_name|
  puts user_name
end
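It helps to know what String#scan returns when the pattern contains a capture group, because that determines how the results have to be unpacked. The small sketch below uses a made-up, shortened request line; the variable name matches is only for illustration.

```ruby
# Hypothetical, shortened request line.
line = 'GET /x?sid=abc&username=username1&timestamp=20140311001443&hashkey=123&'

# With a capture group, scan returns one array of captures per match.
matches = line.scan(/[?&]username=([^&]*)/)
p matches          # => [["username1"]]
p matches.flatten  # => ["username1"]

# Without a capture group, scan returns the matched text itself.
p line.scan(/username=[^&]*/)  # => ["username=username1"]
```

This nesting is why collecting the results of scan over many lines usually ends with a call to flatten.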
So the whole thing can be reduced to:
user_names = File.foreach('text.txt').map do |line|
  line.scan(/[?&]username=([^&]*)/)
end.flatten
user_name_counts = user_names.uniq.inject(Hash.new) do |hash, user_name|
  hash.tap do |h|
    h[user_name] = user_names.count(user_name)
  end
end
p user_name_counts
# => {"username1"=>1, "username2"=>2}
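As a possible variation, the counts can also be built in a single pass instead of calling count once per unique name. The sketch below is an alternative to the inject version, not the answer's own code. It starts from a hard-coded list of usernames that stands in for the values extracted from the file.

```ruby
# Hypothetical list of extracted usernames, standing in for the scan results.
user_names = %w[username1 username2 username2]

# Hash.new(0) gives unseen keys a default count of 0, so += 1 works directly.
user_name_counts = user_names.each_with_object(Hash.new(0)) do |user_name, counts|
  counts[user_name] += 1
end

user_name_counts.each { |name, count| puts "#{name}: #{count}" }
# username1: 1
# username2: 2

puts "Unique usernames: #{user_name_counts.size}"
# Unique usernames: 2
```

On Ruby 2.7 and later, user_names.tally builds the same hash in one call.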