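Ruby: checking for uniqueness

Time: 2014-03-11 21:38:23

Tags: ruby

I wrote some code that goes through a text file (a web server log file). It mostly works, but I have two questions. First, the first username that appears in the log file is neither printed nor counted. Does anyone know why?

My second question is about count_unique. What do I need to do to count only the unique usernames?

My code: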

count_tot = 0 
count_unique = 0


file = File.new("text.txt", "r")
line = file.gets


while (line = file.gets)


substrings = line.split("&")

substrings.each do |sub|
  if sub.include? 'username'
    puts sub
    count_tot += 1 
  else
  end
end
end

file.close

puts ""
puts "Total found input values:"
puts count_tot
puts count_unique

Sample input (2 lines):

[11/Mar/2014:00:15:02 +0100] "GET /web/show/id=568296 HTTP/1.1" 200 8499 "https://www.site.com/csc/default.aspx?sid=ertett4353452445.orker2&username=username1&timestamp=20140311001443&hashkey=847823786547385243678&" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.74.9 (KHTML, like Gecko) Version/7.0.2 Safari/537.74.9" 52345 1FD323C0D681D2F10AE789F8A6C0900D.wm9worker5
[11/Mar/2014:00:35:50 +0100] "GET /web/show/id=568296 HTTP/1.1" 200 8499 "https://www.site.com/csc/default.aspx?sid=gfdgdfdgfgdfdfg._worker1&username=username2&timestamp=20140311003517&hashkey=fdsfsdffsffds&" "Mozilla/5.0 (iPad; CPU OS 7_0_6 like Mac OS X) AppleWebKit/537.51.1 (KHTML, like Gecko) CriOS/33.0.1750.14 Mobile/11B651 Safari/9537.53" 62415 5852920B165D2E39559241BA8B5FB36A.wm9worker6

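2 answers:

Answer 0 (score: 1)

"The first username that appears in the log file is neither printed nor counted. Does anyone know why?"

To fix this, drop the extra read before the loop: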

line = file.gets # remove this.
while (line = file.gets) # keep only this.

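The line read by line = file.gets (before the while loop) is never processed. Its contents are overwritten as soon as the while condition reads the next line, so the first line is lost.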
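A minimal sketch of the difference, using a StringIO with three made-up lines in place of the real log file (the input and the variable names skipped and fixed are illustrative assumptions, not part of the question):

```ruby
require 'stringio'

# Hypothetical three-line input standing in for the log file.
log = "first\nsecond\nthird\n"

# Buggy pattern: the extra gets before the loop consumes "first",
# and the loop condition overwrites it before it is ever used.
io = StringIO.new(log)
line = io.gets
skipped = []
while (line = io.gets)
  skipped << line.chomp
end

# Fixed pattern: the loop condition is the only place that reads.
io = StringIO.new(log)
fixed = []
while (line = io.gets)
  fixed << line.chomp
end

p skipped # => ["second", "third"]
p fixed   # => ["first", "second", "third"]
```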

Update

string = <<_
[11/Mar/2014:00:15:02 +0100] "GET /web/show/id=568296 HTTP/1.1" 200 8499 "https://www.site.com/csc/default.aspx?sid=ertett4353452445.orker2&username=username1&timestamp=20140311001443&hashkey=847823786547385243678&" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.74.9 (KHTML, like Gecko) Version/7.0.2 Safari/537.74.9" 52345 1FD323C0D681D2F10AE789F8A6C0900D.wm9worker5
[11/Mar/2014:00:35:50 +0100] "GET /web/show/id=568296 HTTP/1.1" 200 8499 "https://www.site.com/csc/default.aspx?sid=gfdgdfdgfgdfdfg._worker1&username=username2&timestamp=20140311003517&hashkey=fdsfsdffsffds&" "Mozilla/5.0 (iPad; CPU OS 7_0_6 like Mac OS X) AppleWebKit/537.51.1 (KHTML, like Gecko) CriOS/33.0.1750.14 Mobile/11B651 Safari/9537.53" 62415 5852920B165D2E39559241BA8B5FB36A.wm9worker6
[11/Mar/2014:00:35:50 +0100] "GET /web/show/id=568296 HTTP/1.1" 200 8499 "https://www.site.com/csc/default.aspx?sid=gfdgdfdgfgdfdfg._worker1&username=username2&timestamp=20140311003517&hashkey=fdsfsdffsffds&" "Mozilla/5.0 (iPad; CPU OS 7_0_6 like Mac OS X) AppleWebKit/537.51.1 (KHTML, like Gecko) CriOS/33.0.1750.14 Mobile/11B651 Safari/9537.53" 62415 5852920B165D2E39559241BA8B5FB36A.wm9worker6
_

File.write('f1',string)

@usernames = []
File.foreach('f1') do |line|
  #collect all the usernames
  @usernames << line[/username=(\w+)/,1]
  # do other tasks with *line*
end

@usernames # => ["username1", "username2", "username2"]
# to get the uniq usernames
@usernames.uniq # => ["username1", "username2"]
# if you want to see, which username present how many times, think something
# like below
Hash[@usernames.group_by { |s| s }.map { |k,v| [k,v.size]}]
# => {"username1"=>1, "username2"=>2}

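See the documentation for IO::foreach to understand why I used it. Also look at the Array#uniq and Enumerable#group_by methods. The documentation is quite clear.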
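One detail worth handling: String#[] with a regexp returns nil when a line contains no username parameter, so the collected array may contain nil entries. The sketch below, which uses made-up query-string fragments rather than the real log, drops those entries and fills in the two counters from the question. The per_user hash is an alternative to the group_by approach, not the answerer's own code.

```ruby
# Hypothetical query-string fragments; the third has no username parameter.
lines = [
  "sid=a&username=username1&timestamp=1&",
  "sid=b&username=username2&timestamp=2&",
  "sid=c&timestamp=3&",
  "sid=d&username=username2&timestamp=4&"
]

# line[regexp, 1] returns nil when nothing matches, so compact drops those.
usernames = lines.map { |line| line[/username=(\w+)/, 1] }.compact

count_tot = usernames.size
count_unique = usernames.uniq.size

# Count occurrences per username with a hash that defaults to 0.
# On Ruby 2.7 or later, usernames.tally returns the same hash.
per_user = usernames.each_with_object(Hash.new(0)) do |name, counts|
  counts[name] += 1
end

p count_tot    # => 3
p count_unique # => 2
p per_user     # => {"username1"=>1, "username2"=>2}
```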

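Answer 1 (score: 1)

First of all, the IO class, and by extension File, has an each method that yields each line to a block. There is also a foreach class method that makes this even more concise.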

File.foreach 'text.txt' do |line|
  # Count stuff ...
end

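Regarding your first question: you read the first line into a variable and then immediately overwrite that variable in the condition of the while loop. This effectively skips the first line. The example above avoids the problem.

The second question is hard to answer without seeing the input we are dealing with.

A simple solution based on String#scan may be enough: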

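# Parentheses avoid Ruby's "ambiguous first argument" warning for a bare regexp.
line.scan(/[?&]username=([^&]*)/) do |user_name|
  # user_name is a one-element array here because the pattern has a capture group
  puts user_name
end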

So the whole thing can be reduced to:

user_names = File.foreach('text.txt').map do |line|
  line.scan /[?&]username=([^&]*)/
end.flatten

user_name_counts = user_names.uniq.inject Hash.new do |hash, user_name|
  hash.tap do |hash|
    hash[user_name] = user_names.count user_name
  end
end

p user_name_counts
# => {"username1"=>1, "username2"=>2}
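To tie this back to the question's count_tot and count_unique, here is a sketch that counts while streaming, so the full list of names never has to be kept in memory. It is an assumption-laden variant, not the answerer's code: the input is a StringIO holding shortened, made-up stand-ins for the log lines, and the seen set is a hypothetical helper.

```ruby
require 'set'
require 'stringio'

# Shortened, hypothetical stand-ins for the log lines in the question.
log = StringIO.new(<<~LOG)
  "https://www.site.com/csc/default.aspx?sid=a&username=username1&timestamp=1&"
  "https://www.site.com/csc/default.aspx?sid=b&username=username2&timestamp=2&"
  "https://www.site.com/csc/default.aspx?sid=c&username=username2&timestamp=3&"
LOG

count_tot = 0
seen = Set.new

log.each_line do |line|
  # scan returns one array per match because the pattern has a capture group,
  # so flatten turns [["username1"]] into ["username1"].
  line.scan(/[?&]username=([^&]*)/).flatten.each do |user_name|
    count_tot += 1
    seen << user_name # a Set ignores values it already contains
  end
end

count_unique = seen.size
puts "Total: #{count_tot}, unique: #{count_unique}"
```

With a real file, File.foreach('text.txt') can take the place of log.each_line.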