Selecting the most common domain from a list of email addresses

Time: 2017-01-27 12:11:02

Tags: ruby

I have a function that generates random email addresses:

def emails
    names = ["alfred", "daniel", "elisa", "ana", "ramzes"]
    surnames = ["oak", "leaf", "grass", "fruit"]
    providers = ["gmail", "yahoo", "outlook", "icloud"]
    address = "#{names.sample}.#{surnames.sample}#{rand(100..5300)}@#{providers.sample}.com"
end

I use it to build a list of randomly generated email addresses:

email_list = 100.times.map { emails }

The list looks like this:

daniel.oak3985@icloud.com
ramzes.grass1166@icloud.com
daniel.fruit992@yahoo.com
...

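How can I select the most common provider ("gmail", "yahoo", etc.)?

2 answers:

Answer 0: (score: 2)

Your question is similar to this one, with one twist: you want to analyze the frequency of the providers, not of the email addresses themselves.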

def random_email
  names = ["alfred", "daniel", "elisa", "ana", "ramzes"]
  surnames = ["oak", "leaf", "grass", "fruit"]
  providers = ["gmail", "yahoo", "outlook", "icloud"]
  address = "#{names.sample}.#{surnames.sample}#{rand(100..5300)}@#{providers.sample}.com"
end

emails = Array.new(100){ random_email }

# The block parameter is named counts so it does not shadow the outer freq.
freq = emails.each_with_object(Hash.new(0)) do |email, counts|
  provider = email.split('@').last
  counts[provider] += 1
end

p freq
#=> {"outlook.com"=>24, "yahoo.com"=>28, "gmail.com"=>32, "icloud.com"=>16}

p freq.max_by{|provider, count| count}.first
#=> "gmail.com"
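
For comparison, here is a minimal sketch of the same counting step using Enumerable#tally, assuming Ruby 2.7 or later (tally is not available in older versions). The three addresses are the samples shown in the question; the variable names are only illustrative.

```ruby
# Assumes Ruby 2.7+ for Enumerable#tally.
# sample_emails reuses the three sample addresses from the question.
sample_emails = [
  "daniel.oak3985@icloud.com",
  "ramzes.grass1166@icloud.com",
  "daniel.fruit992@yahoo.com"
]

# Count how often each domain (the part after the "@") occurs.
domain_counts = sample_emails.map { |email| email.split("@").last }.tally

# Pick the domain with the highest count.
most_common_domain = domain_counts.max_by { |_domain, count| count }.first

p domain_counts      #=> {"icloud.com"=>2, "yahoo.com"=>1}
p most_common_domain #=> "icloud.com"
```

As with max_by in the code above, only one domain is returned when several share the highest count.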

Answer 1: (score: 0)

email_list = 10.times.map { emails }
  #=> ["alfred.grass426@gmail.com", "elisa.oak239@icloud.com",
  #    "daniel.fruit1600@outlook.com", "ana.fruit3761@icloud.com",
  #    "daniel.grass742@yahoo.com", "elisa.oak3891@outlook.com",
  #    "alfred.leaf1321@gmail.com", "alfred.grass5295@outlook.com",
  #    "ramzes.fruit435@gmail.com", "ana.fruit4233@yahoo.com"] 

email_list.group_by { |s| s[/@\K.+/] }.max_by { |_,v| v.size }.first
  #=> "gmail.com"
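
In the regular expression, \K means "discard everything matched so far" from the reported match. Alternatively, @\K can be replaced with the positive lookbehind (?<=@).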
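
A small sketch of that equivalence, using one address from the list above. The third pattern is an added assumption about what the question may want: it stops at the first dot to return the bare provider name ("gmail") rather than the full domain.

```ruby
address = "alfred.grass426@gmail.com"

# \K drops the "@" (and anything before it) from the reported match.
with_keep_out = address[/@\K.+/]

# The positive lookbehind only asserts that "@" precedes the match.
with_lookbehind = address[/(?<=@).+/]

# Illustrative variant: match up to the first dot to get just the provider name.
provider_only = address[/@\K[^.]+/]

p with_keep_out    #=> "gmail.com"
p with_lookbehind  #=> "gmail.com"
p provider_only    #=> "gmail"
```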

The steps are as follows.

h = email_list.group_by { |s| s[/@\K.+/] }
  #=> {"gmail.com"  =>["alfred.grass426@gmail.com", "alfred.leaf1321@gmail.com",
  #                    "ramzes.fruit435@gmail.com"],
  #    "icloud.com" =>["elisa.oak239@icloud.com", "ana.fruit3761@icloud.com"],
  #    "outlook.com"=>["daniel.fruit1600@outlook.com",  "elisa.oak3891@outlook.com",
  #                    "alfred.grass5295@outlook.com"],
  #    "yahoo.com"  =>["daniel.grass742@yahoo.com", "ana.fruit4233@yahoo.com"]}
a = h.max_by { |_,v| v.size }
  #=> ["gmail.com", ["alfred.grass426@gmail.com", "alfred.leaf1321@gmail.com",
  #                  "ramzes.fruit435@gmail.com"]] 
a.first
  #=> "gmail.com" 

If, as here, there is a tie for the most common domain, modify the code as follows to get all the winners.

h = email_list.group_by { |s| s[/@\K.+/] }
  # (same as above)
mx_size = h.map { |_,v| v.size }.max
  #=> 3 
h.select { |_,v| v.size == mx_size }.keys
  #=> ["gmail.com", "outlook.com"]
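
The tie handling can also be sketched by counting domains first and then grouping the domains by their count. This is only an alternative illustration built on the ten sample addresses above; the names counts, top_count and winners are made up for this example.

```ruby
# The ten sample addresses from this answer.
email_list = [
  "alfred.grass426@gmail.com", "elisa.oak239@icloud.com",
  "daniel.fruit1600@outlook.com", "ana.fruit3761@icloud.com",
  "daniel.grass742@yahoo.com", "elisa.oak3891@outlook.com",
  "alfred.leaf1321@gmail.com", "alfred.grass5295@outlook.com",
  "ramzes.fruit435@gmail.com", "ana.fruit4233@yahoo.com"
]

# Count occurrences of each domain.
counts = email_list.each_with_object(Hash.new(0)) do |address, h|
  h[address[/@\K.+/]] += 1
end

# Group [domain, count] pairs by count and take the group with the largest count.
top_count, top_pairs = counts.group_by { |_domain, count| count }
                             .max_by { |count, _pairs| count }
winners = top_pairs.map(&:first)

p top_count #=> 3
p winners   #=> ["gmail.com", "outlook.com"]
```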