Question

Suppose I have the following data:

First Name, Last Name, Age, Income, Household Size, Gender, Education
Jon, Smith, 25, 50000, 1, Male, College
Jane, Davies, 30, 60000, 3, Female, High School
Sam, Farelly, 32, 80000, 2, Unspecified, College
Joan, Favreau, 35, 65000, 4, Female, College
Sam, McNulty, 38, 63000, 3, Male, College
Mark, Minahan, 48, 78000, 5, Male, High School
Susan, Umani, 45, 75000, 2, Female, College
Bill, Perault, 24, 45000, 1, Male, Did Not Complete High School
Doug, Stamper, 45, 75000, 1, Male, College
Francis, Underwood, 52, 100000, 2, Male, College

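I would like to create an array of hashes to answer the following questions:

Average age
Average income
Average household size
Percentage female
Percentage male
Percentage of unspecified gender
Percentage with a college education
Percentage with a high school education
Percentage who did not complete high school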

Would I be able to organize the data this way?

voter_demographics = [
  {
    :firstname => ["Jon", "Jane", "Sam", "Joan", "Sam", "Mark", "Susan", "Bill", "Doug", "Francis"],
    :lastname => ["Smith", "Davies", "Farelly", "Favreau", "McNulty", "Minahan", "Umani", "Perault", "Stamper", "Underwood"],
    :age => [25, 30, 32, 35, 38, 48, 45, 24, 45, 52],
    :income => [50000, 60000, 80000, 65000, 63000, 78000, 75000, 45000, 75000, 100000],
    :household_size => [1, 3, 2, 4, 3, 5, 2, 1, 1, 2],
    :gender => ["male", "female", "unspecified", "female", "male", "male", "female", "male", "male", "male"],
    :education => ["college", "high school", "college", "college", "college", "high school", "college", "did not complete high school", "college", "college"]
  }
]

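Could someone help me get started with the first question (average age)? I am still trying to get the hang of hashes and how to pull out each piece of data.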

For the first question, finding the average age, would the following work?
sum = 0
voter_demographics.each do |:age|
sum = sum + :age 
average = sum / :age.length
puts "The average is #{average}".

I am stuck on this. If there are any beginner resources you can recommend on hashes and arrays of hashes, that would be great!

Answer 1

keys, *data = <<_.split(/\n/).map { |line| line.split(/,\s+/) }
First Name, Last Name, Age, Income, Household Size, Gender, Education
Jon, Smith, 25, 50000, 1, Male, College
Jane, Davies, 30, 60000, 3, Female, High School
Sam, Farelly, 32, 80000, 2, Unspecified, College
Joan, Favreau, 35, 65000, 4, Female, College
Sam, McNulty, 38, 63000, 3, Male, College
Mark, Minahan, 48, 78000, 5, Male, High School
Susan, Umani, 45, 75000, 2, Female, College
Bill, Perault, 24, 45000, 1, Male, Did Not Complete High School
Doug, Stamper, 45, 75000, 1, Male, College
Francis, Underwood, 52, 100000, 2, Male, College
_

We now have the following values for keys and data.

keys
  #=> ["First Name", "Last Name", "Age", "Income", "Household Size",
  #    "Gender", "Education"] 
data
  #=> [["Jon", "Smith", "25", "50000", "1", "Male", "College"],
  #    ["Jane", "Davies", "30", "60000", "3", "Female", "High School"],
  #    ["Sam", "Farelly", "32", "80000", "2", "Unspecified", "College"],
  #    ["Joan", "Favreau", "35", "65000", "4", "Female", "College"],
  #    ["Sam", "McNulty", "38", "63000", "3", "Male", "College"],
  #    ["Mark", "Minahan", "48", "78000", "5", "Male", "High School"],
  #    ["Susan", "Umani", "45", "75000", "2", "Female", "College"],
  #    ["Bill", "Perault", "24", "45000", "1", "Male", "Did Not Complete High School"],
  #    ["Doug", "Stamper", "45", "75000", "1", "Male", "College"],
  #    ["Francis", "Underwood", "52", "100000", "2", "Male", "College"]]

Next, create the following hash.

h = keys.zip(data.transpose).to_h
  #=> {"First Name"    =>["Jon", "Jane", "Sam", "Joan", "Sam", "Mark", "Susan",
  #                       "Bill", "Doug", "Francis"],
  #    "Last Name"     =>["Smith", "Davies", "Farelly", "Favreau", "McNulty", "Minahan",
  #                       "Umani", "Perault", "Stamper", "Underwood"],
  #    "Age"           =>["25", "30", "32", "35", "38", "48", "45", "24", "45", "52"],
  #    "Income"        =>["50000", "60000", "80000", "65000", "63000", "78000",
  #                      "75000", "45000", "75000", "100000"],
  #    "Household Size"=>["1", "3", "2", "4", "3", "5", "2", "1", "1", "2"],
  #    "Gender"        =>["Male", "Female", "Unspecified", "Female", "Male", "Male",
  #                       "Female", "Male", "Male", "Male"],
  #    "Education"     =>["College", "High School", "College", "College", "College",
  #                       "High School", "College", "Did Not Complete High School",
  #                       "College", "College"]}

It is now easy to compute various statistics.

n = data.size.to_f
  #=> 10.0

avg_age = h["Age"].map(&:to_i).reduce(:+)/n.to_f
  #=> 37.4 
avg_income = h["Income"].map(&:to_i).reduce(:+)/n.to_f
  #=> 69100.0 
avg_hsize = h["Household Size"].map(&:to_i).reduce(:+)/n.to_f
  #=> 2.4 
pct_female= 100*h["Gender"].count("Female")/n.to_f
  #=> 30.0

And so on.
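The remaining percentages (male, unspecified, and the three education levels) follow the same pattern as pct_female. As a rough sketch, they can all be computed at once by counting each distinct value in a column. The helper name percentages is made up for this illustration, and the hash below repeats only the two categorical columns of h so that the snippet runs on its own.

```ruby
# Column-oriented hash in the same shape as h above
# (only the two categorical columns are repeated here).
h = {
  "Gender"    => ["Male", "Female", "Unspecified", "Female", "Male", "Male",
                  "Female", "Male", "Male", "Male"],
  "Education" => ["College", "High School", "College", "College", "College",
                  "High School", "College", "Did Not Complete High School",
                  "College", "College"]
}

# Hypothetical helper (not part of the original answer): maps each distinct
# value in a column to its share of the column, in percent.
def percentages(values)
  counts = values.each_with_object(Hash.new(0)) { |v, acc| acc[v] += 1 }
  counts.transform_values { |c| 100.0 * c / values.size }
end

gender_pct    = percentages(h["Gender"])
education_pct = percentages(h["Education"])

puts gender_pct["Male"]       # 60.0
puts education_pct["College"] # 70.0
```

Each result is a hash keyed by the distinct values, so gender_pct["Unspecified"] and education_pct["High School"] are available without writing a separate line per category.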

Computing other statistics

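Now suppose you wish to compute statistics that involve more than one key, such as the average age of women. The simplest approach (for this as well as for simple averages and percentages) is to put the data into a database and use SQL queries. However, we can also do it in Ruby by first creating an array of hashes.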

arr = data.map { |row| keys.zip(row).to_h }
  #=> [{"First Name"=>"Jon", "Last Name"=>"Smith", "Age"=>"25", "Income"=>"50000",
  #     "Household Size"=>"1", "Gender"=>"Male", "Education"=>"College"},
  #    {"First Name"=>"Jane", "Last Name"=>"Davies", "Age"=>"30", "Income"=>"60000",
  #     "Household Size"=>"3", "Gender"=>"Female", "Education"=>"High School"},
  #    {"First Name"=>"Sam", "Last Name"=>"Farelly", "Age"=>"32", "Income"=>"80000",
  #     "Household Size"=>"2", "Gender"=>"Unspecified", "Education"=>"College"},
  #    {"First Name"=>"Joan", "Last Name"=>"Favreau", "Age"=>"35", "Income"=>"65000",
  #     "Household Size"=>"4", "Gender"=>"Female", "Education"=>"College"},
  #    {"First Name"=>"Sam", "Last Name"=>"McNulty", "Age"=>"38", "Income"=>"63000",
  #     "Household Size"=>"3", "Gender"=>"Male", "Education"=>"College"},
  #    {"First Name"=>"Mark", "Last Name"=>"Minahan", "Age"=>"48", "Income"=>"78000",
  #     "Household Size"=>"5", "Gender"=>"Male", "Education"=>"High School"},
  #    {"First Name"=>"Susan", "Last Name"=>"Umani", "Age"=>"45", "Income"=>"75000",
  #     "Household Size"=>"2", "Gender"=>"Female", "Education"=>"College"},
  #    {"First Name"=>"Bill", "Last Name"=>"Perault", "Age"=>"24", "Income"=>"45000",
  #     "Household Size"=>"1", "Gender"=>"Male",
  #     "Education"=>"Did Not Complete High School"},
  #    {"First Name"=>"Doug", "Last Name"=>"Stamper", "Age"=>"45", "Income"=>"75000",
  #     "Household Size"=>"1", "Gender"=>"Male", "Education"=>"College"},
  #    {"First Name"=>"Francis", "Last Name"=>"Underwood", "Age"=>"52",
  #     "Income"=>"100000", "Household Size"=>"2", "Gender"=>"Male",
  #     "Education"=>"College"}]

Then, to compute the average age of women, build an array of the women's ages, sum its elements, and divide that sum by the size of the array.

a = arr.each_with_object([]) { |row, ages| ages << row["Age"].to_i if row["Gender"] == "Female" }
  #=> [30, 35, 45]
a.empty? ? 0.0 : a.reduce(:+)/a.size.to_f
  #=> 36.666666666666664

Other calculations are similar.
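To make "similar" concrete, here is one possible way to wrap the filter-then-average step in a small method. The name average_of and its block-based interface are assumptions made for this sketch, and the rows below carry only the four keys the example needs.

```ruby
# Rows in the same shape as arr above (string keys, string values),
# trimmed to the keys used in this example.
arr = [
  {"Age"=>"25", "Income"=>"50000",  "Gender"=>"Male",        "Education"=>"College"},
  {"Age"=>"30", "Income"=>"60000",  "Gender"=>"Female",      "Education"=>"High School"},
  {"Age"=>"32", "Income"=>"80000",  "Gender"=>"Unspecified", "Education"=>"College"},
  {"Age"=>"35", "Income"=>"65000",  "Gender"=>"Female",      "Education"=>"College"},
  {"Age"=>"38", "Income"=>"63000",  "Gender"=>"Male",        "Education"=>"College"},
  {"Age"=>"48", "Income"=>"78000",  "Gender"=>"Male",        "Education"=>"High School"},
  {"Age"=>"45", "Income"=>"75000",  "Gender"=>"Female",      "Education"=>"College"},
  {"Age"=>"24", "Income"=>"45000",  "Gender"=>"Male",        "Education"=>"Did Not Complete High School"},
  {"Age"=>"45", "Income"=>"75000",  "Gender"=>"Male",        "Education"=>"College"},
  {"Age"=>"52", "Income"=>"100000", "Gender"=>"Male",        "Education"=>"College"}
]

# Hypothetical helper: mean of an integer-valued key over the rows for which
# the block returns true. Returns 0.0 when no row matches.
def average_of(rows, key, &condition)
  values = rows.select(&condition).map { |row| row[key].to_i }
  values.empty? ? 0.0 : values.sum.to_f / values.size
end

avg_age_women      = average_of(arr, "Age")    { |row| row["Gender"] == "Female" }
avg_income_college = average_of(arr, "Income") { |row| row["Education"] == "College" }

puts avg_age_women      # 36.666666666666664
puts avg_income_college
```

Passing the condition as a block keeps the method independent of any particular key, so combined conditions (for example, college-educated women) need only a different block.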

Answer 2

Your voter_demographics is an array with only one element:

voter_demographics[0]
=> {:firstname=>["Jon", "Jane", "Sam", "Joan", "Sam", "Mark", "Susan", "Bill", "Doug", "Francis"], :lastname=>["Smith", "Davies", "Farelly", "Favreau", "McNulty", "Minahan", "Umani", "Perault", "Stamper", "Underwood"], :age=>[25, 30, 32, 35, 38, 48, 45, 24, 45, 52], :income=>[50000, 60000, 80000, 65000, 63000, 78000, 75000, 45000, 75000, 100000], :household_size=>[1, 3, 2, 4, 3, 5, 2, 1, 1, 2], :gender=>["male", "female", "unspecified", "female", "male", "male", "female", "male", "male", "male"], :education=>["college", "high school", "college", "college", "college", "high school", "college", "did not complete high school", "college", "college"]}
voter_demographics[1]
=> nil

So, let's take the first element and get the age data:

age_data = voter_demographics[0][:age]
=> [25, 30, 32, 35, 38, 48, 45, 24, 45, 52]

Now we can sum the data:

sum = 0
age_data.each { |e| sum = sum + e }
sum
=> 374

or just

age_data.inject(:+)
=> 374

You can also get the number of elements:

age_data.size
=> 10

Finally:

age_data.inject(:+)/age_data.size
=> 37
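Note that the result above is 37 rather than 37.4 because dividing one Integer by another truncates in Ruby. If a fractional average is wanted, one option is Integer#fdiv, sketched here on the same age data:

```ruby
age_data = [25, 30, 32, 35, 38, 48, 45, 24, 45, 52]

# Integer / Integer drops the fractional part.
truncated = age_data.inject(:+) / age_data.size

# fdiv performs floating-point division instead.
exact = age_data.inject(:+).fdiv(age_data.size)

puts truncated # 37
puts exact     # 37.4
```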

I hope this helps you understand it ;)

Answer 3

I wonder why you made voter_demographics an array. If you remove the square brackets, it is just a hash, like this:

voter_demographics = {
  :firstname => ["Jon", "Jane", "Sam", "Joan", "Sam", "Mark", "Susan", "Bill", "Doug", "Francis"],
  :lastname => ["Smith", "Davies", "Farelly", "Favreau", "McNulty", "Minahan", "Umani", "Perault", "Stamper", "Underwood"],
  :age => [25, 30, 32, 35, 38, 48, 45, 24, 45, 52],
  :income => [50000, 60000, 80000, 65000, 63000, 78000, 75000, 45000, 75000, 100000],
  :household_size => [1, 3, 2, 4, 3, 5, 2, 1, 1, 2],
  :gender => ["male", "female", "unspecified", "female", "male", "male", "female", "male", "male", "male"],
  :education => ["college", "high school", "college", "college", "college", "high school", "college", "did not complete high school", "college", "college"]
}

Then you can access the hash values like this:

voter_demographics[:age]

Finding your average would then look like this:

voter_demographics[:age].inject(:+).to_f / voter_demographics[:age].size

or

ages = voter_demographics[:age]
ages.inject(:+).to_f / ages.size

The .to_f assumes you want a floating-point number, so my solution gives you 37.4 rather than 37.
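Since the three averages in the question all come from numeric columns of the same hash, they can be computed in one pass over the relevant keys. This is only a sketch: the result hash name averages is made up here, and the hash repeats just the numeric columns so the snippet runs by itself.

```ruby
# The numeric columns of the hash shown above.
voter_demographics = {
  :age            => [25, 30, 32, 35, 38, 48, 45, 24, 45, 52],
  :income         => [50000, 60000, 80000, 65000, 63000, 78000, 75000, 45000, 75000, 100000],
  :household_size => [1, 3, 2, 4, 3, 5, 2, 1, 1, 2]
}

# Build a hash of key => floating-point average for each numeric column.
averages = [:age, :income, :household_size].each_with_object({}) do |key, result|
  values = voter_demographics[key]
  result[key] = values.inject(:+).to_f / values.size
end

puts averages[:age]            # 37.4
puts averages[:income]         # 69100.0
puts averages[:household_size] # 2.4
```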

Another solution is to use regular variables instead of a hash:

firstnames = ["Jon", "Jane", "Sam", "Joan", "Sam", "Mark", "Susan", "Bill", "Doug", "Francis"]
lastnames = ["Smith", "Davies", "Farelly", "Favreau", "McNulty", "Minahan", "Umani", "Perault", "Stamper", "Underwood"]
ages = [25, 30, 32, 35, 38, 48, 45, 24, 45, 52]
incomes = [50000, 60000, 80000, 65000, 63000, 78000, 75000, 45000, 75000, 100000]
household_sizes = [1, 3, 2, 4, 3, 5, 2, 1, 1, 2]
genders = ["male", "female", "unspecified", "female", "male", "male", "female", "male", "male", "male"]
educations = ["college", "high school", "college", "college", "college", "high school", "college", "did not complete high school", "college", "college"]

Then this will work:

ages.inject(:+).to_f / ages.size
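If you later want one hash per person (the "array of hashes" from the question title), parallel arrays like these can be combined with Array#zip. The sketch below uses only three of the arrays and the first three people to stay short; the variable names people and female_ages are illustrative.

```ruby
# Shortened parallel arrays (first three people only).
firstnames = ["Jon", "Jane", "Sam"]
ages       = [25, 30, 32]
genders    = ["male", "female", "unspecified"]

# zip pairs up the elements at the same index; map turns each group into a hash.
people = firstnames.zip(ages, genders).map do |firstname, age, gender|
  { firstname: firstname, age: age, gender: gender }
end

# With one hash per person, filtering on one field and reading another is direct.
female_ages = people.select { |person| person[:gender] == "female" }
                    .map { |person| person[:age] }

p people[1]   # {:firstname=>"Jane", :age=>30, :gender=>"female"} (exact inspect format varies by Ruby version)
p female_ages # [30]
```

The trade-off is that column-wise arrays make whole-column averages short, while per-person hashes make statistics that combine several fields easier to write.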

Answer 4

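To answer the first part of the question, how to organize the data into voter_demographics, you can use Ruby's CSV parser. Save the data as a CSV file (named file1.csv in the code below) and parse it.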

require 'csv'

csv_data = CSV.parse(File.read('file1.csv'), headers: true, header_converters: :symbol)

data_array = csv_data.map {|arr| arr.to_h}

Now we have an array of hashes, like

[{:first_name=>"Jon", :last_name=>" Smith", :age=>" 25", :income=>" 50000", :household_size=>" 1", :gender=>" Male", :education=>" College"},
{:first_name=>"Jane", :last_name=>" Davies", :age=>" 30", :income=>" 60000", :household_size=>" 3", :gender=>" Female", :education=>" High School"},
...]

Now we can write something to reshape this data into the format you want.

result = {}

data_array[0].each_key do |k|
  result[k] = data_array.map { |hash| hash[k].strip }
end
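Note that every value in result is still a string, so the numeric columns need converting before they can be averaged. One possible variation, shown as a sketch, strips each field and converts the numeric ones while the rows are being built. It parses an inline string instead of file1.csv so that it runs on its own, uses only the first three data rows, and the names csv_text, numeric_keys, and rows are assumptions for this example.

```ruby
require 'csv'

# Inline sample standing in for the contents of the CSV file.
csv_text = <<~DATA
  First Name, Last Name, Age, Income, Household Size, Gender, Education
  Jon, Smith, 25, 50000, 1, Male, College
  Jane, Davies, 30, 60000, 3, Female, High School
  Sam, Farelly, 32, 80000, 2, Unspecified, College
DATA

# Columns that should end up as integers rather than strings.
numeric_keys = [:age, :income, :household_size]

# One hash per row, with whitespace stripped and numeric fields converted.
rows = CSV.parse(csv_text, headers: true, header_converters: :symbol).map do |row|
  row.to_h.each_with_object({}) do |(key, value), cleaned|
    text = value.strip
    cleaned[key] = numeric_keys.include?(key) ? Integer(text) : text
  end
end

average_age = rows.sum { |r| r[:age] }.to_f / rows.size

p rows[0][:last_name] # "Smith"
puts average_age      # 29.0
```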

The other answers have already covered the second part of your question, so I won't repeat it here.

Array of hashes question
