I'm new to Ruby. I have a series of arrays, each containing two strings:
["[[\"Wayfair \", \"57\"]]", "[[\"Move24 \", \"26\"]]",
"[[\"GetYourGuide \", \"25\"]]", "[[\"Visual Meta \", \"22\"]]",
"[[\"FinLeap \", \"20\"]]", "[[\"Movinga \", \"20\"]]",
"[[\"DCMN \", \"19\"]]", ...
I'm trying to convert the number in each array to an integer, but the result is not what I expect:
companies = companies.map do |company|
c = company[0].scan(/(.+)\((\d+)\)/).inspect
[c[0], c[1].to_i]
end
This produces:
["[", 0], ["[", 0], ["[", 0], ["[", 0], ["[", 0], ["[", 0],
["[", 0], ["[", 0], ["[", 0], ["[", 0], ["[", 0]]
I expected:
["Wayfair", 57], ["Move24", 26], ["GetYourGuide", 25], ...
Can anyone help?
Full code:
require 'net/http'
require 'uri'
uri = URI('http://berlinstartupjobs.com/') #URI takes just one url
req = Net::HTTP::Get.new(uri) #get in URI
req['User-Agent'] = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.116 Safari/537.36' #use this header
res = Net::HTTP.start(uri.hostname, uri.port) {|http| http.request(req)} # URI documentation
puts res.code #status code
puts res.body
puts res.body.scan('<a href="http://berlinstartupjobs.com/companies/') #scan in the body of the document files that match a href=...
puts res.body.scan(/<a href="http:\/\/berlinstartupjobs\.com\/companies\/[^\s]+ class="tag-link">(.*)<\/a>/) #scan
companies = res.body.scan(/<a href="http:\/\/berlinstartupjobs\.com\/companies\/[^\s]+ class="tag-link">(.*)<\/a>/)
companies = companies.map do |company|
c = company[0].scan(/(.+)\((\d+)\)/).inspect
[c[0], c[1].to_i]
end # do ... end = { }
puts companies.inspect
Answer 0 (score: 1)
You can use Enumerable#map together with JSON.parse to parse each element:
require 'json'
companies.map { |elem| name, count = JSON.parse(elem).flatten; [name.strip, count.to_i] }
You could also use eval instead of JSON.parse, but using eval is considered bad practice.
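To make the individual steps explicit, here is a minimal sketch of the JSON.parse approach applied to two of the sample strings from the question. The helper name parse_company is made up for illustration and is not part of the original answer.

```ruby
require 'json'

# Hypothetical helper (an assumption for this sketch): turns one
# stringified pair such as "[[\"Wayfair \", \"57\"]]" into ["Wayfair", 57].
def parse_company(elem)
  # JSON.parse gives [["Wayfair ", "57"]]; flatten gives ["Wayfair ", "57"]
  name, count = JSON.parse(elem).flatten
  [name.strip, count.to_i]
end

companies = ["[[\"Wayfair \", \"57\"]]", "[[\"Move24 \", \"26\"]]"]
parsed = companies.map { |elem| parse_company(elem) }
p parsed
# prints [["Wayfair", 57], ["Move24", 26]]
```

This only works because each element happens to be valid JSON. If the strings came from some other source, JSON.parse would raise JSON::ParserError on malformed input.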
Answer 1 (score: 1)
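require 'json'
arr = ["[[\"Wayfair \", \"57\"]]", "[[\"Move24 \", \"26\"]]"]
# Each element parses to [[name, value]]; flat_map yields one flat list of pairs.
result = arr.flat_map { |e| JSON.parse(e).map { |name, value| [name.strip, value.to_i] } }
OUTPUT:
[["Wayfair", 57], ["Move24", 26]]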
Answer 2 (score: 1)
Your code is basically fine. Just drop the .inspect at the end. It returns a string, not an array.
# this is what you get from the scraping.
companies = [["Wayfair (57)"], ["Move24 (26)"], ["GetYourGuide (25)"]]
companies = companies.flatten.map do |company|
c = company.scan(/(.+)\((\d+)\)/).flatten
[c[0], c[1].to_i]
end
p companies
# >> [["Wayfair ", 57], ["Move24 ", 26], ["GetYourGuide ", 25], ...]
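As a rough illustration of why the original attempt yielded ["[", 0], the sketch below indexes into the string that .inspect returns. It also shows a variant that strips the trailing space from each name, since the expected output in the question has none. The variable names scraped, as_string, and cleaned are assumptions made for this example.

```ruby
# Sample data in the shape the scraping step produces.
scraped = [["Wayfair (57)"], ["Move24 (26)"]]

# .inspect turns the scan result into a String, so [0] and [1]
# pick out single characters rather than the captured groups.
as_string = scraped[0][0].scan(/(.+)\((\d+)\)/).inspect
first_char = as_string[0]        # "["
second_char = as_string[1]       # "["
bad_count = second_char.to_i     # 0, because "[" has no leading digits

# Variant of the fix above that also removes surrounding whitespace.
cleaned = scraped.flatten.map do |company|
  name, count = company.scan(/(.+)\((\d+)\)/).flatten
  [name.strip, count.to_i]
end
p cleaned
# prints [["Wayfair", 57], ["Move24", 26]]
```

String#to_i returns 0 when the string does not start with a number, which is why every count came out as 0 instead of raising an error.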