How can I optimize this Ruby script?

Time: 2015-08-20 22:47:25

Tags: mysql ruby optimization

I have this code, and it takes a long time to run.

When I run it with -r profile, most of the time appears to go to MySQL. How can I speed it up? Would a MySQL bulk insert help?

The profiler output is at: http://pastebin.com/fH51ZeEB

Code:

#!/usr/bin/env ruby

require 'mysql'
require 'open-uri'
require 'nokogiri'
begin
i=0
src = Mysql.new 'localhost', 'me', 'pass', 'db'
rs = src.query("SELECT * FROM npanxx")
rs.each_hash do |row|
  doc = Nokogiri::XML(open("http://localcallingguide.com/xmllocalprefix.php?npa="<< row["npa"].to_s << "&nxx=" << row["nxx"].to_s << "&dir=1"))
  lca = Hash.new
  doc.xpath("//prefix/npa | //prefix/nxx | //prefix/exch").each do |prefix|
    if !lca.has_key? "npa"
      lca["npa"] = prefix.content 
      next
    end
    if !lca.has_key? "nxx"
      lca["nxx"] = prefix.content 
      next
    end
    if !lca.has_key? "exch"
      lca["exch"] = prefix.content 
      src.query("INSERT INTO npanxxlca (npa,nxx,tnpa,tnxx,texch) VALUES (#{row['npa']}, #{row['nxx']}, #{lca['npa']}, #{lca['nxx']}, #{lca['exch']})")
      lca = Hash.new
    end
  end
  puts (i+=1).to_s << "- #{row['npa']}, #{row['nxx']}\n"
end
rescue Mysql::Error => e
    puts e.errno
    puts e.error
ensure
    src.close if src
end

2 Answers:

Answer 0 (score: 2)

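You could try inserting multiple rows at once; I think the row-by-row INSERT is the bottleneck. Collect the values in an array first, and once the array is large enough, insert all of them with a single multi-row statement, like this: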

INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(4,5,6),(7,8,9);

See how-to-insert-multiple-records-into-database
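
The buffering idea above can be sketched in Ruby roughly as follows. This is a minimal sketch, not a tested drop-in: `bulk_insert_sql`, `insert_in_batches`, and `SIMPLE_ESCAPE` are hypothetical names introduced here, the batch size of 500 is arbitrary, and the escaper is only a placeholder (with the mysql gem you would more likely pass `Mysql.method(:escape_string)`). The `conn` argument is assumed to be anything that responds to `query`, such as the `src` connection from the question.

```ruby
# Placeholder escaper (assumption): with the mysql gem, pass
# Mysql.method(:escape_string) instead of relying on this one.
SIMPLE_ESCAPE = ->(value) { value.to_s.gsub(/[\\']/) { |c| "\\#{c}" } }

# Build one multi-row INSERT statement from an array of rows.
# Integers are emitted bare; everything else is quoted and escaped.
def bulk_insert_sql(table, columns, rows, escape = SIMPLE_ESCAPE)
  tuples = rows.map do |row|
    cells = row.map { |v| v.is_a?(Integer) ? v.to_s : "'#{escape.call(v)}'" }
    "(#{cells.join(', ')})"
  end
  "INSERT INTO #{table} (#{columns.join(',')}) VALUES #{tuples.join(', ')}"
end

# Send rows in batches of `batch_size`; `conn` only needs to respond to #query.
def insert_in_batches(conn, table, columns, rows, batch_size = 500)
  rows.each_slice(batch_size) do |batch|
    conn.query(bulk_insert_sql(table, columns, batch))
  end
end
```

In the question's script, this would mean pushing `[npa, nxx, tnpa, tnxx, texch]` arrays onto a list inside the loop and calling `insert_in_batches(src, 'npanxxlca', %w[npa nxx tnpa tnxx texch], rows)` once the list is large enough. The practical upper bound on a batch depends on the server's `max_allowed_packet` setting, so the batch size may need tuning.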

Answer 1 (score: 1)

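Use Typhoeus together with Hydra so that you can make the requests in parallel. It lets you set a custom max concurrency (the default is 200). And instead of parsing the XML with Nokogiri, searching for values by XPath several times, and storing them in a new hash each time, you can parse the XML directly into a Hash object with Crack: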

require 'benchmark'
require 'typhoeus'
require 'mysql'
require 'crack'
require 'json'

BASE_URL ||= 'http://localcallingguide.com/xmllocalprefix.php'.freeze

HOST     ||= 'localhost'.freeze
USER     ||= 'me'.freeze
PASSWORD ||= 'pass'.freeze
DATABASE ||= 'db'.freeze

#
# Build lca request based on provided npa and nxx
# @param [Integer, String] npa - NPA
# @param [Integer, String] nxx - NXX
# @return [Typhoeus::Request] - request object
def lca_request(npa, nxx)
  Typhoeus::Request.new(BASE_URL, params: { dir: 1, npa: npa, nxx: nxx })
end

#
# Convert XML string into Hash object
# @param [String] xml - XML string to convert
# @return [Hash] Ruby Hash object converted from XML string
def xml_to_hash(xml)
  Crack::XML.parse(xml)
end

#
# Fetch lca_data from Hash response
# Response with error will be converted to empty array
# @param [Hash] hash - response
# @return [Array] lca data from response. Empty array if invalid data provided
def lca_data(hash)
  data = hash['root']['lca_data']['prefix']
  # Parentheses are required: without them Ruby parses the ternary as the argument to is_a?
  data.is_a?(Hash) ? [data] : Array(data)
rescue NoMethodError
  []
end

#
# Fetch lca_data from XML string (see #lca_data)
# @param [String] xml - string from where to fetch lca_data
# @return [Array] lca data from response. Empty array if invalid data provided
def lca_data_from_xml(xml)
  lca_data(xml_to_hash(xml))
end

# Main function
def main
  src   = Mysql.new(HOST, USER, PASSWORD, DATABASE)
  rs    = src.query('SELECT * FROM npanxx')
  hydra = Typhoeus::Hydra.new
  rs.each_hash do |row|
    npa, nxx = row['npa'], row['nxx']
    request  = lca_request(npa, nxx)
    request.on_complete do |response|
      lca_data = lca_data_from_xml(response.body)
      lca_data.each do |lca|
        src.query("INSERT INTO npanxxlca (npa,nxx,tnpa,tnxx,texch) VALUES (#{npa}, #{nxx}, #{lca['npa']}, #{lca['nxx']}, #{lca['exch']})")
      end
    end
    hydra.queue(request)
  end
  hydra.run
end

puts Benchmark.measure { main }.real

I have no experience with MySQL, so I can't recommend how to optimize that part.
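
One detail of the `lca_data` helper above is easy to get wrong, so here is a small stdlib-only sketch of the normalization it performs. It assumes (this is not confirmed by the answer) that the XML-to-Hash parser returns a single Hash when there is one `prefix` element and an Array of Hashes when there are several. `wrap_prefixes` and `fetch_prefixes` are hypothetical names, and the sample data is made up for illustration.

```ruby
# Normalize a "one or many" value into an Array of Hashes.
def wrap_prefixes(data)
  case data
  when nil  then []          # element missing entirely
  when Hash then [data]      # a single element parsed as one Hash
  else Array(data)           # already a list of elements
  end
end

# Walk the nested keys used in the answer; Hash#dig (Ruby 2.3+) returns nil
# if any level is missing instead of raising NoMethodError.
def fetch_prefixes(parsed)
  wrap_prefixes(parsed.dig('root', 'lca_data', 'prefix'))
end

# Made-up sample shapes for illustration only.
single = { 'npa' => '416', 'nxx' => '555', 'exch' => 'A' }
many   = [single, { 'npa' => '905', 'nxx' => '123', 'exch' => 'B' }]

one_row  = fetch_prefixes('root' => { 'lca_data' => { 'prefix' => single } })
two_rows = fetch_prefixes('root' => { 'lca_data' => { 'prefix' => many } })
no_rows  = fetch_prefixes({})
```

The explicit Hash check matters because `Array(some_hash)` turns the Hash into a list of key/value pairs rather than wrapping it, which is why the helper cannot simply call `Array(data)` in every case.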