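I have this code, and it takes a very long time to run.
When I profile it with -r profile, most of the time seems to go to MySQL... How can I speed it up? Should I use a MySQL bulk insert?
The profiler output is here: http://pastebin.com/fH51ZeEB
Code: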
#!/usr/bin/env ruby
require 'mysql'
require 'open-uri'
require 'nokogiri'
begin
  i = 0
  src = Mysql.new 'localhost', 'me', 'pass', 'db'
  rs = src.query("SELECT * FROM npanxx")
  rs.each_hash do |row|
    # URI.open rather than open: since Ruby 3.0, Kernel#open no longer fetches URLs
    url = "http://localcallingguide.com/xmllocalprefix.php?npa=#{row['npa']}&nxx=#{row['nxx']}&dir=1"
    doc = Nokogiri::XML(URI.open(url))
    lca = Hash.new
    doc.xpath("//prefix/npa | //prefix/nxx | //prefix/exch").each do |prefix|
      if !lca.has_key? "npa"
        lca["npa"] = prefix.content
        next
      end
      if !lca.has_key? "nxx"
        lca["nxx"] = prefix.content
        next
      end
      if !lca.has_key? "exch"
        lca["exch"] = prefix.content
        # One INSERT round trip per matched prefix
        src.query("INSERT INTO npanxxlca (npa,nxx,tnpa,tnxx,texch) VALUES (#{row['npa']}, #{row['nxx']}, #{lca['npa']}, #{lca['nxx']}, #{lca['exch']})")
        lca = Hash.new
      end
    end
    puts "#{i += 1}- #{row['npa']}, #{row['nxx']}"
  end
rescue Mysql::Error => e
  puts e.errno
  puts e.error
ensure
  src.close if src
end
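The `has_key?` bookkeeping inside the XPath loop effectively groups the matched nodes, taken in document order, into consecutive (npa, nxx, exch) triples. A minimal sketch of the same grouping with `each_slice`, assuming every `<prefix>` element contains exactly one `npa`, `nxx` and `exch`, in that order (the loop above relies on the same thing). `prefix_triples` and the sample values are hypothetical:

```ruby
# Hypothetical helper: turn a flat list of node contents into [npa, nxx, exch]
# triples. A trailing incomplete group is dropped, just as the original loop
# never inserts a prefix that has no exch.
def prefix_triples(contents)
  contents.each_slice(3).select { |triple| triple.size == 3 }
end

# With Nokogiri the input would be something like:
#   doc.xpath("//prefix/npa | //prefix/nxx | //prefix/exch").map(&:content)
contents = %w[201 200 123456 201 202 654321] # made-up sample values
prefix_triples(contents).each do |tnpa, tnxx, texch|
  puts "#{tnpa} #{tnxx} #{texch}"
end
```

This does not make anything faster by itself, but it isolates the parsing step so the inserts can be batched separately.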
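Answer 0 (score: 2)
You could try inserting multiple rows per statement; I think the per-row inserts are the bottleneck. Keep the values in an array, and once the array is large enough, insert all of them with a single multi-row INSERT, like this: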
INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(4,5,6),(7,8,9);
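A minimal sketch of that buffering in Ruby. `BatchInserter` is a hypothetical class, not part of the mysql gem; it hands each finished statement to a block, which in the question's script would be `{ |sql| src.query(sql) }`. It assumes all values are integers, as the question's SQL interpolates them unquoted:

```ruby
# Hypothetical helper: buffers rows and issues one multi-row INSERT per
# `batch_size` rows instead of one INSERT per row.
class BatchInserter
  def initialize(table, columns, batch_size: 500, &executor)
    @table = table
    @columns = columns
    @batch_size = batch_size
    @executor = executor # receives the SQL string, e.g. ->(sql) { src.query(sql) }
    @rows = []
  end

  def <<(row)
    # Integer(..., 10) rejects non-numeric input, so nothing unquoted reaches the SQL
    @rows << row.map { |v| Integer(v.to_s, 10) }
    flush if @rows.size >= @batch_size
    self
  end

  # Sends whatever is buffered; call once more at the end for the last partial batch
  def flush
    return if @rows.empty?
    values = @rows.map { |r| "(#{r.join(',')})" }.join(',')
    @executor.call("INSERT INTO #{@table} (#{@columns.join(',')}) VALUES #{values}")
    @rows.clear
  end
end

statements = []
inserter = BatchInserter.new('npanxxlca', %w[npa nxx tnpa tnxx texch], batch_size: 2) { |sql| statements << sql }
inserter << [201, 200, 201, 202, 123456]
inserter << %w[201 200 973 555 654321] # strings from MySQL/XML are converted too
inserter << [201, 200, 908, 777, 111111]
inserter.flush
puts statements
```

The batch size is a trade-off: larger batches mean fewer round trips, but a single statement must stay below the server's `max_allowed_packet`.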
Answer 1 (score: 1)
With Typhoeus and its Hydra you can run the requests in parallel. Hydra lets you set a custom max concurrency (the default is 200).
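To show what a concurrency cap does without extra gems, here is a minimal sketch of a bounded worker pool using only `Thread` and `Queue`. This is a swapped-in standard-library technique, not Typhoeus's API; `fetch_all` is hypothetical, and the block stands in for the real HTTP call (for example `URI.open(url).read`). MRI releases its global lock while a thread waits on network I/O, so the downloads do overlap:

```ruby
# Hypothetical helper: run `fetch` for every item with at most
# `max_concurrency` calls in flight; returns { item => result }.
def fetch_all(items, max_concurrency: 10, &fetch)
  queue = Queue.new
  items.each { |item| queue << item }
  queue.close # pop returns nil once the closed queue is drained

  results = {}
  lock = Mutex.new
  workers = Array.new([max_concurrency, items.size].min) do
    Thread.new do
      while (item = queue.pop)
        value = fetch.call(item)
        lock.synchronize { results[item] = value }
      end
    end
  end
  workers.each(&:join) # re-raises any exception from a worker
  results
end

# Fake fetch that records how many calls overlap
in_flight = 0
peak = 0
counter = Mutex.new
results = fetch_all((1..20).to_a, max_concurrency: 4) do |n|
  counter.synchronize { in_flight += 1; peak = [peak, in_flight].max }
  sleep 0.01 # pretend to wait on the network
  counter.synchronize { in_flight -= 1 }
  n * n
end
puts "fetched #{results.size}, peak concurrency #{peak}"
```

The results come back to the main thread, so the MySQL inserts can still run afterwards on a single connection; sharing one connection between worker threads without a lock is best avoided.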
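Instead of parsing the XML with Nokogiri, searching it several times by XPath and storing each value into a new hash, you can parse the XML directly into a Hash object with Crack (`Crack::XML.parse`):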
require 'benchmark'
require 'typhoeus'
require 'mysql'
require 'crack'
require 'json'
BASE_URL ||= 'http://localcallingguide.com/xmllocalprefix.php'.freeze
HOST ||= 'localhost'.freeze
USER ||= 'me'.freeze
PASSWORD ||= 'pass'.freeze
DATABASE ||= 'db'.freeze
#
# Build lca request based on provided npa and nxx
# @param [Integer, String] npa - NPA
# @param [Integer, String] nxx - NXX
# @return [Typhoeus::Request] - request object
def lca_request(npa, nxx)
  Typhoeus::Request.new(BASE_URL, params: { dir: 1, npa: npa, nxx: nxx })
end
#
# Convert XML string into Hash object
# @param [String] xml - XML string to convert
# @return [Hash] Ruby Hash object converted from XML string
def xml_to_hash(xml)
  Crack::XML.parse(xml)
end
#
# Fetch lca_data from Hash response
# Response with error will be converted to empty array
# @param [Hash] hash - response
# @return [Array] lca data from response. Empty array if invalid data provided
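def lca_data(hash)
  data = hash['root']['lca_data']['prefix']
  # Parentheses are required: without them Ruby parses data.is_a?(Hash ? ... : ...)
  # and raises TypeError, which the NoMethodError rescue below does not catch.
  data.is_a?(Hash) ? [data] : Array(data)
rescue NoMethodError
  []
end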
#
# Fetch lca_data from XML string (see #lca_data)
# @param [String] xml - string from where to fetch lca_data
# @return [Array] lca data from response. Empty array if invalid data provided
def lca_data_from_xml(xml)
  lca_data(xml_to_hash(xml))
end
# Main function
def main
  src = Mysql.new(HOST, USER, PASSWORD, DATABASE)
  rs = src.query('SELECT * FROM npanxx')
  # Pass max_concurrency: n to change the default of 200 parallel requests
  hydra = Typhoeus::Hydra.new
  rs.each_hash do |row|
    npa, nxx = row['npa'], row['nxx']
    request = lca_request(npa, nxx)
    # Called once the response for this npa/nxx has arrived
    request.on_complete do |response|
      lca_data = lca_data_from_xml(response.body)
      lca_data.each do |lca|
        src.query("INSERT INTO npanxxlca (npa,nxx,tnpa,tnxx,texch) VALUES (#{npa}, #{nxx}, #{lca['npa']}, #{lca['nxx']}, #{lca['exch']})")
      end
    end
    hydra.queue(request)
  end
  # Performs all queued requests and blocks until every callback has run
  hydra.run
ensure
  src.close if src
end
puts Benchmark.measure { main }.real
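Why `lca_data` needs its Hash check: XML-to-Hash converters such as Crack typically return a single Hash when an element occurs once and an Array of Hashes when it repeats, and `Array(hash)` does not wrap a Hash but turns it into key/value pairs. A minimal sketch with plain Ruby hashes standing in for parsed `<prefix>` data (`normalize_prefixes` and the sample values are hypothetical), which also shows what the unparenthesized `is_a? Hash ? ... : ...` form does:

```ruby
# Hypothetical helper: always return an Array of prefix Hashes.
def normalize_prefixes(data)
  case data
  when Hash then [data]   # a single <prefix> element
  when Array then data    # several <prefix> elements
  else []                 # nil or anything unexpected
  end
end

one  = { 'npa' => '201', 'nxx' => '202', 'exch' => '123456' }
many = [one, { 'npa' => '973', 'nxx' => '555', 'exch' => '654321' }]

p Array(one)                    # => [["npa", "201"], ["nxx", "202"], ["exch", "123456"]]
p normalize_prefixes(one).size  # => 1
p normalize_prefixes(many).size # => 2
p normalize_prefixes(nil)       # => []

# Without parentheses this parses as one.is_a?(Hash ? [one] : Array(one)),
# i.e. one.is_a?([one]), which raises TypeError.
original_error =
  begin
    one.is_a? Hash ? [one] : Array(one)
    nil
  rescue TypeError => e
    e
  end
p original_error.class          # => TypeError
```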
I have no experience with MySQL, so I can't recommend how to optimize that part.