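I'm trying to parse href tags in HTML code. Basically, I'm trying to extract the URL and the description. I also split the description on whitespace and count how many times each word occurs, and finally write the results to two separate files. My parser works correctly, but it is very inefficient: I'd say it takes about 2 minutes to parse 1 MB of text.
Here is my code: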
hrefTag = "<a href=\""
qtMark = "\""
descStart = "\">"
hrefEnd = "</a>"
if line.include? hrefTag
  dest = line[/#{hrefTag}(.*?)#{qtMark}/m, 1]
  descStIn = line.rindex(descStart)
  descEndIn = line.rindex(hrefEnd)
  if (descStIn != nil && descEndIn != nil)
    desc = line[(descStIn+2)..(descEndIn-1)]
  end
end
if (source != "" && dest != "")
  occur = Hash.new(0)
  mainEntry = "original-url=\"" + source +
              "\", dest-url=\"" + dest + "\""
  descEntry = ""
  if (desc != nil && desc != "")
    descEntry = ", desc=\"" + desc + "\""
    words = desc.split(' ')
    words.each { |word| occur[word] += 1 }
  end
  firstEntry = mainEntry+descEntry+"\n\n"
  File.open(firstOutput, 'a') { |file|
    file.write(firstEntry)
  }
  occur.each { |word, occurrance|
    wordEntry = ", word=\"" + word +
                "\", count=" + occurrance.to_s
    secondEntry = mainEntry+wordEntry+"\n\n"
    File.open(secondOutput, 'a') { |file|
      file.write(secondEntry)
    }
  }
end
How can I make it more efficient? Which parts are the least efficient?
Answer 0 (score: 0)
To find out what is taking the most time, profile the code with ruby-prof or a similar tool. Install ruby-prof:
gem install ruby-prof
Then use it to run your script:
ruby-prof <script.rb>
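When your script finishes (or when you interrupt it with CTRL-C), it prints a summary of the method calls, the time spent in each method, and so on. Here is a snippet of the output: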
Sort by: self_time
 %self      total      self      wait     child     calls  name
  8.67      0.008     0.008     0.000     0.000         2  JSON::Ext::Parser#parse
  8.45      0.022     0.008     0.000     0.014        99  IO#read_nonblock
  6.66      0.006     0.006     0.000     0.000        99  <Module::Kernel>#select
  2.78      0.003     0.003     0.000     0.000       235  IO#write
  1.17      0.001     0.001     0.000     0.000        57  Enumerator#next
  0.99      0.049     0.001     0.000     0.048       207  *Array#each
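For a quick check without installing a gem, Ruby's standard Benchmark module can time a single suspected hot spot. This is a different, coarser technique than ruby-prof: it measures only what you wrap, so it helps confirm a guess rather than find one. The sketch below assumes that reopening the output file in append mode for every entry is a suspect; that is a hypothesis to test, not a measured result from the code in the question. The sample entries and the helper names write_reopening and write_single_open are made up for illustration.

```ruby
require 'benchmark'
require 'tempfile'

# Hypothetical sample data standing in for the entries the parser builds.
entries = Array.new(2_000) do |i|
  "original-url=\"src\", dest-url=\"http://example.com/#{i}\"\n\n"
end

# Variant A: reopen the output file in append mode for every entry,
# which is the pattern used in the question.
def write_reopening(path, entries)
  entries.each do |entry|
    File.open(path, 'a') { |file| file.write(entry) }
  end
end

# Variant B: open the file once and reuse the handle for all entries.
def write_single_open(path, entries)
  File.open(path, 'a') do |file|
    entries.each { |entry| file.write(entry) }
  end
end

# Temporary files so the comparison does not touch real output files.
file_a = Tempfile.new('reopen')
file_b = Tempfile.new('single')

# Benchmark.realtime returns the elapsed wall-clock time in seconds.
time_a = Benchmark.realtime { write_reopening(file_a.path, entries) }
time_b = Benchmark.realtime { write_single_open(file_b.path, entries) }

puts format('reopen per entry: %.4fs', time_a)
puts format('single open:      %.4fs', time_b)
```

Both variants write identical content, so any difference in the printed timings comes from the file handling alone. Timings vary by machine and file system, so treat the numbers as a hint and use the profiler output to decide where to optimize.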