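Please help me figure out how to correctly pair each build name with its date, and then sort all the links by upload date in ascending order.
A sample index.html looks like this: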
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<html>
<head><title>Index of localhost/BUILD</title>
</head>
<body>
<h1>Index of localhost/BUILD</h1>
<pre>Name Last modified Size</pre><hr/>
<pre><a href="../">../</a>
<a href="BUILD.10.tar">BUILD.10.tar</a> 27-Sep-2017 15:46 250 bytes
<a href="BUILD.13.tar">BUILD.13.tar</a> 28-Sep-2017 12:14 254 bytes
<a href="BUILD.15.tar">BUILD.15.tar</a> 29-Sep-2017 08:56 257 bytes
<a href="BUILD.16.tar">BUILD.16.tar</a> 29-Sep-2017 08:56 258 bytes
<a href="BUILD.17.tar">BUILD.17.tar</a> 29-Sep-2017 08:56 256 bytes
<a href="BUILD.9.tar">BUILD.9.tar</a> 27-Sep-2017 15:44 247 bytes
</pre>
<hr/><address style="font-size:small;">Artifactory/5.2.1 Server</address></body></html>
My current script looks like this:
require 'open-uri'
require 'nokogiri'

build_url = "/home/index.html"
index_html = open(build_url).read
index_dom = Nokogiri::HTML.parse index_html
builds = []
index_dom.css('a').each { |link|
  build = link.text
  if build.end_with?(".tar")
    builds.push(build)
  end
}
rc_builds = []
builds.sort.each { |b| rc_builds << b }
p rc_builds
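This needs to change so that it reads both the build name and the Last modified value, and outputs the rc_builds array sorted by Last modified in ascending order.
I cannot make any changes to index.html, so the solution has to work with the page exactly as shown in the sample.
The problem is that I can't figure out how to access the Last modified text.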
Answer 0 (score: 1)
You can try grabbing the anchor tags and the text next to them.
require 'time'

# html holds the page source (index_html in the question)
index_dom = Nokogiri::HTML.parse(html)
# Walk every <pre> tag in the parsed HTML
builds = index_dom.css('pre').flat_map do |pre|
  # Collect the "Last modified" timestamps in this <pre>, in document order
  # (the regex is deliberately loose and could be tightened)
  dates = pre.text.scan(/\d+-\w+-\d{4} \d{2}:\d{2}/)
  # The first anchor is "../" and has no date, so the anchor at position
  # `index` belongs to dates[index - 1]. compact drops the nil produced for "../".
  pre.css('a').map.with_index do |anchor, index|
    href = anchor['href']
    [dates[index - 1], href] if href.end_with?('.tar')
  end.compact
end
# Parse the date before sorting: comparing "dd-Mon-yyyy" strings is not chronological
p builds.sort_by { |date, _href| Time.strptime(date, '%d-%b-%Y %H:%M') }
# => [["27-Sep-2017 15:44", "BUILD.9.tar"], ["27-Sep-2017 15:46", "BUILD.10.tar"], ...]
Answer 1 (score: 1)
This is how I would do it:
dom = Nokogiri::HTML.parse index_html
pre = dom.css('pre')
# The second <pre> holds the file listing; the first is just the column header
build_info = pre[1].text
result = []
build_info.split("\n").each do |line|
  next unless line =~ /BUILD/
  # Fields in order: name, date, time, size, unit
  arr = line.split
  result.push({
    build: arr[0],
    modified: "#{arr[1]} #{arr[2]}",
    size: arr[3],
    size_unit: arr[4]
  })
end
p result
#[{:build=>"BUILD.10.tar", :modified=>"27-Sep-2017 15:46", :size=>"250", :size_unit=>"bytes"}, {:build=>"BUILD.13.tar", :modified=>"28-Sep-2017 12:14", :size=>"254", :size_unit=>"bytes"}, {:build=>"BUILD.15.tar", :modified=>"29-Sep-2017 08:56", :size=>"257", :size_unit=>"bytes"}, {:build=>"BUILD.16.tar", :modified=>"29-Sep-2017 08:56", :size=>"258", :size_unit=>"bytes"}, {:build=>"BUILD.17.tar", :modified=>"29-Sep-2017 08:56", :size=>"256", :size_unit=>"bytes"}, {:build=>"BUILD.9.tar", :modified=>"27-Sep-2017 15:44", :size=>"247", :size_unit=>"bytes"}]
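The hashes above are still in page order, while the question asks for ascending order by Last modified. A minimal sketch of that last step follows, assuming rows shaped like the `result` array and the `dd-Mon-yyyy HH:MM` layout from the sample page. The `BUILD.20.tar` row is hypothetical and is there only to show that sorting the raw strings is not chronological.

```ruby
require 'time'

# Sample rows shaped like the `result` array printed above (a subset).
result = [
  { build: 'BUILD.10.tar', modified: '27-Sep-2017 15:46' },
  { build: 'BUILD.13.tar', modified: '28-Sep-2017 12:14' },
  { build: 'BUILD.9.tar',  modified: '27-Sep-2017 15:44' },
  # Hypothetical extra row, not part of the sample page.
  { build: 'BUILD.20.tar', modified: '01-Oct-2017 09:00' }
]

stamp_format = '%d-%b-%Y %H:%M'

# Sorting the raw strings compares the day of month first, which is wrong across months.
by_string = result.sort_by { |row| row[:modified] }.map { |row| row[:build] }

# Parsing into Time objects gives a true chronological order.
rc_builds = result
  .sort_by { |row| Time.strptime(row[:modified], stamp_format) }
  .map { |row| row[:build] }

p by_string
# => ["BUILD.20.tar", "BUILD.9.tar", "BUILD.10.tar", "BUILD.13.tar"]
p rc_builds
# => ["BUILD.9.tar", "BUILD.10.tar", "BUILD.13.tar", "BUILD.20.tar"]
```

Note that `sort_by` is not guaranteed to be stable, so builds that share the same timestamp (such as the three uploaded at 29-Sep-2017 08:56) may come out in any relative order unless a tie-breaker is added to the sort key.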