How can I sort Apache source archive names (strings)? I tried the Jsoup code below, but it does not return the expected result. How can I fix this?
public static void getApacheArchives() throws IOException {
    String url = "https://archive.apache.org/dist/httpd/"; // or whatever goes here
    Document document = Jsoup.connect(url)
            .followRedirects(false)
            .timeout(60000) // wait up to 60 sec for a response
            .get();
    Elements anchors = document.body().getAllElements().select("a");
    // Sort the links by their text using plain String comparison
    Collections.sort(anchors, new Comparator<Element>() {
        @Override
        public int compare(Element e1, Element e2) {
            return e1.text().compareTo(e2.text());
        }
    });
    for (int i = 0; i < anchors.size(); i++) {
        Element a = anchors.get(i);
        // Dots are escaped so that they match a literal '.' only
        if (a.text().matches("apache_1\\.[0-9]\\.[0-9]{1,2}\\.tar\\.gz")
                || a.text().matches("httpd-[0-9]{1,2}\\.[0-9]{1,2}\\.[0-9]{1,2}\\.tar\\.gz")) {
            System.out.println(a.text());
        }
    }
}
This code returns the following result:
... ...
httpd-2.3.6.tar.gz
httpd-2.3.8.tar.gz
httpd-2.4.1.tar.gz
httpd-2.4.10.tar.gz
httpd-2.4.12.tar.gz
httpd-2.4.16.tar.gz
httpd-2.4.17.tar.gz
httpd-2.4.18.tar.gz
httpd-2.4.2.tar.gz
httpd-2.4.20.tar.gz
httpd-2.4.3.tar.gz
httpd-2.4.4.tar.gz
httpd-2.4.6.tar.gz
httpd-2.4.7.tar.gz
httpd-2.4.9.tar.gz
...
But the expected result is:
... ...
httpd-2.3.6.tar.gz
httpd-2.3.8.tar.gz
httpd-2.4.1.tar.gz
httpd-2.4.2.tar.gz
httpd-2.4.3.tar.gz
httpd-2.4.4.tar.gz
httpd-2.4.6.tar.gz
httpd-2.4.7.tar.gz
httpd-2.4.9.tar.gz
httpd-2.4.10.tar.gz
httpd-2.4.12.tar.gz
httpd-2.4.16.tar.gz
httpd-2.4.17.tar.gz
httpd-2.4.18.tar.gz
httpd-2.4.20.tar.gz
...
Answer 0 (score: 0)
Thanks, Tom. I found a solution to my problem in: Sorting Strings that contains number in Java
With some modifications:
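Collections.sort(anchors, new Comparator<Element>() {
    @Override
    public int compare(Element o1, Element o2) {
        return extractInt(o1.text()) - extractInt(o2.text());
    }

    // Concatenates every digit in the name, e.g. "httpd-2.4.10.tar.gz" -> 2410.
    // Caveat: "httpd-2.2.31.tar.gz" (2231) would sort after "httpd-2.4.1.tar.gz" (241).
    int extractInt(String s) {
        String num = s.replaceAll("\\D", "");
        // return 0 if no digits found
        return num.isEmpty() ? 0 : Integer.parseInt(num);
    }
});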
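
Stripping all non-digits and comparing the resulting number works for the listing shown in the question, but it can misorder names whose version parts differ in digit count (2.2.31 becomes 2231, which is larger than 241 for 2.4.1). A possible alternative is to compare the version numbers component by component. The following is only a sketch: `VersionSort`, `parseVersion` and `BY_VERSION` are hypothetical names introduced here, and it assumes the names have already been filtered to the `apache_x.y.z.tar.gz` / `httpd-x.y.z.tar.gz` shapes so that each component fits in an `int`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; not part of Jsoup or of the linked answer.
class VersionSort {

    // First dotted run of digits in the name, e.g. "2.4.10" in "httpd-2.4.10.tar.gz".
    private static final Pattern VERSION = Pattern.compile("(\\d+(?:\\.\\d+)*)");

    // Returns the version components as ints, or an empty array if the name has no digits.
    static int[] parseVersion(String name) {
        Matcher m = VERSION.matcher(name);
        if (!m.find()) {
            return new int[0];
        }
        String[] parts = m.group(1).split("\\.");
        int[] version = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            version[i] = Integer.parseInt(parts[i]); // assumes short components
        }
        return version;
    }

    // Compares major, then minor, then patch numerically; missing components count as 0.
    static final Comparator<String> BY_VERSION = (a, b) -> {
        int[] va = parseVersion(a);
        int[] vb = parseVersion(b);
        int n = Math.max(va.length, vb.length);
        for (int i = 0; i < n; i++) {
            int x = i < va.length ? va[i] : 0;
            int y = i < vb.length ? vb[i] : 0;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return a.compareTo(b); // same version: fall back to the plain string order
    };

    public static void main(String[] args) {
        List<String> names = new ArrayList<>(Arrays.asList(
                "httpd-2.4.10.tar.gz",
                "httpd-2.4.2.tar.gz",
                "httpd-2.2.31.tar.gz",
                "httpd-2.4.9.tar.gz"));
        names.sort(BY_VERSION);
        names.forEach(System.out::println);
    }
}
```

With the Jsoup elements from the question, the same comparator could be applied to the link text, for example `Collections.sort(anchors, (e1, e2) -> VersionSort.BY_VERSION.compare(e1.text(), e2.text()));`.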