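For some reason, Jsoup.parse takes about 10 times longer on KitKat devices than on older devices. At first I thought it was related to the ART runtime, but switching back to Dalvik didn't help.
Here is the code I'm using: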
    downloadedHtml = NetworkHelper.downloadString("https://en.m.wikipedia.org/wiki/Dusseldorf");
    AppLog.i("Downloaded data, Jsoup is parsing the html");
    hDoc = Jsoup.parse(downloadedHtml);

    Element htmlElement = hDoc.select("html").first();
    String langCode = htmlElement.attributes().get("lang");
    ArticleInfo articleInfo = new ArticleInfo(getWikiLanguage(langCode), langCode, href);
    article = new Article(articleInfo, href);

    String title = hDoc.getElementById("section_0").text();
    article.set_title(title);

    Document documentNode = hDoc.ownerDocument();
    Elements contents = documentNode.getElementsByClass("content");
    if (contents == null || contents.isEmpty())
        throw new IllegalArgumentException("content");
    Element content = contents.first();

    // rewrite protocol-relative image URLs ("//...") to absolute ones
    Elements imgElements = content.select("img");
    Element htmlNode;
    for (int i = 0; i < imgElements.size(); i++)
    {
        htmlNode = imgElements.get(i);
        if (!htmlNode.hasAttr("src"))
            continue;
        String src = htmlNode.attr("src");
        if (src.startsWith("//"))
            htmlNode.attr("src", String.format("http:%s", src));
        //else
        //throw new UnsupportedOperationException();
    }

    //get section headings
    Elements headlines = documentNode.getElementsByClass("mw-headline");
    if (headlines != null)
    {
        Element headline;
        for (int i = 0; i < headlines.size(); i++)
        {
            headline = headlines.get(i);
            String headline_link = headline.id();
            String headline_title = headline.text();
            SectionHeadline sectionHeadline = new SectionHeadline(headline_title, headline_link);
            article.get_sectionHeadlines().add(sectionHeadline);
        }
    }
    article.set_html(content.outerHtml());

    //get languages
    //language list
    Element languageSection = content.getElementById("mw-mf-language-section");
    if (languageSection != null)
    {
        Elements languageLinks = languageSection.select("li");
        Element languageLink;
        for (int i = 0; i < languageLinks.size(); i++)
        {
            languageLink = languageLinks.get(i);
            Element link = null;
            Elements ls = languageLink.select("a");
            if (ls == null || ls.size() == 0)
                continue;
            link = ls.first();
            if (!link.hasAttr("href"))
                continue;
            String linkHref = link.attr("href");
            if (linkHref != null && link.text() != null)
            {
                String languageCode = link.attr("lang");
                if (linkHref.startsWith("//"))
                    linkHref = String.format("http:%s", linkHref);
                ArticleInfo languageInfo = new ArticleInfo(getWikiLanguage(languageCode), languageCode, linkHref);
                // compare string contents with equals(); == only compares references
                if ("Unknown".equals(languageInfo.get_language()))
                    continue;
                article.get_languages().add(languageInfo);
            }
        }
    }
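Any idea what the problem might be?
Answer 0 (score: 0)
The code in the question selects part of the document, saves it to a variable, selects part of that variable, saves it to a new variable, and so on. Another possible implementation is to rely more on the selector syntax to select only the elements you need, instead of storing those intermediate steps in new objects.
The code below runs in about 2 seconds on my machine. An excerpt similar to the code above runs in about 4 seconds. On subsequent runs the timings are much closer, differing by roughly 50 ms, so take these numbers with a grain of salt.
I don't know whether KitKat has a performance problem. You may find that adding timers to both the KitKat and Dalvik versions helps establish whether a bottleneck exists and where it is.
Here is my code: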
    long start = System.currentTimeMillis();
    Document hDoc = Jsoup
        .connect("https://en.m.wikipedia.org/wiki/Dusseldorf")
        .userAgent("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.57 Safari/537.17")
        .get();
    //select the html element, then take the value of its lang attribute
    String langCode = hDoc.select("html:eq(0)").attr("lang");
    String title = hDoc.getElementById("section_0").text();
    Document documentNode = hDoc.ownerDocument();
    //select all the img elements having a src attribute which are
    //descended from an element with the content class
    //(note: :eq(0) matches on sibling index, so this is not strictly
    //"the first element with the content class")
    Elements imgElementsHavingSrcAttr = documentNode.select("*.content:eq(0) img[src]");
    //for each img element, make protocol-relative URLs absolute
    for (Element img : imgElementsHavingSrcAttr)
    {
        String src = img.attr("src");
        if (src.startsWith("//"))
        {
            img.attr("src", String.format("http:%s", src));
        }
    }
    System.out.println("Function took " + (System.currentTimeMillis() - start) + "ms");
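
As a rough sketch of the timer suggestion above, something like the following could record how long each stage takes, so the download, the parse, and the element selection can be compared separately on each device. The `StageTimer` class and the stage names are hypothetical and not part of Jsoup or Android; the `main` method uses stand-in work where the real code would call `NetworkHelper.downloadString` and `Jsoup.parse`. It is written with anonymous classes so it should also compile on older Android toolchains without lambda support.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;

// Hypothetical helper: accumulates wall-clock time per named stage.
public class StageTimer {
    private final Map<String, Long> nanosByStage = new LinkedHashMap<String, Long>();

    // Runs the work, records its elapsed time under the stage name,
    // and returns whatever the work returned.
    public <T> T time(String stage, Callable<T> work) throws Exception {
        long start = System.nanoTime();
        try {
            return work.call();
        } finally {
            long elapsed = System.nanoTime() - start;
            Long previous = nanosByStage.get(stage);
            nanosByStage.put(stage, previous == null ? elapsed : previous + elapsed);
        }
    }

    // Total milliseconds recorded for a stage, or 0 if it never ran.
    public long millis(String stage) {
        Long nanos = nanosByStage.get(stage);
        return nanos == null ? 0L : nanos / 1000000L;
    }

    // One "stage: N ms" line per stage, in the order stages were first timed.
    public String report() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Long> entry : nanosByStage.entrySet()) {
            sb.append(entry.getKey()).append(": ")
              .append(entry.getValue() / 1000000L).append(" ms\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        StageTimer timer = new StageTimer();

        // Stand-in for the download step (assumption: real code would fetch the page).
        final String html = timer.time("download", new Callable<String>() {
            @Override
            public String call() {
                return "<html lang=\"en\"><body></body></html>";
            }
        });

        // Stand-in for the parse step (assumption: real code would call Jsoup.parse(html)).
        timer.time("parse", new Callable<Integer>() {
            @Override
            public Integer call() {
                return html.length();
            }
        });

        System.out.println(timer.report());
    }
}
```

If the "parse" stage alone accounts for the slowdown on KitKat, that points at Jsoup.parse itself; if "download" dominates, the difference is more likely in the network layer than in the parser.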