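I am trying to parse an Atom feed with the ROME API to extract its entries. The feed gives me a content attribute that contains the article's image and description. Here is the URL of the Atom feed: https://news.google.com/news/section?output=atom&ned=in&q=narendra%20modi. Now I want to extract the image and the description from the content section.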
An entry from the feed looks like this:
<entry>
<id>tag:news.google.com,2005:cluster=http://www.ndtv.com/india-news/not-just-gst-stuck-in-parliament-matter-of-sorrow-pm-narendra-modi-1253222</id>
<title type="html">'Not Just GST Stuck In Parliament. Matter of Sorrow': PM Narendra Modi - NDTV</title>
<updated>2015-12-10T06:03:54Z</updated>
<link rel="alternate" type="text/html" href="http://news.google.com/news/url?sa=t&fd=R&ct2=in&usg=AFQjCNE53SQd2skoJLxBTVlYWHdgDBCl7Q&clid=c3a7d30bb8a4878e06b80cf16b898331&cid=52779006372283&ei=ACdpVoDJO9Sj4ALYkL94&url=http://www.ndtv.com/india-news/not-just-gst-stuck-in-parliament-matter-of-sorrow-pm-narendra-modi-1253222" hreflang="en"/>
<content type="html"><table border="0" cellpadding="2" cellspacing="7" style="vertical-align:top;"><tr><td width="80" align="center" valign="top"><font style="font-size:85%;font-family:arial,sans-serif"><a href="http://news.google.com/news/url?sa=t&amp;fd=R&amp;ct2=in&amp;usg=AFQjCNE53SQd2skoJLxBTVlYWHdgDBCl7Q&amp;clid=c3a7d30bb8a4878e06b80cf16b898331&amp;cid=52779006372283&amp;ei=ACdpVoDJO9Sj4ALYkL94&amp;url=http://www.ndtv.com/india-news/not-just-gst-stuck-in-parliament-matter-of-sorrow-pm-narendra-modi-1253222"><img src="//t3.gstatic.com/images?q=tbn:ANd9GcSNi4SJFo9q9PXKPOjJkiUlfk2GFRzRoBlwK6UsiSQ8np66JDvgQiYTdN4Fknntb7bVjdR-NuM" alt="" border="1" width="80" height="80"><br><font size="-2">NDTV</font></a></font></td><td valign="top" class="j"><font style="font-size:85%;font-family:arial,sans-serif"><br><div style="padding-top:0.8em;"><img alt="" height="1" width="1"></div><div class="lh"><a href="http://news.google.com/news/url?sa=t&amp;fd=R&amp;ct2=in&amp;usg=AFQjCNE53SQd2skoJLxBTVlYWHdgDBCl7Q&amp;clid=c3a7d30bb8a4878e06b80cf16b898331&amp;cid=52779006372283&amp;ei=ACdpVoDJO9Sj4ALYkL94&amp;url=http://www.ndtv.com/india-news/not-just-gst-stuck-in-parliament-matter-of-sorrow-pm-narendra-modi-1253222"><b>&#39;Not Just GST Stuck In Parliament. Matter of Sorrow&#39;: PM <b>Narendra Modi</b></b></a><br><font size="-1"><b><font color="#6f6f6f">NDTV</font></b></font><br><font size="-1">With repeated disruptions stalling legislation including the GST or Goods and Services Tax, Prime Minister <b>Narendra Modi</b> today said it was a &quot;matter of sorrow&quot; that Parliament was not running. 
&quot;It is not only GST, but many pro-poor steps are stuck in&nbsp;...</font><br><font size="-1"><a href="http://news.google.com/news/url?sa=t&amp;fd=R&amp;ct2=in&amp;usg=AFQjCNEVhO7UtISsITzRIFwxTVFwK8BTDQ&amp;clid=c3a7d30bb8a4878e06b80cf16b898331&amp;cid=52779006372283&amp;ei=ACdpVoDJO9Sj4ALYkL94&amp;url=http://www.india.com/news/india/narendra-modis-stern-message-to-congress-democracy-cannot-run-on-whims-of-some-773082/"><b>Narendra Modi&#39;s</b> stern message to Congress: Democracy cannot run on whims of some</a><font size="-1" color="#6f6f6f"><nobr>India.com</nobr></font></font><br><font size="-1"><a href="http://news.google.com/news/url?sa=t&amp;fd=R&amp;ct2=in&amp;usg=AFQjCNGkBqqpn2OhEI6w68lLCIXMDppu-Q&amp;clid=c3a7d30bb8a4878e06b80cf16b898331&amp;cid=52779006372283&amp;ei=ACdpVoDJO9Sj4ALYkL94&amp;url=http://www.mid-day.com/articles/jagran-forum-catch-pm-narendra-modi-other-leaders-live/16757192">Jagran Forum: Catch PM <b>Narendra Modi</b>, other leaders live</a><font size="-1" color="#6f6f6f"><nobr>Mid-Day</nobr></font></font><br><font size="-1"><a href="http://news.google.com/news/url?sa=t&amp;fd=R&amp;ct2=in&amp;usg=AFQjCNHPkB8Wy_-cDqqZrdfcn1cVUKP-Kg&amp;clid=c3a7d30bb8a4878e06b80cf16b898331&amp;cid=52779006372283&amp;ei=ACdpVoDJO9Sj4ALYkL94&amp;url=http://www.oneindia.com/india/democracy-cant-be-restricted-to-elections-only-narendra-modi-1951641.html">Democracy can&#39;t be restricted to elections only, says <b>Narendra Modi</b></a><font size="-1" color="#6f6f6f"><nobr>Oneindia</nobr></font></font><br><font size="-1" class="p"><a href="http://news.google.com/news/url?sa=t&amp;fd=R&amp;ct2=in&amp;usg=AFQjCNFhxDKEsImpQqu0GccMt4MCiPydVw&amp;clid=c3a7d30bb8a4878e06b80cf16b898331&amp;cid=52779006372283&amp;ei=ACdpVoDJO9Sj4ALYkL94&amp;url=http://www.abplive.in/india-news/everyone-must-feel-he-or-she-is-working-for-indias-progress-says-narendra-modi-258229"><nobr>ABP Live</nobr></a></font><br><font class="p" size="-1"><a class="p" 
href="http://news.google.com/news/more?ncl=dac7xEJd70rfdkM8gcjOwSJn8BK9M&amp;authuser=0&amp;ned=in"><nobr><b>all 29 news articles&nbsp;&raquo;</b></nobr></a></font></div></font></td></tr></table></content>
</entry>
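For the image, I tried the following jsoup code, but it returns nothing. I also don't know how to go about extracting the description: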
Elements img = doc.getElementsByTag("img");
for (Element el : img) {
    System.out.println("Image Found!");
    System.out.println("src attribute is : " + el.attr("src"));
}
Please help me.
Answer 0 (score: 0)
Try this code. Note that the feed is fetched directly with Jsoup.
// Fetch the feed directly; without an explicit parser, Jsoup uses its HTML parser.
Document news = Jsoup.connect("http://news.google.com/news/section?output=atom&ned=in&q=narendra%20modi").get();
int i = 0;
for (Element entryContent : news.select("entry > content")) {
    System.out.format("\n## ENTRY %d\n", ++i);
    // The text of <content> is itself HTML, so parse it a second time before selecting.
    for (Element el : Jsoup.parse(entryContent.text()).select("img[src], tr td.j font[size]:nth-of-type(2)")) {
        String elementTagName = el.tagName();
        if (elementTagName.equalsIgnoreCase("img")) {
            System.out.println("src attribute is : " + el.attr("src"));
        } else if (elementTagName.equalsIgnoreCase("font")) {
            System.out.println("description is : " + el.text());
        } else {
            System.out.println("Unexpected element >> " + el.html());
        }
    }
}
## ENTRY 1
src attribute is : //t0.gstatic.com/images?q=tbn:ANd9GcSLee4ulBtCEOMSuDuLHCAjDZwmlaVaXJVdC09133QbK3X1OpZH3s1RBplznEadxqV5memM0dh3
description is : With repeated disruptions stalling legislation including the GST or Goods and Services Tax, Prime Minister Narendra Modi today said it was a "matter of sorrow" that Parliament was not running. "It is not only GST, but many pro-poor steps are stuck in ...
## ENTRY 2
src attribute is : //t1.gstatic.com/images?q=tbn:ANd9GcQdJPtLOBi9F2Ktov11_x5kqHC4inID47xKD3we_ZC5rHP1Lps96sYHs_N0pBO9WkDj5KKuEa8
description is : Prime Minister Narendra Modi topped the charts of Facebook under the most-viewed
(...)
Tested on Jsoup 1.8.3.
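If it helps to see the idea in isolation: the content element of an Atom entry with type="html" carries its markup as escaped text, so it has to be read as a string first and parsed a second time. The sketch below shows those two steps using only the JDK instead of Jsoup. This is a swapped-in technique for illustration, not what the code above does. The class name, the shortened sample entry, the example.invalid host, and the regular expression are all hypothetical, and a regex is only a rough stand-in for a real HTML parser. It also resolves a protocol-relative src (one starting with //, as in the output above) against an assumed base URL taken from the feed address in the question.

```java
import java.io.StringReader;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class ContentTwoStep {
    // Hypothetical, heavily shortened entry; a real feed entry has more elements.
    static final String SAMPLE =
        "<entry><content type=\"html\">"
        + "&lt;table&gt;&lt;tr&gt;&lt;td&gt;"
        + "&lt;img src=\"//example.invalid/a.jpg\" alt=\"\"&gt;"
        + "&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;"
        + "</content></entry>";

    // Assumed base URL, taken from the feed address in the question.
    static final URI BASE = URI.create("https://news.google.com/");

    // Rough pattern for double-quoted src attributes on img tags (illustration only).
    static final Pattern IMG_SRC = Pattern.compile(
        "<img\\b[^>]*?\\bsrc\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    /** Step 1: parse the entry as XML and return the unescaped text of content. */
    static String contentHtml(String entryXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(entryXml)));
            return doc.getElementsByTagName("content").item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not read the content element", e);
        }
    }

    /** Step 2: scan the recovered HTML string for img src values. */
    static List<String> imageSources(String html) {
        List<String> sources = new ArrayList<>();
        Matcher m = IMG_SRC.matcher(html);
        while (m.find()) {
            sources.add(m.group(1));
        }
        return sources;
    }

    /** Turns a protocol-relative src such as //host/path into an absolute URL. */
    static String absolute(String src) {
        return BASE.resolve(src).toString();
    }

    public static void main(String[] args) {
        for (String src : imageSources(contentHtml(SAMPLE))) {
            System.out.println("src attribute is : " + src);
            System.out.println("absolute URL is  : " + absolute(src));
        }
    }
}
```

With Jsoup on the classpath, step 2 would more naturally be Jsoup.parse(html).select("img[src]"), as in the code above; the regex is used here only to keep the sketch free of dependencies.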