How do I parse Google search results in my example?
<div class="srg">
<li class="g">...</li>
<li class="g">...</li>
<li class="g">...</li>
<li class="g">...</li>
<li class="g">...</li>
<li class="g">...</li>
</div>
Here is my code for parsing the Google search results; selectNodes still comes back empty.
HtmlAgilityPack.HtmlDocument doc1 = new HtmlAgilityPack.HtmlDocument();
StreamReader reader = new StreamReader(WebRequest.Create("http://www.google.com/?gws_rd=ssl#q=(404)8271500").GetResponse().GetResponseStream(), Encoding.Default); //put your encoding
doc1.Load(reader);
var selectNodes = doc1.DocumentNode.SelectNodes("//li[@class='g']");
foreach (var node in selectNodes)
{
//node.InnerText will give you the text content of the li tags ...
}
Answer 0 (score: 1)
Sample code:
string result = @"<div class=""srg"">
<li class=""g"">...</li>
<li class=""g"">...</li>
<li class=""g"">...</li>
<li class=""g"">...</li>
<li class=""g"">...</li>
<li class=""g"">...</li>
</div>";
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(result);
var selectNodes = doc.DocumentNode.SelectNodes("//li[@class='g']");
// SelectNodes returns null (not an empty list) when nothing matches.
if (selectNodes != null)
{
foreach (var node in selectNodes)
{
// node.InnerText gives you the text content of each li tag ...
}
}
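One likely reason the live request in the question finds no matching nodes, assuming standard HTTP behaviour: everything after `#` in a URL is a fragment, and clients do not send the fragment to the server. The query `q=(404)8271500` would therefore never reach Google, and the response is probably the plain home page rather than a results page. The sketch below uses only `System.Uri` to show which part of the question's URL belongs to the request; the class name `FragmentDemo` is made up for illustration.

```csharp
using System;

public static class FragmentDemo
{
    public static void Main()
    {
        // The URL from the question, unchanged.
        var uri = new Uri("http://www.google.com/?gws_rd=ssl#q=(404)8271500");

        // Path and query: the part that is sent to the server.
        Console.WriteLine(uri.PathAndQuery);

        // Fragment: kept on the client side, so the server never sees the search term.
        Console.WriteLine(uri.Fragment);
    }
}
```

If that is the cause, the search term would have to travel in the query string (before the `#`) for the server to see it at all.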
Answer 1 (score: 0)
Why not use the API?
string query = "(404)8271500";
string json = "";
// Get the JSON from the API. Don't forget to mark the enclosing method as async.
// You need HttpClient https://www.nuget.org/packages/Microsoft.Net.Http
using (var client = new HttpClient())
{
// Escape the query so characters such as spaces or '&' don't break the URL.
json = await client.GetStringAsync("http://ajax.googleapis.com/ajax/services/search/web?v=1.0&rsz=large&start=0&q=" + Uri.EscapeDataString(query));
}
// Parse the Json string to your object.
// You need Json.NET https://www.nuget.org/packages/Newtonsoft.Json/
GoogleObject googleObject = JsonConvert.DeserializeObject<GoogleObject>(json);
foreach (var item in googleObject.responseData.results)
{
Console.WriteLine(item.title); // title
Console.WriteLine(item.content); // description
}
And your GoogleObject:
public class GoogleObject
{
public Responsedata responseData { get; set; }
public object responseDetails { get; set; }
public int responseStatus { get; set; }
}
public class Responsedata
{
public Result[] results { get; set; }
public Cursor cursor { get; set; }
}
public class Cursor
{
public string resultCount { get; set; }
public Page[] pages { get; set; }
public string estimatedResultCount { get; set; }
public int currentPageIndex { get; set; }
public string moreResultsUrl { get; set; }
public string searchResultTime { get; set; }
}
public class Page
{
public string start { get; set; }
public int label { get; set; }
}
public class Result
{
public string GsearchResultClass { get; set; }
public string unescapedUrl { get; set; }
public string url { get; set; }
public string visibleUrl { get; set; }
public string cacheUrl { get; set; }
public string title { get; set; }
public string titleNoFormatting { get; set; }
public string content { get; set; }
}
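The loop over `googleObject.responseData.results` throws a NullReferenceException if `responseData` comes back null. Here is a minimal sketch of guarding against that, assuming a failed call can return a payload with a null `responseData`. It uses trimmed copies of the classes above and swaps in `System.Text.Json` (built into .NET Core 3.0 and later) in place of Json.NET so that it needs no extra package. The `ResponseGuardDemo` class, the `Titles` helper, and both JSON strings are hand-written for illustration and are not real API output.

```csharp
using System;
using System.Text.Json;

// Trimmed copies of the answer's classes: only the members this sketch reads.
public class Result
{
    public string title { get; set; }
    public string content { get; set; }
}

public class Responsedata
{
    public Result[] results { get; set; }
}

public class GoogleObject
{
    public Responsedata responseData { get; set; }
    public string responseDetails { get; set; }
    public int responseStatus { get; set; }
}

public static class ResponseGuardDemo
{
    // Returns the result titles, or an empty array when the payload has no results.
    public static string[] Titles(string json)
    {
        GoogleObject parsed = JsonSerializer.Deserialize<GoogleObject>(json);
        if (parsed?.responseData?.results == null)
        {
            return Array.Empty<string>();
        }
        return Array.ConvertAll(parsed.responseData.results, r => r.title);
    }

    public static void Main()
    {
        // Hypothetical payloads in the shape the classes describe.
        string ok = "{\"responseData\":{\"results\":[{\"title\":\"A\",\"content\":\"a\"}]},\"responseDetails\":null,\"responseStatus\":200}";
        string failed = "{\"responseData\":null,\"responseDetails\":\"error\",\"responseStatus\":403}";

        Console.WriteLine(Titles(ok).Length);
        Console.WriteLine(Titles(failed).Length);
    }
}
```

The same null check works unchanged with `JsonConvert.DeserializeObject<GoogleObject>(json)` from Json.NET.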
It doesn't solve your problem directly, but it may fit your needs.