I am using C# with HtmlAgilityPack to extract the product name, price, and currency symbol from a Chinese website: https://meadjohnson.world.tmall.com/search.htm?search=y&orderType=defaultSort&scene=taobao_shop. Here is the main part of the HTML:
<div class="SaleItems">
<dl class="item ">
<dt class="photo"></dt>
<dd class="detail">
<a class="item-name">iPad</a>
<div class="price-area">
<span class="symbol">USD</span>
<span class="price">379</span>
</div>
</dd>
</dl>
<dl class="item ">
<dt class="photo"></dt>
<dd class="detail">
<a class="item-name">iPod</a>
<div class="price-area">
<span class="symbol">CAD</span>
<span class="price">139</span>
</div>
</dd>
</dl>
</div>
This is what my program looks like so far:
ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls
| SecurityProtocolType.Tls11
| SecurityProtocolType.Tls12
| SecurityProtocolType.Ssl3;
var htmlDocument = htmlWeb.Load(html);
var sItems = doc.DocumentNode.Descendants("SaleItems");
foreach (var item in sItems)
{
var data = new {
Currency = item["symbol"].InnerText,
Price = item["price"].InnerText,
};
}
This does not work. How can I fix what I am doing wrong?
Answer 0 (score: 1)
You can extract the data this way:
var input = @"<div class='SaleItems'>
<dl class='item '>
<dt class='photo'></dt>
<dd class='detail'>
<a class='item-name'>iPad</a>
<div class='price-area'>
<span class='symbol'>USD</span>
<span class='price'>379</span>
</div>
</dd>
</dl>
<dl class='item '>
<dt class='photo'></dt>
<dd class='detail'>
<a class='item-name'>iPod</a>
<div class='price-area'>
<span class='symbol'>CAD</span>
<span class='price'>139</span>
</div>
</dd>
</dl>
</div>";
// Parse the markup from the string instead of fetching it over the network
var html = new HtmlDocument();
html.LoadHtml(input);
var root = html.DocumentNode;
var list = new List<Data>();
// Each <dl> element holds one product
foreach (var node in root.Descendants("dl"))
{
    // Descendants() filters by element name only, so match the class attribute explicitly
    var currency = node.Descendants()
        .First(n => n.GetAttributeValue("class", "").Equals("symbol")).InnerText;
    var price = node.Descendants()
        .First(n => n.GetAttributeValue("class", "").Equals("price")).InnerText;
    list.Add(new Data { Currency = currency, Price = price });
}
public class Data
{
public string Currency { get; set; }
public string Price { get; set; }
}
Alternatively, you can use a query expression in place of the foreach part:
var list = (from node in root.Descendants("dl")
let currency = node.Descendants().Where(n => n.GetAttributeValue("class", "").Equals("symbol")).FirstOrDefault().InnerText
let price = node.Descendants().Where(n => n.GetAttributeValue("class", "").Equals("price")).FirstOrDefault().InnerText
select new Data {Currency = currency, Price = price}).ToList();
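Both variants above read .InnerText directly from the first matching node, so a <dl> that lacks a symbol or price span would throw instead of being skipped. The following is a minimal sketch of a more tolerant version, assuming the HtmlAgilityPack NuGet package is referenced. The names SaleItemReader, TextByClass, and Read are hypothetical helpers introduced for illustration and are not part of the library.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public class Data
{
    public string Currency { get; set; }
    public string Price { get; set; }
}

public static class SaleItemReader
{
    // Hypothetical helper: trimmed text of the first descendant whose class
    // attribute equals cssClass exactly, or null when no such node exists.
    public static string TextByClass(HtmlNode node, string cssClass)
    {
        return node.Descendants()
                   .FirstOrDefault(n => n.GetAttributeValue("class", "") == cssClass)
                   ?.InnerText.Trim();
    }

    // Hypothetical helper: builds one Data entry per <dl>, skipping incomplete items.
    public static List<Data> Read(string input)
    {
        var html = new HtmlDocument();
        html.LoadHtml(input);

        var list = new List<Data>();
        foreach (var node in html.DocumentNode.Descendants("dl"))
        {
            var currency = TextByClass(node, "symbol");
            var price = TextByClass(node, "price");
            if (currency == null || price == null)
            {
                continue; // incomplete item: no symbol or no price span
            }
            list.Add(new Data { Currency = currency, Price = price });
        }
        return list;
    }
}
```

Skipping incomplete items is a design choice made for this sketch. Logging them or throwing a descriptive exception may suit a scraper better, since missing spans often mean the page layout has changed.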
Answer 1 (score: 0)
The exact error is that inside the foreach() block, "item" is a variable of type HtmlNode, but you are trying to index into it. Instead, you should use
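// Descendants() takes an element name, so the class has to be matched separately
item.Descendants("span").First(n => n.GetAttributeValue("class", "") == "symbol");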
or
item.SelectSingleNode(".//span[@class='symbol']");
Or you can use this code:
var document = new HtmlWeb();
var root = document.Load(url); // url is the address of the page to scrape
var data = new List<Item>();
foreach (var item in root.DocumentNode.SelectNodes("//dl"))
{
    var name = item.SelectSingleNode(".//a[@class='item-name']").InnerText;
    // InnerText is a string, so convert it before storing it in the int field
    var price = int.Parse(item.SelectSingleNode(".//span[@class='price']").InnerText);
    var symbol = item.SelectSingleNode(".//span[@class='symbol']").InnerText;
    data.Add(new Item { Name = name, Price = price, Symbol = symbol });
}
public class Item
{
    public string Name;
    public int Price;
    public string Symbol;
}
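Because InnerText is always a string, storing a price in a numeric field requires an explicit conversion, and real pages may contain decimals or stray whitespace that a plain integer conversion rejects. Below is a minimal sketch that uses only the standard library. It assumes prices use a period as the decimal separator, which is an assumption about the site and not something the question confirms. PriceParser and TryParsePrice are hypothetical names.

```csharp
using System.Globalization;

public static class PriceParser
{
    // Hypothetical helper: returns true and sets price when text is a plain
    // number such as "379" or "1,299.50"; returns false otherwise.
    public static bool TryParsePrice(string text, out decimal price)
    {
        return decimal.TryParse(
            (text ?? string.Empty).Trim(),
            NumberStyles.Number,            // allows sign, decimal point, thousands separators
            CultureInfo.InvariantCulture,   // fixed culture: '.' is the decimal separator
            out price);
    }
}
```

With this helper, the Price field could be declared as decimal, and items whose price text fails to parse could be skipped or reported instead of crashing the loop.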