I'm trying to parse the following HTML snippet with HtmlAgilityPack:
<td bgcolor="silver" width="50%" valign="top">
<table bgcolor="silver" style="font-size: 90%" border="0" cellpadding="2" cellspacing="0"
width="100%">
<tr bgcolor="#003366">
<td>
<font color="white">Info
</td>
<td>
<font color="white">
<center>Price
</td>
<td align="right">
<font color="white">Hourly
</td>
</tr>
<tr>
<td>
<a href='test1.cgi?type=1'>Bookbags</a>
</td>
<td>
$156.42
</td>
<td align="right">
<font color="green">0.11%</font>
</td>
</tr>
<tr>
<td>
<a href='test2.cgi?type=2'>Jeans</a>
</td>
<td>
$235.92
</td>
<td align="right">
<font color="red">100%</font>
</td>
</tr>
</table>
</td>
My code looks like this:
private void ParseHtml(HtmlDocument htmlDoc)
{
    var ItemsAndPrices = new Dictionary<string, int>();
    // GetAttributeValue returns the attribute's string value (or "" if it is missing),
    // so the comparisons work and missing attributes don't throw
    var findItemPrices = from links in htmlDoc.DocumentNode.Descendants()
                         where links.Name == "table" &&
                               links.GetAttributeValue("width", "") == "100%" &&
                               links.GetAttributeValue("bgcolor", "") == "silver"
                         select new
                         {
                             //select item and price
                         };
}
In this case, I want to select the items listed in the table, Jeans and Bookbags, together with their prices, and store them in a dictionary.
For example: Jeans at price $235.92
Does anyone know how to do this properly with HtmlAgilityPack and LINQ?
Answer 0 (score: 0)
Here is what I came up with:
var ItemsAndPrices = new Dictionary<string, string>();
// every <tr> in the document except the first (header) row
var findItemPrices = from links in htmlDoc.DocumentNode.Descendants("tr").Skip(1)
                     select links;
foreach (var a in findItemPrices)
{
    // trimmed text of each cell: [0] = item, [1] = price, [2] = hourly change
    var values = (from tds in a.Descendants("td")
                  select tds.InnerText.Trim()).ToList();
    ItemsAndPrices.Add(values[0], values[1]);
}
The only thing I changed is your Dictionary<string, int> (to <string, string>), because $156.42 is not an int.
Answer 1 (score: 0)
Try this, a regex solution:
// requires: using System.Collections.Generic; using System.Text.RegularExpressions;
static Dictionary<string, string> GetProduct(string name, string html)
{
    Dictionary<string, string> output = new Dictionary<string, string>();
    // optional line breaks followed by one non-empty line
    string clfr = @"[\r\n]*[^\r\n]+";
    // Regex.Escape keeps names containing regex metacharacters (or spaces, which
    // IgnorePatternWhitespace would otherwise drop) from breaking the pattern
    string pattern = string.Format(@"href='([^']+)'>{0}</a>.*{1}{1}[\r\n]*([^\$][^\r\n]+)", Regex.Escape(name), clfr);
    Match products = Regex.Match(html, pattern, RegexOptions.Multiline | RegexOptions.IgnorePatternWhitespace);
    if (products.Success)
    {
        GroupCollection details = products.Groups;
        output.Add("Name", name);
        output.Add("Link", details[1].Value);
        output.Add("Price", details[2].Value.Trim());
    }
    return output;
}
Then:
var ProductNames = new string[2] { "Jeans", "Bookbags" };
for (int i = 0, len = ProductNames.Length; i < len; i++)
{
    var product = GetProduct(ProductNames[i], html);
    if (product.Count != 0)
    {
        Console.WriteLine("{0} at price {1}", product["Name"], product["Price"]);
    }
}
Output:
Jeans at price $235.92
Bookbags at price $156.42
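Note: the Dictionary values can't be int, because $235.92 / $156.42 are not valid ints. To get a valid int you can strip the dollar sign and the decimal point and call int.Parse(), which gives the price in cents (for example 23592 for $235.92).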
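A minimal sketch of both conversions, assuming the page always formats prices the US way (e.g. "$235.92"). PriceParsing and its methods are hypothetical names, not part of HtmlAgilityPack or of any answer here. Parsing straight to decimal with NumberStyles.Currency avoids the manual stripping and also copes with thousands separators:

```csharp
using System.Globalization;

// Hypothetical helper (not from any answer): converts a scraped price such as
// "$235.92" into numbers, assuming US formatting on the page.
public static class PriceParsing
{
    private static readonly CultureInfo UsCulture = CultureInfo.GetCultureInfo("en-US");

    // The "strip '$' and '.' then int.Parse" approach: "$235.92" -> 23592 (cents).
    // Only correct when every price has exactly two decimals and no thousands separator.
    public static int ToCentsByStripping(string price) =>
        int.Parse(price.Trim().Replace("$", "").Replace(".", ""), CultureInfo.InvariantCulture);

    // NumberStyles.Currency accepts the "$" symbol, thousands separators and a
    // decimal point, so "$235.92" -> 235.92m without any manual stripping.
    public static decimal ToDecimal(string price) =>
        decimal.Parse(price, NumberStyles.Currency, UsCulture);

    // Cents derived from the decimal value; any fraction of a cent is truncated.
    public static int ToCents(string price) =>
        decimal.ToInt32(ToDecimal(price) * 100m);
}
```

Storing the values as decimal rather than double keeps amounts like 235.92 exact, which is usually what you want for money.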
Answer 2 (score: 0)
Assuming there may be other rows and you don't specifically want only Bookbags and Jeans, I'd do it like this:
// requires: using System.Globalization; using System.Linq;
var table = htmlDoc.DocumentNode
    .SelectSingleNode("//table[@bgcolor='silver' and @width='100%']");
var query =
    from row in table.Elements("tr").Skip(1) // skip the header row
    let columns = row.Elements("td").Take(2) // take only the first two columns
        .Select(col => col.InnerText.Trim())
        .ToList()
    select new
    {
        Info = columns[0],
        // parse with en-US so "$" and "." are understood regardless of the machine's culture
        Price = Decimal.Parse(columns[1], NumberStyles.Currency, CultureInfo.GetCultureInfo("en-US")),
    };
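The question asks for the results in a dictionary, and the query above yields anonymous objects, so a final ToDictionary call is one way to finish. In this sketch, plain string lists stand in for the trimmed cell texts HtmlAgilityPack would return for each <tr> (copied from the snippet in the question), so it runs without the package. The class and method names are hypothetical, and the en-US culture is an assumption about how the prices are formatted:

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class RowsToDictionarySketch
{
    // Each inner list stands in for the trimmed InnerText of one row's <td> cells.
    public static Dictionary<string, decimal> RowsToDictionary(IEnumerable<IList<string>> rows)
    {
        var us = CultureInfo.GetCultureInfo("en-US");
        return rows
            .Skip(1) // skip the header row
            .ToDictionary(
                cols => cols[0], // item name, e.g. "Jeans"
                cols => decimal.Parse(cols[1], NumberStyles.Currency, us)); // "$235.92" -> 235.92m
    }
}
```

With the live query, the equivalent is query.ToDictionary(x => x.Info, x => x.Price). Note that ToDictionary throws an ArgumentException if two rows share the same item name.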