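I'm trying to parse values out of a large HTML page, and I'm struggling to work out how to extract the text that sits between two selectors. Here is some sample HTML to illustrate: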
<table class="categories">
<tr class="category">
<td class="categoryTitle">Category #1</td>
<td class="categoryDate">12-1-2012</td>
<td class="categoryFoos">212</td>
</tr>
<tr class="catItem">
<td class="catItemName"><div class="itemName">Category Item #1</div></td>
<td class="catItemColor">Blue</td>
<td class="catItemSprockets">17</td>
</tr>
<tr class="catItem">
<td class="catItemName"><div class="itemName">Category Item #2</div></td>
<td class="catItemColor">Red</td>
<td class="catItemSprockets">454</td>
</tr>
<tr class="catItem">
<td class="catItemName"><div class="itemName">Category Item #3</div></td>
<td class="catItemColor">Purple</td>
<td class="catItemSprockets">11</td>
</tr>
<tr class="category">
<td class="categoryTitle">Category #2</td>
<td class="categoryDate">12-17-2012</td>
<td class="categoryFoos">311</td>
</tr>
<tr class="catItem">
<td class="catItemName"><div class="itemName">Category Item #1</div></td>
<td class="catItemColor">Yellow</td>
<td class="catItemSprockets">73</td>
</tr>
<tr class="catItem">
<td class="catItemName"><div class="itemName">Category Item #2</div></td>
<td class="catItemColor">Red</td>
<td class="catItemSprockets">5</td>
</tr>
<tr class="catItem">
<td class="catItemName"><div class="itemName">Category Item #3</div></td>
<td class="catItemColor">Purple</td>
<td class="catItemSprockets">11</td>
</tr>
</table>
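How can I take an ICsqWebResponse and parse out each category with its title, date, and 'foos', along with all the items in each category as a collection of items? Just to be clear about what I want to end up with, the object should look like this: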
Categories = {
Category #1 {
Date: 12-1-2012,
Foos: 212,
Items: [
Category Item #1 {
Color: Blue,
Sprockets: 17
},
Category Item #2 {
Color: Red,
Sprockets: 454
},
... more items ...
]
},
Category #2 {
Date: 12-17-2012,
Foos: 311,
Items: [
Category Item #1 {
Color: Yellow,
Sprockets: 73
},
Category Item #2 {
Color: Red,
Sprockets: 5
},
Category Item #3 {
Color: Purple,
Sprockets: 11
}
]
}
}
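To make the target shape concrete, it could be modeled with two small classes. This is only a sketch: the class names `Category` and `CategoryItem`, the property names, and the choice of `string` for every value are assumptions for illustration, not something the page or CsQuery dictates.

```csharp
using System.Collections.Generic;

// Hypothetical model for one <tr class="catItem"> row.
public class CategoryItem
{
    public string Name { get; set; }
    public string Color { get; set; }
    public string Sprockets { get; set; }
}

// Hypothetical model for one <tr class="category"> row plus the items that follow it.
public class Category
{
    public string Title { get; set; }
    public string Date { get; set; }
    public string Foos { get; set; }

    // Starts empty so callers can add items without a null check.
    public List<CategoryItem> Items { get; } = new List<CategoryItem>();
}
```

Values are kept as strings because that is what comes out of the cell text; converting the date and the counts can happen later, once the date format is known.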
Answer 0 (score: 0)
You would iterate over all the rows, using the CsQuery library.
CQ dom = "<table> ...your html... </table>"; // or CQ.CreateFromUrl("http://www.jquery.com");
var rows = dom["tr"]; // a CQ enumerates as IDomObject, so no ToList() is needed
Whenever you reach a new category row, start a new category; add the item rows that follow to it.
var categoryList = new List<Category>();
Category currentCategory = null; // 'var' cannot be inferred from null
foreach (var r in rows) {
    var row = r.Cq(); // wrap the row so lookups can be scoped to it
    if (currentCategory != null && row.HasClass("catItem"))
    {
        var item = new CategoryItem();
        // Find() searches only inside this row
        item.Name = row.Find(".itemName").First().Text();
        item.Color = row.Find(".catItemColor").First().Text();
        ...
        currentCategory.Items.Add(item);
    }
    else if (row.HasClass("category"))
    {
        // a category row starts a new Category; later items attach to it
        currentCategory = new Category();
        currentCategory.Date = row.Find(".categoryDate").First().Text();
        currentCategory.Foos = row.Find(".categoryFoos").First().Text();
        ...
        categoryList.Add(currentCategory);
    }
}
Disclaimer: this is not production-ready code ;-)
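The grouping step in the loop above does not actually depend on CsQuery, so it can be sketched and tried out on plain data first. The helper below is a hypothetical illustration (the names `RowGrouping` and `GroupByHeader` are mine): every element that the `isHeader` predicate accepts opens a new group, and the elements after it, up to the next header, become that group's items.

```csharp
using System;
using System.Collections.Generic;

public static class RowGrouping
{
    // Splits a flat sequence into groups. An element for which isHeader
    // returns true starts a new group; the elements that follow it (until
    // the next header) are collected as that group's items. Elements that
    // appear before the first header are skipped.
    public static List<(T Header, List<T> Items)> GroupByHeader<T>(
        IEnumerable<T> rows, Func<T, bool> isHeader)
    {
        var groups = new List<(T Header, List<T> Items)>();
        List<T> currentItems = null;

        foreach (var row in rows)
        {
            if (isHeader(row))
            {
                // New header: open a fresh item list and remember it.
                currentItems = new List<T>();
                groups.Add((row, currentItems));
            }
            else if (currentItems != null)
            {
                // Ordinary row: attach it to the most recent header.
                currentItems.Add(row);
            }
        }

        return groups;
    }
}
```

With CsQuery this could presumably be called as `RowGrouping.GroupByHeader(dom["tr"], r => r.HasClass("category"))`, but I have not run that against the library, so treat the call as an assumption. Mapping each group onto your own category and item objects is then a separate, simpler step.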
Answer 1 (score: 0)
If I understand what you're asking...
CQ html = "your html here";
html[".category"].Each((index, dom) => {
    var category = dom.Cq(); // everything below works from this row
    // use .Find() rather than '[]' or Select(), because those search
    // the whole HTML document, not just your category
    string categoryTitle = category.Find(".categoryTitle").Text();
    string categoryDate = category.Find(".categoryDate").Text();
    // and so on...
    // now loop through the catItem rows; they are siblings that follow the
    // category row, so this assumes CsQuery's port of jQuery's nextUntil
    category.NextUntil(".category").Each((catIndex, catDom) => {
        var catItem = catDom.Cq();
        // the same principle applies here
    });
});