CsQuery - Getting sibling values between two selectors?

Date: 2013-08-02 21:34:13

Tags: html linq csquery

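I'm trying to parse values out of a large HTML page, and I'm struggling to work out how to extract the text that sits between two selectors. Here is some sample HTML to illustrate: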

<table class="categories">
<tr class="category">
    <td class="categoryTitle">Category #1</td>
    <td class="categoryDate">12-1-2012</td>
    <td class="categoryFoos">212</td>       
</tr>
<tr class="catItem">
    <td class="catItemName"><div class="itemName">Category Item #1</div></td>
    <td class="catItemColor">Blue</td>
    <td class="catItemSprockets">17</td>
</tr>
<tr class="catItem">
    <td class="catItemName"><div class="itemName">Category Item #2</div></td>
    <td class="catItemColor">Red</td>
    <td class="catItemSprockets">454</td>
</tr>
<tr class="catItem">
    <td class="catItemName"><div class="itemName">Category Item #3</div></td>
    <td class="catItemColor">Purple</td>
    <td class="catItemSprockets">11</td>
</tr>
<tr class="category">
    <td class="categoryTitle">Category #2</td>
    <td class="categoryDate">12-17-2012</td>
    <td class="categoryFoos">311</td>       
</tr>
<tr class="catItem">
    <td class="catItemName"><div class="itemName">Category Item #1</div></td>
    <td class="catItemColor">Yellow</td>
    <td class="catItemSprockets">73</td>
</tr>
<tr class="catItem">
    <td class="catItemName"><div class="itemName">Category Item #2</div></td>
    <td class="catItemColor">Red</td>
    <td class="catItemSprockets">5</td>
</tr>
<tr class="catItem">
    <td class="catItemName"><div class="itemName">Category Item #3</div></td>
    <td class="catItemColor">Purple</td>
    <td class="catItemSprockets">11</td>
</tr>
</table>

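How can I take an ICsqWebResponse and parse out each category (its title, date and 'foos') together with all of the items in that category as a collection? To be clear about what I want to end up with, the object should look something like this: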

Categories = {
    Category #1 { 
       Date: 12-1-2012,
       Foos: 212,
       Items: [
          Category Item #1 {
             Color: Blue,
             Sprockets: 17
          },
          Category Item #2 {
             Color: Red,
             Sprockets: 454
          },
          ... more items ...
       ]
     },
     Category #2 {
        Date: 12-17-2012,
        Foos: 311,
        Items: [
            Category Item #1 {
                Color: Yellow,
                Sprockets: 73
            },
            Category Item #2 {
                Color: Red,
                Sprockets: 5
            },
            Category Item #3 {
                Color: Purple,
                Sprockets: 11
            }
        ]
     }
 }
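A minimal sketch of C# types that could hold the structure above, plus two helpers for turning cell text into typed values. The class, property and helper names (`Category`, `CategoryItem`, `CellParsing`) are hypothetical, not part of CsQuery. The date format is an assumption inferred from the sample: "12-17-2012" only makes sense as month-day-year, with one- or two-digit month and day.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical model types mirroring the target structure; CsQuery does not define them.
public class CategoryItem
{
    public string Name { get; set; }
    public string Color { get; set; }
    public int Sprockets { get; set; }
}

public class Category
{
    public string Title { get; set; }
    public DateTime Date { get; set; }
    public int Foos { get; set; }
    // Filled with the catItem rows that follow this category's row.
    public List<CategoryItem> Items { get; } = new List<CategoryItem>();
}

public static class CellParsing
{
    // Assumes month-day-year text such as "12-1-2012" or "12-17-2012".
    public static DateTime ParseDate(string text) =>
        DateTime.ParseExact(text.Trim(), "M-d-yyyy", CultureInfo.InvariantCulture);

    // Parses counts such as "212" or "17", ignoring surrounding whitespace.
    public static int ParseCount(string text) =>
        int.Parse(text.Trim(), NumberStyles.Integer, CultureInfo.InvariantCulture);
}
```

Keeping the parsing in one place means the scraping code only deals with strings, and a format change in the page touches a single helper.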

2 Answers:

Answer 0 (score: 0)

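Iterate over all of the rows, using the CsQuery library:

// using CsQuery; using System.Linq; using System.Collections.Generic;
CQ dom = "<table> ...your html... </table>"; // or CQ.CreateFromUrl("http://www.jquery.com");
List<IDomObject> rows = dom["tr"].ToList(); // ToList() returns a List<IDomObject>, not a CQ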

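When you reach a category row, start a new category; add each following item row to it.

// Category and CategoryItem are your own model classes.
var categoryList = new List<Category>();
Category currentCategory = null; // `var x = null;` does not compile in C#

foreach (IDomObject r in rows)
{
    CQ row = r.Cq(); // wrap the element so CsQuery selectors can run against it

    if (row.HasClass("category"))
    {
        // A category row opens a new group; the catItem rows after it belong to it.
        currentCategory = new Category();
        currentCategory.Title = row.Find(".categoryTitle").Text();
        currentCategory.Date = row.Find(".categoryDate").Text();
        currentCategory.Foos = row.Find(".categoryFoos").Text();
        ...

        categoryList.Add(currentCategory);
    }
    else if (currentCategory != null && row.HasClass("catItem"))
    {
        var item = new CategoryItem();
        item.Name = row.Find(".itemName").Text();
        item.Color = row.Find(".catItemColor").Text();
        ...

        currentCategory.Items.Add(item);
    }
}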

Disclaimer: this is not production-ready code ;-)
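The row-walking idea in this answer can be checked without CsQuery at all. The sketch below assumes a hypothetical `Row` type that stands in for a parsed `<tr>` (its class name plus the text of each `<td>`, keyed by the `<td>`'s class); in real code you would fill it from the DOM. `Category`, `CategoryItem` and `RowGrouper` are likewise illustrative names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a parsed <tr>: its class attribute and the text of
// each <td>, keyed by that <td>'s class.
public class Row
{
    public string ClassName;
    public Dictionary<string, string> Cells = new Dictionary<string, string>();
}

public class CategoryItem { public string Name, Color; public int Sprockets; }

public class Category
{
    public string Title, Date;
    public int Foos;
    public List<CategoryItem> Items = new List<CategoryItem>();
}

public static class RowGrouper
{
    // Walks rows in document order: a "category" row opens a new Category,
    // and each later "catItem" row is appended to the most recently opened one.
    public static List<Category> Group(IEnumerable<Row> rows)
    {
        var categories = new List<Category>();
        Category current = null;

        foreach (var row in rows)
        {
            if (row.ClassName == "category")
            {
                current = new Category
                {
                    Title = row.Cells["categoryTitle"],
                    Date = row.Cells["categoryDate"],
                    Foos = int.Parse(row.Cells["categoryFoos"])
                };
                categories.Add(current);
            }
            else if (row.ClassName == "catItem" && current != null)
            {
                current.Items.Add(new CategoryItem
                {
                    Name = row.Cells["catItemName"],
                    Color = row.Cells["catItemColor"],
                    Sprockets = int.Parse(row.Cells["catItemSprockets"])
                });
            }
            // Item rows before the first category, and rows with other classes, are skipped.
        }
        return categories;
    }
}
```

Because the grouping only depends on document order, the same loop works whether the rows come from CsQuery, another parser, or test data.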

Answer 1 (score: 0)

If I understand what you're asking...

    CQ html = "your html here";
    html[".category"].Each((index, dom) => {

        var category = dom.Cq(); // the current <tr class="category"> row
        // Use .Find() on this row, NOT html[...] or Select(): those search
        // the whole document instead of just this category.

        string categoryTitle = category.Find(".categoryTitle").Text();
        string categoryDate = category.Find(".categoryDate").Text();
        // and so on...

        // The catItem rows are siblings of the category row, not children,
        // so .Find() cannot reach them. Take the following siblings up to
        // the next category row instead (same as jQuery's nextUntil):
        category.NextUntil(".category").Each((catIndex, catDom) => {

            var catItem = catDom.Cq();
            // the same principle applies here, e.g. catItem.Find(".catItemColor").Text()
        });
    });
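The core of the question is collecting the siblings that sit between two matching rows. Below is a library-independent sketch of that grouping over any flat sequence, roughly what jQuery's `nextUntil` gives you when applied to every header row. `SiblingRuns` and `SplitAtHeaders` are hypothetical names; elements before the first header are ignored by design.

```csharp
using System;
using System.Collections.Generic;

public static class SiblingRuns
{
    // For each element matching isHeader, collects the siblings that follow it
    // up to (but not including) the next header. Elements that appear before
    // the first header belong to no run and are dropped.
    public static List<(T Header, List<T> Followers)> SplitAtHeaders<T>(
        IEnumerable<T> siblings, Func<T, bool> isHeader)
    {
        var runs = new List<(T Header, List<T> Followers)>();
        List<T> currentRun = null;

        foreach (var s in siblings)
        {
            if (isHeader(s))
            {
                // Start a new run; the list is shared by reference with the tuple,
                // so later additions show up in the result.
                currentRun = new List<T>();
                runs.Add((s, currentRun));
            }
            else if (currentRun != null)
            {
                currentRun.Add(s);
            }
        }
        return runs;
    }
}
```

With CsQuery, a call along the lines of `SiblingRuns.SplitAtHeaders(dom["tr"], r => r.Cq().HasClass("category"))` should work, since a `CQ` object enumerates its `IDomObject` elements; treat that wiring as an untested assumption.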