Selecting table rows from the one containing "FCR A" to the last one containing "FCL B" with XPath

Posted: 2015-07-10 14:03:11

Tags: html xpath html-agility-pack

I'm new to Html Agility Pack and XPath. Part of my table looks like this:

                        ...
                        <tbody class="tdata">
                        <tr bgcolor="#ffffff">
                        <td nowrap>CRR004</td>
                        <td nowrap>Carrie</td>
                        <td nowrap>021</td>
                        <td nowrap>COAL</td>
                        <td nowrap>0.800</td>
                        <td nowrap>55.000</td>
                        <td nowrap>55.800</td>
                        <td nowrap>FCR A</td>
                        <td nowrap>&nbsp;</td>
                        <td nowrap>&nbsp;</td>
                        <td nowrap>&nbsp;</td>
                        </tr></tbody>

                        ...

                        <tbody class="tdata">
                        <tr bgcolor="#ffffff">
                        <td nowrap>CRR004</td>
                        <td nowrap>Carrie</td>
                        <td nowrap>021</td>
                        <td nowrap>COAL</td>
                        <td nowrap>0.800</td>
                        <td nowrap>99.500</td>
                        <td nowrap>100.300</td>
                        <td nowrap>FCL B</td>
                        <td nowrap>&nbsp;</td>
                        <td nowrap>&nbsp;</td>
                        <td nowrap>&nbsp;</td>
                        </tr></tbody>

                        <tbody class="tdata">
                        <tr bgcolor="#ffffff">
                        <td nowrap>CRR004</td>
                        <td nowrap>Carrie</td>
                        <td nowrap>034</td>
                        <td nowrap>BONE</td>
                        <td nowrap>0.100</td>
                        <td nowrap>100.300</td>
                        <td nowrap>100.400</td>
                        <td nowrap>FCL B</td>
                        <td nowrap>&nbsp;</td>
                        <td nowrap>&nbsp;</td>
                        <td nowrap>&nbsp;</td>
                        </tr></tbody>

                        <tbody class="tdata">
                        <tr bgcolor="#ffffff">
                        <td nowrap>CRR004</td>
                        <td nowrap>Carrie</td>
                        <td nowrap>021</td>
                        <td nowrap>COAL</td>
                        <td nowrap>0.400</td>
                        <td nowrap>100.400</td>
                        <td nowrap>100.800</td>
                        <td nowrap>FCL B</td>
                        <td nowrap>&nbsp;</td>
                        <td nowrap>&nbsp;</td>
                        <td nowrap>&nbsp;</td>
                        </tr></tbody>

Using Html Agility Pack, I can get all the table rows with the following:

doc.DocumentNode.SelectNodes("//tbody[@class='tdata']/tr")
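To see what that expression returns, here is a minimal sketch that evaluates it and reads column 8 (td[8]) of each matched row. It uses the JDK's built-in javax.xml.xpath engine as a stand-in for Html Agility Pack (which is not used here); both take XPath 1.0 expressions. Assumptions: only the four rows shown above are included, and because the markup must be legal XML for this parser, the nowrap attribute is dropped and &nbsp; is written as &#160;. The class and method names are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class AllRows {
    // XML-ified copy of the four rows shown in the question.
    static final String XML = "<table>"
        + row("021", "COAL", "0.800", "55.000", "55.800", "FCR A")
        + row("021", "COAL", "0.800", "99.500", "100.300", "FCL B")
        + row("034", "BONE", "0.100", "100.300", "100.400", "FCL B")
        + row("021", "COAL", "0.400", "100.400", "100.800", "FCL B")
        + "</table>";

    // One <tbody class="tdata"> wrapping a single <tr>, as on the original page.
    static String row(String code, String lith, String thick,
                      String from, String to, String seam) {
        return "<tbody class=\"tdata\"><tr bgcolor=\"#ffffff\">"
            + "<td>CRR004</td><td>Carrie</td><td>" + code + "</td><td>" + lith
            + "</td><td>" + thick + "</td><td>" + from + "</td><td>" + to
            + "</td><td>" + seam + "</td><td>&#160;</td><td>&#160;</td><td>&#160;</td>"
            + "</tr></tbody>";
    }

    // Returns the text of column 8 for every row matched by the question's XPath.
    static List<String> column8() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(XML)));
        XPath xp = XPathFactory.newInstance().newXPath();
        NodeList rows = (NodeList) xp.evaluate(
            "//tbody[@class='tdata']/tr", doc, XPathConstants.NODESET);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < rows.getLength(); i++) {
            Node tr = rows.item(i);
            // "td[8]" is evaluated relative to the current <tr>.
            out.add(xp.evaluate("td[8]", tr));
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(column8()); // [FCR A, FCL B, FCL B, FCL B]
    }
}
```

Every row comes back, so the filtering has to happen in a predicate on tr.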

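But I only want to select the rows from the one whose 8th column contains "FCR A" through the last one whose 8th column contains "FCL B", i.e. rows 2 to 14 below: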

CRR004  Carrie  540  SS              1.100   53.900   55.000
CRR004  Carrie  021  COAL            0.800   55.000   55.800  FCR A
CRR004  Carrie  124  SH              4.200   55.800   60.000
CRR004  Carrie  320  S SH            1.400   60.000   61.400
CRR004  Carrie  540  SS              2.400   61.400   63.800
CRR004  Carrie  320  S SH            0.300   63.800   64.100
CRR004  Carrie  540  SS             15.900   64.100   80.000
CRR004  Carrie  749  SS W/COAL STR  10.000   80.000   90.000
CRR004  Carrie  540  SS              7.200   90.000   97.200
CRR004  Carrie  124  SH              0.500   97.200   97.700
CRR004  Carrie  114  BLACK SH        1.800   97.700   99.500
CRR004  Carrie  021  COAL            0.800   99.500  100.300  FCL B
CRR004  Carrie  034  BONE            0.100  100.300  100.400  FCL B
CRR004  Carrie  021  COAL            0.400  100.400  100.800  FCL B
CRR004  Carrie  120  CL SH           0.800  100.800  101.600
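Stated procedurally, the wanted range runs from the first row whose column 8 is FCR A through the last row whose column 8 is FCL B. This plain-Java sketch (no XPath; names are hypothetical) pins that requirement down on the column-8 values of the 15 rows above, with blank cells as empty strings. It assumes that if several FCR A rows existed, the first one would start the range.

```java
import java.util.List;

public class SeamRange {
    // Returns {first, last} zero-based indices: the first "FCR A" row and the
    // last "FCL B" row. Returns null if either is missing or they are out of order.
    static int[] range(List<String> col8) {
        int first = col8.indexOf("FCR A");
        int last = col8.lastIndexOf("FCL B");
        if (first < 0 || last < first) return null;
        return new int[] {first, last};
    }

    public static void main(String[] args) {
        // Column 8 of the 15 rows listed in the question ("" = blank cell).
        List<String> col8 = List.of("", "FCR A", "", "", "", "", "", "", "", "",
            "", "FCL B", "FCL B", "FCL B", "");
        int[] r = range(col8);
        // Convert to the 1-based row numbers used in the question.
        System.out.println("rows " + (r[0] + 1) + " to " + (r[1] + 1)); // rows 2 to 14
    }
}
```

An XPath solution has to express the same "at or after the first marker, at or before the last marker" test inside a single predicate.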

I tried doc.DocumentNode.SelectNodes("//tbody[@class='tdata']/tr[following-sibling::td[8]='FCR A' and preceding-sibling::td[8]='FCL B']") and similar expressions, to no avail. Any help would be greatly appreciated. Thanks.
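A minimal sketch of why that attempt matches nothing, again using the JDK's javax.xml.xpath as a stand-in for Html Agility Pack, on a hypothetical three-row sample with the same nesting (one tr per tbody, placeholder cells). Two things go wrong: td elements are children of tr, never its siblings, so following-sibling::td from a tr is always empty; and since each tr sits alone in its own tbody, even following-sibling::tr finds nothing. (The FCR A / FCL B conditions are also the wrong way round.) The following:: and preceding:: axes walk the whole document, so they do cross tbody boundaries.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class SiblingCheck {
    // Three rows, each wrapped in its own <tbody>, like the page in the question.
    static final String XML = "<table>" + row("FCR A") + row("") + row("FCL B") + "</table>";

    // Seven placeholder cells, then the column-8 value.
    static String row(String seam) {
        return "<tbody class='tdata'><tr>"
            + "<td>c1</td><td>c2</td><td>c3</td><td>c4</td><td>c5</td><td>c6</td><td>c7</td>"
            + "<td>" + seam + "</td></tr></tbody>";
    }

    // Number of nodes the expression selects in the sample.
    static int count(String expr) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(XML)));
        XPath xp = XPathFactory.newInstance().newXPath();
        return ((NodeList) xp.evaluate(expr, doc, XPathConstants.NODESET)).getLength();
    }

    public static void main(String[] args) throws Exception {
        // The attempt from the question: td is a child of tr, not a sibling.
        System.out.println(count("//tbody[@class='tdata']/tr[following-sibling::td[8]='FCR A'"
            + " and preceding-sibling::td[8]='FCL B']")); // 0
        // Each tr is alone in its tbody, so it has no tr siblings either.
        System.out.println(count("//tr[following-sibling::tr]")); // 0
        // following:: crosses tbody boundaries: the first two rows have a later row.
        System.out.println(count("//tr[following::tr]")); // 2
    }
}
```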

1 Answer

Answer 0 (score: 0)

After following Splash's suggestion, this works:

//tr[contains(.,'FCR A')]|//tr[following::tr[td[8][.= 'FCL B']][last()] and preceding::tr[td[8][.= 'FCR A']][last()]]|//tr[contains(.,'FCL B')]
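A sketch that checks the answer's expression against the 15 rows from the question, using the JDK's javax.xml.xpath in place of Html Agility Pack. Assumption: only column 8 carries real data; columns 1 to 7 are placeholder x cells, which is harmless because contains(., ...) only needs the markers not to appear elsewhere in a row. The helper reports each selected row's 1-based position via count(preceding::tr) + 1; class and helper names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class AnswerCheck {
    // Column 8 of the 15 rows listed in the question ("" = blank cell).
    static final List<String> COL8 = List.of("", "FCR A", "", "", "", "", "", "", "", "",
        "", "FCL B", "FCL B", "FCL B", "");

    // The expression from the answer, split over several lines for readability.
    static final String ANSWER = "//tr[contains(.,'FCR A')]"
        + "|//tr[following::tr[td[8][.= 'FCL B']][last()]"
        + " and preceding::tr[td[8][.= 'FCR A']][last()]]"
        + "|//tr[contains(.,'FCL B')]";

    // One <tbody class='tdata'><tr> per entry: seven placeholder cells, then column 8.
    static String xml(List<String> col8) {
        StringBuilder sb = new StringBuilder("<table>");
        for (String seam : col8) {
            sb.append("<tbody class='tdata'><tr>");
            for (int c = 1; c <= 7; c++) sb.append("<td>x</td>");
            sb.append("<td>").append(seam.isEmpty() ? "&#160;" : seam).append("</td></tr></tbody>");
        }
        return sb.append("</table>").toString();
    }

    // Evaluates expr and returns the sorted 1-based positions of the selected rows.
    static List<Integer> select(String expr, List<String> col8) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml(col8))));
        XPath xp = XPathFactory.newInstance().newXPath();
        NodeList rows = (NodeList) xp.evaluate(expr, doc, XPathConstants.NODESET);
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < rows.getLength(); i++) {
            Double pos = (Double) xp.evaluate("count(preceding::tr) + 1",
                rows.item(i), XPathConstants.NUMBER);
            out.add(pos.intValue());
        }
        Collections.sort(out);
        return out;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(select(ANSWER, COL8)); // [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
    }
}
```

The union is needed because the middle branch alone yields only rows 3 to 13: row 2 has no preceding FCR A row (preceding:: excludes the row itself) and row 14 has no following FCL B row.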

A shorter expression would still be appreciated, though.
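One candidate for a shorter expression (a suggestion, not from the original thread) puts the whole range test in a single predicate: keep a row if it is the FCR A row or comes after one, and it is an FCL B row or comes before one. It compares td[8] exactly instead of using contains(., ...). The sketch below evaluates it with the JDK's javax.xml.xpath on the same reduced 15-row sample (placeholder cells in columns 1 to 7; names are hypothetical). On the question's data it selects rows 2 to 14, like the answer. Unlike the answer, it also skips an FCL B row that appears before the FCR A row; the answer's last branch, //tr[contains(.,'FCL B')], would pick such a row up.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ShorterRange {
    // "At or after the FCR A row" and "at or before the last FCL B row".
    static final String SHORTER =
        "//tr[(td[8]='FCR A' or preceding::tr[td[8]='FCR A'])"
        + " and (td[8]='FCL B' or following::tr[td[8]='FCL B'])]";

    // Column 8 of the 15 rows listed in the question ("" = blank cell).
    static final List<String> COL8 = List.of("", "FCR A", "", "", "", "", "", "", "", "",
        "", "FCL B", "FCL B", "FCL B", "");

    // One <tbody class='tdata'><tr> per entry: seven placeholder cells, then column 8.
    static String xml(List<String> col8) {
        StringBuilder sb = new StringBuilder("<table>");
        for (String seam : col8) {
            sb.append("<tbody class='tdata'><tr>");
            for (int c = 1; c <= 7; c++) sb.append("<td>x</td>");
            sb.append("<td>").append(seam.isEmpty() ? "&#160;" : seam).append("</td></tr></tbody>");
        }
        return sb.append("</table>").toString();
    }

    // Evaluates expr and returns the sorted 1-based positions of the selected rows.
    static List<Integer> select(String expr, List<String> col8) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml(col8))));
        XPath xp = XPathFactory.newInstance().newXPath();
        NodeList rows = (NodeList) xp.evaluate(expr, doc, XPathConstants.NODESET);
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < rows.getLength(); i++) {
            Double pos = (Double) xp.evaluate("count(preceding::tr) + 1",
                rows.item(i), XPathConstants.NUMBER);
            out.add(pos.intValue());
        }
        Collections.sort(out);
        return out;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(select(SHORTER, COL8)); // [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
    }
}
```

In Html Agility Pack the same string would be passed to doc.DocumentNode.SelectNodes(...); since that call also takes XPath 1.0 it should behave the same, but that has not been tested here.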