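XPath: selecting all following nodes up to a certain node

Time: 2010-11-30 11:07:38

Tags: html xpath

I'm trying to parse the structure below into sets of journey options, so that I can find all the possible ways of getting from Pontypridd to Llangollen and back.

With XPath, I can use //div[@class='JourneyOptions'] to select all the rows that actually contain journey information. Outside XPath, I can iterate over the rows and decide for each one whether it belongs to the current set of journeys or is the first row of a new set.

In the example below, every journey set contains two journeys, but a set might contain only one journey (a "direct" journey) or more than two (several "connections").

Is there an XPath expression that selects all the journeys in the first outbound set, all the journeys in the second outbound set, and so on?

The first journey in each set has a radio input with an integer value. I could generate the expressions dynamically to fetch each set, but I would need to know when to stop generating them (or just wait for the XPath to fail).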

<div class='TableHolder'>

  <p>...</p>
  <h2 id='DirectionHeader'>Outbound Options</h2>
  <p>Pontypridd to Llangollen, 30/11/1910</p>

  <!-- first part of the first journey from Pontypridd to Llangollen -->
  <div class='JourneyOptions'>
    <div class='Journey'>
      <div class='ColumnOne'>
          <input type='radio' checked='checked' name='out' value='1'>
      </div>
      ... some more divs of parseable journey info ...
    </div>
  </div>

  <!-- second part of the first journey from Pontypridd to Llangollen -->
  <div class='JourneyOptions'>
    <div class='Journey'>
      <div class='ConnectingJournies'>
          <p>...</p>
      </div>
      <div class='ColumnOne'>
          ... doesn't contain a radio input ...
      </div>
      ... some more divs of parseable journey info ...
    </div>
  </div>

  <!-- first part of the second journey from Pontypridd to Llangollen -->
  <div class='JourneyOptions'>
    <div class='Journey'>
      <div class='ColumnOne'>
          <input type='radio' name='out' value='2'>
      </div>
      ... some more divs of parseable journey info ...
    </div>
  </div>

  <!-- second part of the second journey from Pontypridd to Llangollen -->
  <div class='JourneyOptions'>
    <div class='Journey'>
      <div class='ConnectingJournies'>
          <p>...</p>
      </div>
      <div class='ColumnOne'>
          ... doesn't contain a radio input ...
      </div>
      ... some more divs of parseable journey info ...
    </div>
  </div>

  ... some more outbound journey options ...

  <p>...</p>
  <h2 id='DirectionHeader'>Inbound Options</h2>
  <p>Llangollen to Pontypridd, 07/11/1910</p>

  <!-- first part of the first journey from Llangollen to Pontypridd -->
  <div class='JourneyOptions'>
    <div class='Journey'>
      <div class='ColumnOne'>
          <input type='radio' checked='checked' name='in' value='1'>
      </div>
      ... some more divs of parseable journey info ...
    </div>
  </div>

  <!-- second part of the first journey from Llangollen to Pontypridd -->
  <div class='JourneyOptions'>
    <div class='Journey'>
      <div class='ConnectingJournies'>
          <p>...</p>
      </div>
      <div class='ColumnOne'>
          ... doesn't contain a radio input ...
      </div>
      ... some more divs of parseable journey info ...
    </div>
  </div>

  <!-- first part of the second journey from Llangollen to Pontypridd -->
  <div class='JourneyOptions'>
    <div class='Journey'>
      <div class='ColumnOne'>
          <input type='radio' name='in' value='2'>
      </div>
      ... some more divs of parseable journey info ...
    </div>
  </div>

  <!-- second part of the second journey from Llangollen to Pontypridd -->
  <div class='JourneyOptions'>
    <div class='Journey'>
      <div class='ConnectingJournies'>
          <p>...</p>
      </div>
      <div class='ColumnOne'>
          ... doesn't contain a radio input ...
      </div>
      ... some more divs of parseable journey info ...
    </div>
  </div>

  ... some more inbound journey options ...
</div>

Sorry for the large example, but I think it is about as small as I can make it while still being representative of my problem.

1 answer:

Answer 0 (score: 0)

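A node set is just... well, a set: unique nodes in a host-language-dependent order (usually document order). If you want the result to express some kind of hierarchy or grouping, the answer is that you can't.

So, you could select the start of each group: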

/div[@class='TableHolder']  
    /div[@class='JourneyOptions']
        [div[@class='Journey'] 
          /div[@class='ColumnOne'] 
              /input[@type='radio']
        ]
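
Because a node set cannot carry the grouping, one option is to do the grouping in the host language: walk the rows in document order and start a new group at every row that contains a radio input. Below is a minimal sketch in Python that uses only the standard library's xml.etree.ElementTree. The SAMPLE markup is a trimmed, well-formed stand-in for the page in the question, and the function name group_journeys is hypothetical. It assumes the input parses as XML, which real-world HTML often does not.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down and well-formed version of the page in the question.
SAMPLE = """
<div class='TableHolder'>
  <div class='JourneyOptions'>
    <div class='Journey'>
      <div class='ColumnOne'><input type='radio' name='out' value='1'/></div>
    </div>
  </div>
  <div class='JourneyOptions'>
    <div class='Journey'><div class='ColumnOne'/></div>
  </div>
  <div class='JourneyOptions'>
    <div class='Journey'>
      <div class='ColumnOne'><input type='radio' name='out' value='2'/></div>
    </div>
  </div>
  <div class='JourneyOptions'>
    <div class='Journey'>
      <div class='ColumnOne'><input type='radio' name='in' value='1'/></div>
    </div>
  </div>
  <div class='JourneyOptions'>
    <div class='Journey'><div class='ColumnOne'/></div>
  </div>
</div>
"""


def group_journeys(root):
    """Return a list of (direction, value, rows) tuples.

    A row that contains a radio input starts a new group; any other row is
    appended to the most recently started group.
    """
    groups = []
    for row in root.findall(".//div[@class='JourneyOptions']"):
        radio = row.find(".//input[@type='radio']")
        if radio is not None:
            groups.append((radio.get("name"), radio.get("value"), [row]))
        elif groups:
            groups[-1][2].append(row)
    return groups


root = ET.fromstring(SAMPLE)
groups = group_journeys(root)
for direction, value, rows in groups:
    print(direction, value, len(rows))
```

Each group also carries the radio input's name ('out' or 'in'), so outbound and inbound sets can be told apart without a second query.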

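And to select one group at a time (there are many ways to do this), keep the rows for which the number of group starts at or before the row equals the group index, 1 here: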

/div[@class='TableHolder']
    /div[@class='JourneyOptions']
        [count(
            (self::div|preceding-sibling::div)
                [div[@class='Journey']
                    /div[@class='ColumnOne']
                       /input[@type='radio']
                ]
              ) = 1
        ]
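
To fetch every group in turn, the expression above can be generated with the group index in place of 1, and the number of groups can be counted up front so that you know when to stop. Here is a rough sketch of that idea in Python. It only builds the expression strings; the function names and the root parameter are hypothetical. Evaluating the strings needs a full XPath 1.0 engine, because the standard library's ElementTree does not support count() or the preceding-sibling axis.

```python
# Predicate that identifies the first row of a group, copied from the answer.
GROUP_START = (
    "div[@class='Journey']"
    "/div[@class='ColumnOne']"
    "/input[@type='radio']"
)

# Default root step as written in the answer; adjust it if the TableHolder
# div is not the document element of the parsed page.
DEFAULT_ROOT = "/div[@class='TableHolder']"


def count_groups_xpath(root=DEFAULT_ROOT):
    """Build an expression that evaluates to the number of group starts."""
    return f"count({root}/div[@class='JourneyOptions'][{GROUP_START}])"


def group_xpath(k, root=DEFAULT_ROOT):
    """Build an expression that selects every row of the k-th group (1-based)."""
    if k < 1:
        raise ValueError("group index is 1-based")
    return (
        f"{root}/div[@class='JourneyOptions']"
        f"[count((self::div|preceding-sibling::div)[{GROUP_START}]) = {k}]"
    )


print(count_groups_xpath())
for k in (1, 2):
    print(group_xpath(k))
```

If the markup is well-formed as intended, the rows for both directions are siblings, so the numbering continues from the outbound groups into the inbound ones.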