Question

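I'm new to XML, and I'm writing a natural language parser that outputs data in an XML-like (but more tree-like) format. Here is what the elements inside the root tag look like: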

<text>
    <paragraph>This is the first sentence. This is the second sentence.</paragraph>
      <sentence>This is the first sentence.</sentence>
        <word>This</word>
        <word>is</word>
        <word>the</word>
        <word>first</word>
        <word>sentence</word>
      <sentence>This is the second sentence.</sentence>
        <word>This</word>
        <word>is</word>
        <word>the</word>
        <word>second</word>
        <word>sentence</word>
</text>

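As you can see, I'm basically using XML-like tags as a tree view to show each step of the decomposition. How can I turn this into proper XML while preserving the tree-like structure? Because of the way the text is parsed, I'd like to avoid doing something like this: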

<text><paragraph><sentence><word>This</word> <word>is</word> <word>the</word> <word>first</word> <word>sentence</word>.</sentence></paragraph></text>

I know I could use mixed content:

<sentence>This is the first sentence.
  <word>This</word>
  <word>is</word>
  <word>the</word>
  <word>first</word>
  <word>sentence</word>
</sentence>

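But that solution doesn't seem elegant. The other option I see is to move each stage of the decomposition into a child element. Or is there a more elegant way?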

Answer 1

The correct solution is the one you want to avoid:

<text><paragraph><sentence><word>This</word> <word>is</word> <word>the</word> <word>first</word> <word>sentence</word>.</sentence></paragraph></text>
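For what it's worth, the nested form does not lose the sentence text: it can be read back from the markup. A minimal sketch in Python using the standard `xml.etree.ElementTree` module (the variable names are only illustrative, and Python is my choice here, not something the question specifies):

```python
import xml.etree.ElementTree as ET

# The nested form from above, split across lines only for readability.
nested = (
    "<text><paragraph><sentence>"
    "<word>This</word> <word>is</word> <word>the</word> "
    "<word>first</word> <word>sentence</word>."
    "</sentence></paragraph></text>"
)

root = ET.fromstring(nested)
sentence = root.find("./paragraph/sentence")

# itertext() yields the text and tail strings of all descendants in
# document order, so joining them rebuilds the sentence, including the
# spaces between words and the final period.
sentence_text = "".join(sentence.itertext())

# The individual words are still available as separate elements.
words = [w.text for w in sentence.iter("word")]
```

So a consumer that wants the whole sentence and a consumer that wants the word list can both work from the same nested document.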

So we need to understand why you want to avoid it.

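Note that it is quite possible to have the natural language parser emit something different as its direct output, and then post-process that into the right format using one or more XSLT transformations.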
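To make the post-processing idea concrete, here is a rough sketch of such a step, written in Python rather than XSLT. It assumes the parser emits the flat sibling list exactly in the order shown in the question, and that each child's text occurs verbatim inside its parent's text. The names `LEVELS`, `nest` and `_fill` are hypothetical helpers made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Assumed nesting order, outermost first (taken from the question's sample).
LEVELS = ["paragraph", "sentence", "word"]

FLAT = """<text>
    <paragraph>This is the first sentence. This is the second sentence.</paragraph>
    <sentence>This is the first sentence.</sentence>
    <word>This</word><word>is</word><word>the</word>
    <word>first</word><word>sentence</word>
    <sentence>This is the second sentence.</sentence>
    <word>This</word><word>is</word><word>the</word>
    <word>second</word><word>sentence</word>
</text>"""


def _fill(node, source_text):
    """Put text only in leaves; store gaps (spaces, punctuation) around children."""
    full = source_text[node]
    children = list(node)
    if not children:
        node.text = full
        return
    cursor, prev = 0, None
    for child in children:
        start = full.find(source_text[child], cursor)
        if start < 0:
            raise ValueError(f"{source_text[child]!r} not found in {full!r}")
        gap = full[cursor:start]
        if prev is None:
            node.text = gap      # text before the first child
        else:
            prev.tail = gap      # text between two children
        cursor = start + len(source_text[child])
        prev = child
    prev.tail = full[cursor:]    # whatever follows the last child


def nest(flat_root):
    """Rebuild the nesting from a flat, ordered list of sibling elements."""
    new_root = ET.Element(flat_root.tag)
    stack = [new_root]           # stack[i] is the open element at depth i
    source_text = {}             # new element -> text it had in the flat input
    for item in flat_root:
        depth = LEVELS.index(item.tag) + 1
        del stack[depth:]        # close everything at this depth or deeper
        node = ET.SubElement(stack[-1], item.tag)
        source_text[node] = item.text or ""
        stack.append(node)
    for node in new_root.iter():
        if node in source_text:
            _fill(node, source_text)
    return new_root


nested = nest(ET.fromstring(FLAT))
print(ET.tostring(nested, encoding="unicode"))
```

An XSLT stylesheet could do the same grouping. The point is only that the parser's direct output and the published format do not have to be identical.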

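To be honest, I'm not sure what you are asking. Are you asking for help designing an XML format? Design questions are discouraged. One reason is that getting a design right requires far more information than you have given us (for example, who consumes the output you produce, and what they use it for).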

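Or are you asking how to generate the format you want to produce? In that case we need to know more about your natural language parser and the constraints it imposes.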

Tree view vs. embedded tags: data representation in XML
