Question

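I'm trying to write a generic XML parser to consume feeds with unknown schemas. Basically, I want to make a best guess at where the 'rows' are in an XML document. Here are two example feeds: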

Feed 1, for example:

<xml>
  <some-container-tag>
    <some-row-tag>
      <attribute-1>value</attribute-1>
      <attribute-2>value</attribute-2>
      <attribute-3>value</attribute-3>
      <attribute-4>value</attribute-4>
    </some-row-tag>
    <some-row-tag>
      <attribute-1>value</attribute-1>
      <attribute-2>value</attribute-2>
      <attribute-3>value</attribute-3>
      <attribute-4>value</attribute-4>
    </some-row-tag>
    ...
  </some-container-tag>
</xml>

Feed 2, for example:

<xml>
  <some-container-tag>
    <some-row-tag>
      <attribute-1>value</attribute-1>
      <attribute-2>value</attribute-2>
      <attribute-3>value</attribute-3>
      <attribute-4>value</attribute-4>
      <optional-nested-attribute-set>
         ...
      </optional-nested-attribute-set>
    </some-row-tag>
    <some-row-tag>
      <attribute-1>value</attribute-1>
      <attribute-2>value</attribute-2>
      <attribute-3>value</attribute-3>
      <attribute-4>value</attribute-4>
      <optional-nested-attribute-set>
         ...
      </optional-nested-attribute-set>
    </some-row-tag>
    ...
  </some-container-tag>
  <some-other-container-tag>
    <some-row-tag>
      <attribute-1>value</attribute-1>
      <attribute-2>value</attribute-2>
      <attribute-3>value</attribute-3>
      <attribute-4>value</attribute-4>
      <optional-nested-attribute-set>
         ...
      </optional-nested-attribute-set>
    </some-row-tag>
  </some-other-container-tag>
</xml>

What I've done so far is walk the structure and map each XPath to a count, so for the first feed, for example, it would look like:

xml => 1
xml/some-container-tag => 1
xml/some-container-tag/some-row-tag => n
xml/some-container-tag/some-row-tag/attribute-1 => n
xml/some-container-tag/some-row-tag/attribute-2 => n
xml/some-container-tag/some-row-tag/attribute-3 => n
xml/some-container-tag/some-row-tag/attribute-4 => n
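The path-to-count mapping above could be built along these lines. This is only a minimal sketch using Python's standard `xml.etree.ElementTree`; the function name `count_paths` and the shortened sample feed are assumptions made for illustration, not part of the original code.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_paths(xml_text):
    """Map each slash-separated element path to the number of times it occurs."""
    counts = Counter()

    def walk(elem, prefix):
        # Build the path for this element from its parent's path.
        path = f"{prefix}/{elem.tag}" if prefix else elem.tag
        counts[path] += 1
        for child in elem:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return counts


# Hypothetical, shortened version of Feed 1 with two rows.
SAMPLE = """<xml>
  <some-container-tag>
    <some-row-tag><attribute-1>a</attribute-1><attribute-2>b</attribute-2></some-row-tag>
    <some-row-tag><attribute-1>c</attribute-1><attribute-2>d</attribute-2></some-row-tag>
  </some-container-tag>
</xml>"""

counts = count_paths(SAMPLE)
for path, n in counts.items():
    print(f"{path} => {n}")
```

Paths whose count is greater than 1 and that have child elements are the natural candidates for rows under this approach.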

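Now my thinking is that the 'base unit' (row level) would be the lowest-level non-leaf node, though I'm having trouble vetting that idea (solo developer here).

Of course, feed 2 is a lot more complicated, in that there can be nested attributes (basically sub-arrays) and there can be two parent lists.

Is there a good enough general approach here?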

Answer 1

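Your problem is that you are trying to convert a multi-dimensional tree structure into a two-dimensional table structure. Without a schema, you have no good way to make sure your assumptions are correct, but if you have to do it, you will have to make some assumptions.

You may be better off approaching it by depth in the hierarchy rather than by the number of nodes at a particular depth (there's nothing to say all leaf nodes will be at the same depth, which is the problem you're running into now):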

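Depth 0 (the root tag) denotes a new collection of data structures
Depth 1 (some-container-tag) denotes a new two-dimensional structure
Depth 2 (some-row-tag) denotes a new row in the two-dimensional structure
Depth 3+ denotes entries that go into the row, which may themselves have sub-entries. Perhaps these are represented as CSV strings, or as pointers to another array/table-like data structure, but once you start adding that, you are no longer dealing with a two-dimensional structure.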
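As a rough illustration of the depth rules above, the sketch below treats depth 1 as tables, depth 2 as rows, and depth 3 as fields, keeping one level of nested children as a sub-dictionary. The function name `tables_by_depth`, the sample data, and the `sub-1` tag are hypothetical; the fixed depths are exactly the kind of assumption this answer says you have to make without a schema.

```python
import xml.etree.ElementTree as ET


def tables_by_depth(xml_text):
    """Return {container_tag: [row_dict, ...]} using fixed depths.

    Assumption: depth-1 elements are tables and depth-2 elements are rows.
    """
    root = ET.fromstring(xml_text)              # depth 0: the collection
    tables = {}
    for container in root:                      # depth 1: one table each
        rows = tables.setdefault(container.tag, [])
        for row_elem in container:              # depth 2: one row each
            row = {}
            for field in row_elem:              # depth 3: entries in the row
                if len(field):                  # has children: keep as a sub-dict
                    row[field.tag] = {c.tag: (c.text or "").strip() for c in field}
                else:
                    row[field.tag] = (field.text or "").strip()
            rows.append(row)
    return tables


# Hypothetical, shortened version of Feed 2; <sub-1> is an invented tag.
SAMPLE = """<xml>
  <some-container-tag>
    <some-row-tag>
      <attribute-1>a</attribute-1>
      <optional-nested-attribute-set><sub-1>x</sub-1></optional-nested-attribute-set>
    </some-row-tag>
    <some-row-tag><attribute-1>b</attribute-1></some-row-tag>
  </some-container-tag>
  <some-other-container-tag>
    <some-row-tag><attribute-1>c</attribute-1></some-row-tag>
  </some-other-container-tag>
</xml>"""

tables = tables_by_depth(SAMPLE)
for name, rows in tables.items():
    print(name, rows)
```

Note that this sketch ignores XML attributes on the elements and overwrites a field if the same tag appears twice in one row; both would need a decision in a real parser.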

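All of this depends on what you ultimately need to do with the data, and on which assumptions are valid in the language you choose to process it with. Either way, you would probably be better off parsing it by depth rather than by count. Also, if this is truly schemaless, you may want to consider how to handle any attributes that appear in the XML.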

Automatically detecting/parsing repeated elements ('row objects') in XML