Parsing InDesign-generated XML with PHP

Date: 2014-02-26 21:16:13

Tags: php xml parsing recursion adobe-indesign

I'm generating XML from InDesign and want to parse it with PHP. Here is a sample of the XML that InDesign produces:

<?xml version="1.0" encoding="UTF-8"?>
<Root>
<page title="About Us">
  About Us
  <page>Overiew</page>
  <page>Where We Started</page>
  <page>Help</page>
</page>
<page>
  Automobiles
  <page>
     Cars
     <page>Small</page>
     <page>Medium</page>
     <page>Large</page>
  </page>
  <page>
     Trucks
     <page>Flatbet</page>
     <page>
        Pickup
        <page>Dodge</page>
        <page>Nissan</page>
     </page>
  </page>
</page>
</Root>

I'm using the following PHP code to parse the XML recursively.

header('Content-type: text/plain');

function parse_recursive(SimpleXMLElement $element, $level = 0)
{
        $indent     = str_repeat("\t", $level); // determine how much we'll indent

        $value      = trim((string) $element);  // get the value and trim any whitespace from the start and end
        $attributes = $element->attributes();   // get all attributes
        $children   = $element->children();     // get all children

        echo "{$indent}Parsing '{$element->getName()}'...".PHP_EOL;
        if(count($children) == 0 && !empty($value)) // only show value if there is any and if there aren't any children
        {
                echo "{$indent}Value: {$element}".PHP_EOL;
        }

        // only show attributes if there are any
        if(count($attributes) > 0)
        {
                echo $indent.'Has '.count($attributes).' attribute(s):'.PHP_EOL;
                foreach($attributes as $attribute)
                {
                        echo "{$indent}- {$attribute->getName()}: {$attribute}".PHP_EOL;
                }
        }

        // only show children if there are any
        if(count($children))
        {
                echo $indent.'Has '.count($children).' child(ren):'.PHP_EOL;
                foreach($children as $child)
                {
                        parse_recursive($child, $level+1); // recursion :)
                }
        }

        echo $indent.PHP_EOL; // just to make it "cleaner"
}

$xml = new SimpleXMLElement('data.xml', null, true);

parse_recursive($xml);

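The problem is that when I parse the XML, I only get the text value of a page node if the text is the node's sole content, that is, if the node has no child page elements. So, for example, I can't read "About Us" except by looking at the title attribute (when it exists). The same applies to "Automobiles", "Cars", and "Trucks".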

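Again, this XML is generated from InDesign. I could ask the designers to add attributes and so on to the nodes, but I'm trying to keep the amount of data entry to a minimum.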

I believe the XML is well-formed. Any help would be appreciated.

2 answers:

Answer 0 (score: 1)

Your code ignores the text value of any node that has children. To change that, replace:

if(count($children) == 0 && !empty($value)) // only show value if there is any and if there aren't any children
{
  echo "{$indent}Value: {$element}".PHP_EOL;
}

with:

if(!empty($value)) // only show value if there is any
{
  echo "{$indent}Value: {$value}".PHP_EOL;
}
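
One caveat with the check above: PHP's empty() treats the string "0" as empty, so a page whose text is exactly "0" would be skipped. A minimal sketch of a stricter test follows; the helper name has_text is hypothetical and not part of the original code.

```php
<?php
// Hypothetical helper: true when the string contains anything
// other than whitespace. Unlike empty(), it accepts "0".
function has_text(string $value): bool
{
    return trim($value) !== '';
}

var_dump(empty('0'));        // empty() considers "0" empty
var_dump(has_text('0'));     // the stricter check does not
var_dump(has_text("  \n"));  // whitespace only still counts as no text
```

In parse_recursive, this would amount to testing `$value !== ''` instead of `!empty($value)`, since $value is already trimmed.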

The result for the sample data is:

Parsing 'Root'...
Has 2 child(ren):
    Parsing 'page'...
    Value: About Us
    Has 1 attribute(s):
    - title: About Us
    Has 3 child(ren):
        Parsing 'page'...
        Value: Overiew

        Parsing 'page'...
        Value: Where We Started

        Parsing 'page'...
        Value: Help


    Parsing 'page'...
    Value: Automobiles
    Has 2 child(ren):
        Parsing 'page'...
        Value: Cars
        Has 3 child(ren):
            Parsing 'page'...
            Value: Small

            Parsing 'page'...
            Value: Medium

            Parsing 'page'...
            Value: Large


        Parsing 'page'...
        Value: Trucks
        Has 2 child(ren):
            Parsing 'page'...
            Value: Flatbet

            Parsing 'page'...
            Value: Pickup
            Has 2 child(ren):
                Parsing 'page'...
                Value: Dodge

                Parsing 'page'...
                Value: Nissan

Answer 1 (score: 0)

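Of course, I struggled with this for a while, and then found the answer as soon as I had posted the question. Anyway, this approach works (see the top answer):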

How to get a specific node text using php DOM
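
The linked answer is not reproduced here. As a rough sketch of the general DOM idea, assuming the goal is to read only the text that sits directly inside a node, one can walk the node's childNodes and keep the text nodes. The function name own_text and the inline sample XML are illustrative, not taken from the linked question.

```php
<?php
// Sketch: return only the text that belongs directly to an element,
// ignoring text inside its child elements.
function own_text(SimpleXMLElement $element): string
{
    // Bridge from SimpleXML to DOM so individual child nodes are visible.
    $node = dom_import_simplexml($element);

    $text = '';
    foreach ($node->childNodes as $child) {
        if ($child->nodeType === XML_TEXT_NODE) {
            $text .= $child->nodeValue;
        }
    }
    return trim($text);
}

// Reduced sample in the same shape as the InDesign export.
$xml = new SimpleXMLElement(
    '<Root><page>Automobiles<page>Cars</page><page>Trucks</page></page></Root>'
);

echo own_text($xml->page), PHP_EOL; // prints "Automobiles"
```

Text wrapped in CDATA sections has a different node type (XML_CDATA_SECTION_NODE) and would need an extra check if the export uses it.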

I'd like to know whether there is any other way.
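
One possible alternative, offered as a sketch rather than a definitive answer, is DOMXPath: the expression text() selects only the direct text nodes of the context node, and normalize-space() filters out the whitespace-only ones. The variable names and the reduced sample XML are assumptions for illustration.

```php
<?php
// Sketch: collect each <page> element's own title using DOMXPath.
$doc = new DOMDocument();
$doc->loadXML(
    '<Root><page>Trucks<page>Flatbed</page><page>Pickup</page></page></Root>'
);

$xpath  = new DOMXPath($doc);
$titles = [];

foreach ($xpath->query('//page') as $page) {
    // First direct text node that is not pure whitespace, as a string.
    $own = $xpath->evaluate('string(text()[normalize-space()][1])', $page);
    $titles[] = trim($own);
}

print_r($titles);
```

Because the XPath is evaluated relative to each page element, nested pages never leak their text into the parent's title.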