XML parsing conundrum

Time: 2011-10-28 19:35:21

Tags: php xml-parsing

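Update: I have reworked this question to show the progress I have made, which may also make it easier to answer.

Update 2: I have added another value to the XML. Each zip has extensions, and an item can list several of them, separated by tabs. So the structure would be Platform > Extension (subgroup) > Name > Title. If an item has multiple extensions, it appears in multiple places.

I have the following XML file.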

<Item>
    <Platform>Windows</Platform>
    <Ext>gif    jpeg    doc</Ext>
    <Name>File Group 1</Name>
    <Title>This is the first file group</Title>
    <DownloadPath>/this/windows/1/1.zip</DownloadPath>
</Item>
<Item>
    <Platform>Windows</Platform>
    <Ext>gif    doc</Ext>
    <Name>File Group 1</Name>
    <Title>This is the first file group</Title>
    <DownloadPath>/this/windows/1/2.zip</DownloadPath>
</Item>
<Item>
    <Platform>Windows</Platform>
    <Ext>gif</Ext>
    <Name>File Group 1</Name>
    <Title>This is in the same group but has a different title</Title>
    <DownloadPath>/this/windows/1/3.zip</DownloadPath>
</Item>
<Item>
    <Platform>Mac</Platform>
    <Ext>gif    jpeg    doc</Ext>
    <Name>File Group 1</Name>
    <Title>This has the same group name but a different platform. Because it has the same title and name the files are added to this array below.</Title>
    <DownloadPath>/this/mac/1/1.zip</DownloadPath>
</Item>
<Item>
    <Platform>Mac</Platform>
    <Ext>jpeg   doc</Ext>
    <Name>File Group 1</Name>
    <Title>This has the same group name but a different platform. Because it has the same title and name the files are added to this array below.</Title>
    <DownloadPath>/this/mac/1/2.zip</DownloadPath>
</Item>
<Item>
    <Platform>Windows</Platform>
    <Ext>gif    jpeg    doc</Ext>
    <Name>File Group 2</Name>
    <Title>This is the second file group</Title>
    <DownloadPath>/this/windows/2/1.zip</DownloadPath>
</Item>
<Item>
    <Platform>Windows</Platform>
    <Ext>gif    jpeg    doc</Ext>
    <Name>File Group 2</Name>
    <Title>This is the second file group</Title>
    <DownloadPath>/this/windows/2/2.zip</DownloadPath>
</Item>
<Item>
    <Platform>Mac</Platform>
    <Ext>gif    jpeg    doc</Ext>
    <Name>File Group 3</Name>
    <Title>This is the second mac file group really.</Title>
    <DownloadPath>/this/windows/3/1.zip</DownloadPath>
</Item>

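I want to walk through it and sort it so that it can be inserted into a normalized table schema. This is the format I would like the array to be built in.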

[Windows] => Array (
    [0] => array(
        "Name" => "File Group 1",
        "Title" => "This is the first file group",
        "Files" => array(
            [0] => array(
                "DownloadPath" => "/this/windows/1/1.zip"
            ),
            [1] => array(
                "DownloadPath" => "/this/windows/1/2.zip"
            )
        )
    ),
    [1] => array(
        "Name" => "File Group 1",
        "Title" => "This has the same name but has a different title, so it should be seperate.",
        "Files" => array(
            [0] => array(
                "DownloadPath" => "/this/windows/1/3.zip"
            )
        )
    ),
    [2] => array(
        "Name" => "File Group 2",
        "Title" => "This is the second file group",
        "Files" => array(
            [0] => array(
                "DownloadPath" => "/this/windows/2/1.zip"
            ),
            [1] => array(
                "DownloadPath" => "/this/windows/2/2.zip"
            )
        )
    )
),
[Mac] => Array(
    [0] => array(
        "Name" => "File Group 1",
        "Title" => "This has the same group name but a different platform. Because it has the same title and name the files are added to this array below.",
        "Files" => array(
            [0] => array(
                "DownloadPath" => "/this/mac/1/1.zip"
            ),
            [1] => array(
                "DownloadPath" => "/this/mac/1/2.zip"
            )
        )
    ),
    [1] => array(
        "Name" => "File Group 3",
        "Title" => "This is the second mac file group really.",
        "Files" => array(
            [0] => array(
                "DownloadPath" => "/this/mac/1/1.zip"
            ),
            [1] => array(
                "DownloadPath" => "/this/mac/1/2.zip"
            )
        )
    ),
)

This is what I have so far in PHP:

    $scrape_xml = "files.xml";
    $xml = simplexml_load_file($scrape_xml);

$groups = array();

foreach ($xml->Item as $file){

            if (!isset($groups[stripslashes($file->Platform)][stripslashes($file->Name)][stripslashes($file->Title)])){

                $groups[stripslashes($file->Platform)][stripslashes($file->Name)][stripslashes($file->Title)] = array(
                    'Platform' => $file->Platform,
                    'Name' => $file->Name,
                    'Title' => $file->Title
                );

            }

   $groups[stripslashes($file->Platform)][stripslashes($file->Name)][stripslashes($file->Title)]['Files'][] = $file->DownloadPath;

}

echo "count=" . $i;

echo "<pre>";
print_r($groups);
echo "</pre>";

It gives me this result:

Array
(
    [Windows] => Array
        (
            [File Group 1] => Array
                (
                    [This is the first file group] => Array
                        (
                            [Platform] => SimpleXMLElement Object
                                (
                                    [0] => Windows
                                )

                            [Name] => SimpleXMLElement Object
                                (
                                    [0] => File Group 1
                                )

                            [Title] => SimpleXMLElement Object
                                (
                                    [0] => This is the first file group
                                )

                            [Files] => Array
                                (
                                    [0] => SimpleXMLElement Object
                                        (
                                            [0] => /this/windows/1/1.zip
                                        )

                                    [1] => SimpleXMLElement Object
                                        (
                                            [0] => /this/windows/1/2.zip
                                        )

                                )

                        )

                    [This is in the same group but has a different title] => Array
                        (
                            [Platform] => SimpleXMLElement Object
                                (
                                    [0] => Windows
                                )

                            [Name] => SimpleXMLElement Object
                                (
                                    [0] => File Group 1
                                )

                            [Title] => SimpleXMLElement Object
                                (
                                    [0] => This is in the same group but has a different title
                                )

                            [Files] => Array
                                (
                                    [0] => SimpleXMLElement Object
                                        (
                                            [0] => /this/windows/1/3.zip
                                        )

                                )

                        )

                )

            [File Group 2] => Array
                (
                    [This is the second file group] => Array
                        (
                            [Platform] => SimpleXMLElement Object
                                (
                                    [0] => Windows
                                )

                            [Name] => SimpleXMLElement Object
                                (
                                    [0] => File Group 2
                                )

                            [Title] => SimpleXMLElement Object
                                (
                                    [0] => This is the second file group
                                )

                            [Files] => Array
                                (
                                    [0] => SimpleXMLElement Object
                                        (
                                            [0] => /this/windows/2/1.zip
                                        )

                                    [1] => SimpleXMLElement Object
                                        (
                                            [0] => /this/windows/2/2.zip
                                        )

                                )

                        )

                )

        )

    [Mac] => Array
        (
            [File Group 1] => Array
                (
                    [This has the same group name but a different platform. Because it has the same title and name the files are added to this array below.] => Array
                        (
                            [Platform] => SimpleXMLElement Object
                                (
                                    [0] => Mac
                                )

                            [Name] => SimpleXMLElement Object
                                (
                                    [0] => File Group 1
                                )

                            [Title] => SimpleXMLElement Object
                                (
                                    [0] => This has the same group name but a different platform. Because it has the same title and name the files are added to this array below.
                                )

                            [Files] => Array
                                (
                                    [0] => SimpleXMLElement Object
                                        (
                                            [0] => /this/mac/1/1.zip
                                        )

                                    [1] => SimpleXMLElement Object
                                        (
                                            [0] => /this/mac/1/2.zip
                                        )

                                )

                        )

                )

            [File Group 3] => Array
                (
                    [This is the second mac file group really.] => Array
                        (
                            [Platform] => SimpleXMLElement Object
                                (
                                    [0] => Mac
                                )

                            [Name] => SimpleXMLElement Object
                                (
                                    [0] => File Group 3
                                )

                            [Title] => SimpleXMLElement Object
                                (
                                    [0] => This is the second mac file group really.
                                )

                            [Files] => Array
                                (
                                    [0] => SimpleXMLElement Object
                                        (
                                            [0] => /this/windows/3/1.zip
                                        )

                                )

                        )

                )

        )

)

Update 2: the new array structure

[Windows] => Array (
    [gif] =>Array(
        [0] => array(
            "Name" => "File Group 1",
            "Title" => "This is the first file group",
            "Files" => array(
                [0] => array(
                    "DownloadPath" => "/this/windows/1/1.zip"
                ),
                [1] => array(
                    "DownloadPath" => "/this/windows/1/2.zip"
                )
            )
        )
    ),
    [jpeg] => array(
        [0] => array(
            "Name" => "File Group 1",
            "Title" => "This is the first file group",
            "Files" => array(
                [0] => array(
                    "DownloadPath" => "/this/windows/1/1.zip"
                ),
                [1] => array(
                    "DownloadPath" => "/this/windows/1/2.zip"
                )
            )
        ),
        [1] => array(
            "Name" => "File Group 2",
            "Title" => "This is the second file group",
            "Files" => array(
                [0] => array(
                    "DownloadPath" => "/this/windows/2/1.zip"
                ),
                [1] => array(
                    "DownloadPath" => "/this/windows/2/2.zip"
                )
            )
        )
    ),
    [doc] => array(
        [0] => array(
            "Name" => "File Group 1",
            "Title" => "This is the first file group",
            "Files" => array(
                [0] => array(
                    "DownloadPath" => "/this/windows/1/1.zip"
                ),
                [1] => array(
                    "DownloadPath" => "/this/windows/1/2.zip"
                )
            )
        ),
        [1] => array(
            "Name" => "File Group 1",
            "Title" => "This has the same name but has a different title, so it should be seperate.",
            "Files" => array(
                [0] => array(
                    "DownloadPath" => "/this/windows/1/3.zip"
                )
            )
        ),
        [2] => array(
            "Name" => "File Group 2",
            "Title" => "This is the second file group",
            "Files" => array(
                [0] => array(
                    "DownloadPath" => "/this/windows/2/1.zip"
                ),
                [1] => array(
                    "DownloadPath" => "/this/windows/2/2.zip"
                )
            )
        )
    )
),
[Mac] => Array(
    [gif] => array(
        [0] => array(
            "Name" => "File Group 2",
            "Title" => "This is the second file group",
            "Files" => array(
                [0] => array(
                    "DownloadPath" => "/this/mac/2/1.zip"
                ),
                [1] => array(
                    "DownloadPath" => "/this/mac/2/2.zip"
                )
            )
        ),
        [1] => array(
            "Name" => "File Group 2",
            "Title" => "This is the second file group",
            "Files" => array(
                [0] => array(
                    "DownloadPath" => "/this/mac/2/1.zip"
                ),
                [1] => array(
                    "DownloadPath" => "/this/mac/2/2.zip"
                )
            )
        ),

    )
    [jpeg] => array(
        [0] => array(
            "Name" => "File Group 2",
            "Title" => "This is the second file group",
            "Files" => array(
                [0] => array(
                    "DownloadPath" => "/this/mac/2/1.zip"
                ),
                [1] => array(
                    "DownloadPath" => "/this/mac/2/2.zip"
                )
            )
        )
    )
    [doc] => array(
        [0] => array(
            "Name" => "File Group 1",
            "Title" => "This has the same group name but a different platform. Because it has the same title and name the files are added to this array below.",
            "Files" => array(
                [0] => array(
                    "DownloadPath" => "/this/mac/1/1.zip"
                ),
                [1] => array(
                    "DownloadPath" => "/this/mac/1/2.zip"
                )
            )
        ),
        [1] => array(
            "Name" => "File Group 3",
            "Title" => "This is the second mac file group really.",
            "Files" => array(
                [0] => array(
                    "DownloadPath" => "/this/mac/1/1.zip"
                ),
                [1] => array(
                    "DownloadPath" => "/this/mac/1/2.zip"
                )
            )
        )
    )
)

Update 3: There is some garbage in the file list.

<Item>
        <Platform>Windows</Platform>
        <Ext>gif    jpeg    doc</Ext>
        <Name>File Group 1</Name>
        <Title>This is the first file group</Title>
        <DownloadPath>/this/windows/1/1.zip</DownloadPath>
    </Item>
    <Item>
        <Platform>Windows</Platform>
        <Ext>gif    jpeg    doc</Ext>
        <Name>File Group 1</Name>
        <Title>This is the first file group</Title>
        <DownloadPath>/this/windows/1/2.zip</DownloadPath>
    </Item>
<Item>
        <Platform>Windows</Platform>
        <Ext>gif    jpeg    doc</Ext>
        <Name>File Group 1</Name>
        <Title>This is the first file group</Title>
        <DownloadPath>/this/windows/2/1.zip</DownloadPath>
    </Item>
    <Item>
        <Platform>Windows</Platform>
        <Ext>gif    jpeg    doc</Ext>
        <Name>File Group 1</Name>
        <Title>This is the first file group</Title>
        <DownloadPath>/this/windows/2/2.zip</DownloadPath>
    </Item>

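There are items with the same platform, extensions, name and title. Items 3 and 4 above need to be skipped and saved to an array that I will deal with later.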

7 answers:

Answer 0: (score: 1)

You are just mapping the input values into the output array in a different way. This is your structure:

Array(
  [... Item/Platform] => Array (
    [... Item/Title as 0-n] => array(
        "Name" => Item/Name,
        "Title" => Item/Title,
        "Files" => array(
            [...] => array(
                "DownloadPath" => Item/DownloadPath
            ),
        )
    ),

The mapping can be done by iterating over the items in the XML and storing the values at the appropriate place in a new array (I named it $build):

$build = array();
foreach($items as $item)
{
    $platform = (string) $item->Platform;
    $title = (string) $item->Title;
    isset($build[$platform][$title]) ?: $build[$platform][$title] = array(
        'Name' => (string) $item->Name,
        'Title' => $title
    );
    $build[$platform][$title]['Files'][] = array('DownloadPath' => (string) $item->DownloadPath);
}
$build = array_map('array_values', $build);

The array_map call at the end converts the Item/Title keys into numeric keys.
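The effect of that last step can be seen on a small hand-built array. This is only a sketch: the titles and the shape of $build here are made-up stand-ins for what the loop would produce.

```php
<?php
// Hypothetical miniature of $build: platform => title => group.
$build = array(
    'Windows' => array(
        'First title'  => array('Name' => 'File Group 1', 'Title' => 'First title'),
        'Second title' => array('Name' => 'File Group 1', 'Title' => 'Second title'),
    ),
);

// array_values() is applied to each platform's list of groups. With a single
// input array, array_map() keeps the outer (platform) keys as they are.
$build = array_map('array_values', $build);

print_r(array_keys($build));            // the platform keys are unchanged
print_r(array_keys($build['Windows'])); // the title keys are now 0 and 1
```

The title is only needed as a key while grouping, which is why it can be dropped once the loop has finished.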

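That's it, Demo

Let me know if that helps.

Edit: For your updated data, this is a slight modification of the above. The key principle of the previous example still holds. In addition, each item is repeated once for every extension it lists, by adding another iteration inside the loop: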

$build = array();
foreach($items as $item)
{
    $platform = (string) $item->Platform;
    $title = (string) $item->Title;
    foreach(preg_split("~\s+~", trim((string) $item->Ext)) as $ext)
    {
        isset($build[$platform][$ext][$title])
            ?:$build[$platform][$ext][$title] = array(
                'Name' => (string) $item->Name,
                'Title' => $title
            );
        $build[$platform][$ext][$title]['Files'][]
            = array('DownloadPath' => (string) $item->DownloadPath);
    }
}
$build = array_map(function($v) {return array_map('array_values', $v);}, $build);
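For anyone who wants to try the extension grouping in isolation, here is a self-contained sketch. It makes some assumptions: the question never shows the root element, so an `<Items>` wrapper is invented here, and the two items are shortened sample data rather than the real file.

```php
<?php
// Assumed sample data: <Items> is a hypothetical root element.
$xml = simplexml_load_string(
    "<Items>
       <Item><Platform>Windows</Platform><Ext>gif\tjpeg</Ext><Name>File Group 1</Name>
         <Title>First</Title><DownloadPath>/this/windows/1/1.zip</DownloadPath></Item>
       <Item><Platform>Windows</Platform><Ext>gif</Ext><Name>File Group 1</Name>
         <Title>First</Title><DownloadPath>/this/windows/1/2.zip</DownloadPath></Item>
     </Items>"
);

$build = array();
foreach ($xml->Item as $item) {
    $platform = (string) $item->Platform;
    $title    = (string) $item->Title;
    // Split on any run of whitespace, so tabs and spaces are both accepted.
    foreach (preg_split('~\s+~', trim((string) $item->Ext)) as $ext) {
        if (!isset($build[$platform][$ext][$title])) {
            $build[$platform][$ext][$title] = array(
                'Name'  => (string) $item->Name,
                'Title' => $title,
            );
        }
        $build[$platform][$ext][$title]['Files'][] =
            array('DownloadPath' => (string) $item->DownloadPath);
    }
}
// Replace the title keys with numeric keys, one level deeper than before.
$build = array_map(function ($v) { return array_map('array_values', $v); }, $build);

print_r($build);
```

With this input the gif group collects both download paths, while the jpeg group only gets the first one.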

Answer 1: (score: 0)

Declare it first:

$groups[stripslashes($file->Platform)][stripslashes($file->Name)]
  [stripslashes($file->Title)] = (object)array(
    'Name' => $file->Name,
    'Title' => $file->Title,
    'Files' => (object)array()
  );

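That will get you closer.

You should also check the type of each XMLElement to see whether it is an array or a simple object, and then treat it accordingly.

Answer 2: (score: 0)

You have not fully explained what you see as wrong, so I have to guess.

First, in your source the last DownloadPath is /this/windows/3/1.zip even though it is supposed to be a Mac file. A typo, I'm sure, but it makes the output "look wrong".

Next, if you want strings instead of SimpleXMLElement objects, you need this (I also tidied it up a bit to avoid so many stripslashes() calls):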

foreach ($xml->Item as $file) {
    $platform = stripslashes((string) $file->Platform);
    $name = stripslashes((string) $file->Name);
    $title = stripslashes((string) $file->Title);
    if( !isset($groups[$platform][$name][$title])) {
        $groups[$platform][$name][$title] = array(
            'Platform' => $platform,
            'Name' => $name,
            'Title' => $title 
        );
    } 
    $groups[$platform][$name][$title]['Files'][] = (string) $file->DownloadPath;
}

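Notice the (string) bits? They cast the objects to strings, giving you the literal values instead of objects. This is also why your array keys already work: stripslashes() converts each element to a string before it is used as a key (only strings and integers can be used as array keys).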
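A minimal sketch of the difference, using a one-element document invented for illustration:

```php
<?php
// Hypothetical one-item document, just to show the cast.
$item = simplexml_load_string('<Item><Name>File Group 1</Name></Item>');

$raw  = $item->Name;           // still a SimpleXMLElement object
$name = (string) $item->Name;  // a plain PHP string

var_dump($raw instanceof SimpleXMLElement);
var_dump($name);
```

Storing $raw in the array is what makes print_r() show the "SimpleXMLElement Object" wrappers from the question. Storing $name gives plain values.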

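I think that covers everything I could find that answers your question. If not, please let me know more clearly what is wrong and I will be happy to try to help.

Answer 3: (score: 0)

I prefer DOMDocument and XPath, so here is what I would do...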

$xml = '\path\to\your\file.xml';
$doc = new DOMDocument( '1.0', 'UTF-8' );
$doc->load( $xml );

$dxpath = new DOMXPath( $doc );
$items = $dxpath->query( '//Item' );

$db = new PDO( 'mysql:dbname=YOURDB;host=YOURHOST', $DBUSER, $DBPASS );
$ins = $db->prepare('
                    INSERT INTO ur_table
                    ( `platform` , `name` , `title` , `path` )
                    VALUES
                    ( :platform , :name , :title , :path );
                    ');

foreach( $items as $item )
{
    // Tag names are case-sensitive, and each placeholder must match one in the prepared statement.
    $ins->bindValue( ':platform' , $item->getElementsByTagName( 'Platform' )->item(0)->nodeValue     , PDO::PARAM_STR );
    $ins->bindValue( ':name'     , $item->getElementsByTagName( 'Name' )->item(0)->nodeValue         , PDO::PARAM_STR );
    $ins->bindValue( ':title'    , $item->getElementsByTagName( 'Title' )->item(0)->nodeValue        , PDO::PARAM_STR );
    $ins->bindValue( ':path'     , $item->getElementsByTagName( 'DownloadPath' )->item(0)->nodeValue , PDO::PARAM_STR );
    $ins->execute();
}

No need for stripslashes and whatnot. The prepared statement handles all of that for you.
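The reading half of this approach can be tried without a database. The following is only a sketch: the `<Items>` root and the single item are assumed sample data, and the rows are collected into an array instead of being inserted.

```php
<?php
// Assumed sample document; <Items> is a hypothetical root element.
$doc = new DOMDocument();
$doc->loadXML(
    '<Items><Item><Platform>Mac</Platform><Name>File Group 3</Name>'
    . '<DownloadPath>/this/mac/3/1.zip</DownloadPath></Item></Items>'
);

$xpath = new DOMXPath($doc);
$rows  = array();
foreach ($xpath->query('//Item') as $item) {
    // evaluate() with string(...) returns the text of the first matching child,
    // relative to the current <Item> node.
    $rows[] = array(
        'platform' => $xpath->evaluate('string(Platform)', $item),
        'name'     => $xpath->evaluate('string(Name)', $item),
        'path'     => $xpath->evaluate('string(DownloadPath)', $item),
    );
}

print_r($rows);
```

Each entry in $rows then maps directly onto the placeholders of the prepared statement.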

Answer 4: (score: 0)

How about this? The code is a bit sloppy and should probably be tweaked to improve the validation.

class XMLFileImporter {
  public $file; //Absolute path to import file
  public $import = array();
  public $xml;
  public $error = false;

  public function __construct($file) {
    $this->file = $file;
    $this->load();
  }

  public function load() {
    if(!is_readable($this->file)) {
      $this->error("File is not readable");
      return false;
    }

    $xml = simplexml_load_file($this->file);
    if(!$xml) {
      $this->error("XML could not be parsed");
      return false;
    }
    $this->xml = json_decode(json_encode($xml));

    return true;
  }

  public function import() {
    $count = $this->parseItems();
    echo "Imported $count rows";

  }

  public function parseItems() {
    if($this->error()){
      return false;
    }

    if(!self::validateXML($this->xml)) {
      $this->error("Invalid SimpleXML object");
      return false;
    }

    if(!self::validateArray($this->xml->Item)) {
      $this->error("Invalid Array 'Item' on SimpleXML object");
      return false;
    }
    $count = 0;
    foreach($this->xml->Item as $item) {
      if($this->parseItem($item)){
        $count++;
      }
    }
    return $count;

  }
  public function parseItem($item) {
    if($this->error()){
      return false;
    }

    if(!self::validateItem($item)) {
      $this->error("Invalid file item");
      return false;
    }

    $item = self::normalizeItem($item);

    $this->handlePlatform((string)$item->Platform);
    $this->handleGroup($item);
    $this->handleSubGroup($item);
    $this->handleFile($item);
    return true;
  }

  public function handlePlatform($platform) {
    if(!isset($this->import[$platform])) {
      $this->import[$platform] = array();
    }

    return true;
  }

  public function handleGroup($item) {
    if(!isset($this->import[$item->Platform][$item->Name])) {
      $this->import[$item->Platform][$item->Name] = array();
    }
    return true;
  }

  public function handleSubGroup($item) {
    if(!isset($this->import[$item->Platform][$item->Name][$item->Title])) {
      $this->import[$item->Platform][$item->Name][$item->Title] = array();
    }
    return true;
  }

  public function handleFile($item) {
    array_push($this->import[$item->Platform][$item->Name][$item->Title],$item->DownloadPath);
  }

  public function error($set=false) {
    if($set){
      $this->error = $set;
      return true;
    }
    return $this->error;
  }

  public static function validateXML($xml) {
    return is_object($xml);
  }
  public static function validateArray($arr,$min=1){
    return (isset($arr) && !empty($arr) && count($arr) > $min);

  }

  public static function validateItem($item){
    return (isset($item->Title)
           && isset($item->Name)
           && isset($item->DownloadPath)
           && isset($item->Platform));

  }

  public static function normalizeItem($item){
    $item->Name = stripslashes(trim((string)$item->Name));
    $item->Title = stripslashes(trim((string)$item->Title));
    $item->Platform = (string)$item->Platform;
    $item->DownloadPath = (string)$item->DownloadPath;

    return $item;
  }

  public function output() {
    print_r($this->import);
    return true;
  }

}

$importer = new XMLFileImporter(dirname(__FILE__)."/files.xml");
$importer->load();
$importer->import();
$importer->output();
var_dump($importer->error());

Answer 5: (score: 0)

You can try this:

$scrape_xml = "files.xml";
$xml = simplexml_load_file($scrape_xml);

$group = array();

foreach ($xml->Item as $file)
{
    $platform = stripslashes($file->Platform);
    $name = stripslashes($file->Name);
    $title = stripslashes($file->Title);
    $downloadPath = stripslashes($file->DownloadPath);

    if(!isset($group[$platform]))
    {
        $group[$platform] = array();
        $group[$platform][] = array("Name" => $name,"Title" => $title, "Files" => array($downloadPath));
    }
    else
    {
        $found = false;

        for($i=0;$i<count($group[$platform]);$i++)
        {
            if($group[$platform][$i]["Name"] == $name  && $group[$platform][$i]["Title"] == $title)
            {
                $group[$platform][$i]["Files"][] = $downloadPath;
                $found = true;
                break;
            }
        }

        if(!$found)
        {
            $group[$platform][] = array("Name" => $name,"Title" => $title, "Files" => array($downloadPath));
        }
    }
}

echo "<pre>".print_r($group,true)."</pre>";

Answer 6: (score: 0)

Here is the code that gives you the desired result. Update: this handles the latest grouping you asked for.

$scrape_xml = "files.xml";
$xml = simplexml_load_file($scrape_xml);
$groups2 = array();

foreach ($xml->Item as $file){
    $platform = stripslashes($file->Platform);
    $name = stripslashes($file->Name);
    $title = stripslashes($file->Title);
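    // Split on any run of whitespace: the sample data mixes tabs and varying numbers of spaces.
    $extensions = preg_split('/\s+/', trim((string) $file->Ext));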

    foreach($extensions as $extension)
    {
        if (!isset($groups2[$platform])) $groups2[$platform] = array();
        if (!isset($groups2[$platform][$extension])) $groups2[$platform][$extension] = array();

        $groupFound = false;
        for($idx = 0; $idx < count($groups2[$platform][$extension]); $idx ++) {
            if ($groups2[$platform][$extension][$idx]["Name"] == $name 
                && $groups2[$platform][$extension][$idx]["Title"] == $title) {

                $groups2[$platform][$extension][$idx]["Files"][] =
                    array('DownloadPath' => $file->DownloadPath."");

                $groupFound = true;

                break;
            }
        }

        if ($groupFound) continue;

        $groups2[$platform][$extension][] = 
            array(
                "Name" => $name,
                "Title" => $title,
                "Files" => array(array('DownloadPath' => $file->DownloadPath."")));
    }
}

echo "<br />";
echo "<pre>";
print_r($groups2);
echo "</pre>";