How can I find XML strings with PHP in this case?

Time: 2018-10-28 19:30:05

Tags: php xml

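I want my PHP script to download files from a specific link based on the ids in an XML file. It should ignore the rest of the XML and look only at the first id of each lib (the id on the <lib> tag itself).

My XML looks like this: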


<lib id="ITEM_I_WANT_TO_DOWNLOAD_1" revision="0000">
    <part id="0000" type="ch"/>
    <part id="0000" type="ls"/>
    <part id="0000" type="rs"/>
    <part id="0000" type="ch"/>
  </lib>
<lib id="ITEM_I_WANT_TO_DOWNLOAD_2" revision="0000">
    <part id="0000" type="ch"/>
    <part id="0000" type="ls"/>
    <part id="0000" type="rs"/>
    <part id="0000" type="ch"/>
  </lib>


My current PHP script is as follows:

    if (!defined('STDIN'))
  {
      echo 'Please run it as a cmd ({path to your php}/php.exe {path to badges.php} -f)';
      exit;
  }
  define('BASE', 'https://randomtarget.com/');
  $figuremap = get_remote_data('https://random/xmlfile-needed.xml/');

  if (!file_exists('C:/outputfolder/')) {
    mkdir('C:/outputfolder/', 0777, true);
      echo "\n --------------> Output folder has been made... \n";

    sleep(3);

    $fp = fopen("C:/downloaded-xmlfile.xml", "w");
      fwrite($fp, $figuremap);
      fclose($fp);
    echo "\n --------------> XML downloaded and placed into folder \n";

    sleep(3);
  }
  $pos = 0;
  while ($pos = strpos($figuremap, '<lib id="', $pos +1))
  {
      $pos1 = strpos($figuremap, '"', $pos);
      $rule = substr($figuremap, $pos, ($pos1 -$pos));
      $rule = explode(',', $rule);
      $revision = str_replace('">', '', $rule[1]);
      $clothing_file = current(explode('*', str_replace('"', '', $rule[2])));
      if (file_exists('C:/outputfolder/'.$clothing_file.'.swf'))
      {
          echo 'Clothing_file found: '.$clothing_file."\r\n";
          continue;
      }
      echo 'Download clothing_file: '.$clothing_file.' '.$revision."\r\n";

      if (!@copy(BASE.'/'.$revision.'/'.$clothing_file.'.swf', 'C:/outputfolder'.$clothing_file.'.swf'))
      {
          echo 'Error downloading: '.$clothing_file."\r\n";
      }
  }

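Besides this code I have also written a get_remote_data function, and that part works. I just want strpos to grab all the id="" items so I can check whether the files exist on the target site.

How can I solve this?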

2 Answers:

Answer 0: (score: 0)

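There are a few simple ways of processing XML files. The easiest (but least flexible) is SimpleXML; the following code should replace the main processing loop...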

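$xml = simplexml_load_string($figuremap);

foreach ( $xml->lib as $lib )   {
      // Attributes are SimpleXMLElement objects, so cast them to strings
      $clothing_file = (string) $lib['id'];
      // The revision attribute is part of the download URL
      $revision = (string) $lib['revision'];

      if (file_exists('C:/outputfolder/'.$clothing_file.'.swf'))
      {
          echo 'Clothing_file found: '.$clothing_file."\r\n";
          continue;
      }
      echo 'Download clothing_file: '.$clothing_file.' '.$revision."\r\n";

      // Save to the same path that the file_exists() check above looks at
      if (!@copy(BASE.'/'.$revision.'/'.$clothing_file.'.swf', 'C:/outputfolder/'.$clothing_file.'.swf'))
      {
          echo 'Error downloading: '.$clothing_file."\r\n";
      }
}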

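The starting point is to load the XML in $figuremap into SimpleXML and then iterate over the elements. This assumes the XML structure is something like...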

<lib1>
    <lib id="ITEM_I_WANT_TO_DOWNLOAD_1" revision="0000">
        <part id="0000a" type="ch" />
        <part id="0000" type="ls" />
        <part id="0000" type="rs" />
        <part id="0000" type="ch" />
    </lib>
    <lib id="ITEM_I_WANT_TO_DOWNLOAD_2" revision="0000">
        <part id="00001" type="ch" />
        <part id="0000" type="ls" />
        <part id="0000" type="rs" />
        <part id="0000" type="ch" />
    </lib>
</lib1>

The actual name of the base element doesn't matter, as long as the <lib> elements are one level down; you can then loop over them with $xml->lib.
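To try the SimpleXML approach on its own, here is a minimal, self-contained sketch. The `<root>` wrapper name, the `$sample` data and the `collectLibs()` helper are made up for illustration and do not come from the question; the sketch only assumes that every `<lib>` carries an `id` and a `revision` attribute, as in the sample XML.

```php
<?php
// Hypothetical sample data; the <root> wrapper name is an assumption.
$sample = <<<XML
<root>
    <lib id="ITEM_I_WANT_TO_DOWNLOAD_1" revision="0000">
        <part id="0000" type="ch"/>
    </lib>
    <lib id="ITEM_I_WANT_TO_DOWNLOAD_2" revision="0001">
        <part id="0000" type="ls"/>
    </lib>
</root>
XML;

/**
 * Hypothetical helper: returns one ['id' => ..., 'revision' => ...]
 * entry for every <lib> that is a direct child of the root element.
 */
function collectLibs(string $xmlString): array
{
    $xml = simplexml_load_string($xmlString);
    if ($xml === false) {
        return []; // the string could not be parsed as XML
    }

    $libs = [];
    foreach ($xml->lib as $lib) {
        // Only the attributes of <lib> itself are read; <part> ids are ignored.
        $libs[] = [
            'id'       => (string) $lib['id'],
            'revision' => (string) $lib['revision'],
        ];
    }
    return $libs;
}

foreach (collectLibs($sample) as $lib) {
    echo $lib['id'] . ' ' . $lib['revision'] . "\r\n";
}
```

Returning plain arrays keeps the parsing step separate from the download step, so the `file_exists()` and `copy()` logic can be tested independently of the XML handling.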

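Answer 1: (score: 0)

The XML string you posted is actually not valid. It needs to be wrapped in a parent element to repair it. I'm not sure whether you meant to post the exact XML string or only part of it.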

$xml = '<lib id="ITEM_I_WANT_TO_DOWNLOAD_1" revision="0000">
    <part id="0000" type="ch"/>
    <part id="0000" type="ls"/>
    <part id="0000" type="rs"/>
    <part id="0000" type="ch"/>
  </lib>
<lib id="ITEM_I_WANT_TO_DOWNLOAD_2" revision="0000">
    <part id="0000" type="ch"/>
    <part id="0000" type="ls"/>
    <part id="0000" type="rs"/>
    <part id="0000" type="ch"/>
  </lib>';

// repair invalid xml: https://stackoverflow.com/q/4544272/2943403
$xml = '<mydocument>' . $xml . '</mydocument>';

$doc = new DOMDocument();
$doc->loadXml($xml);
$xpath = new DOMXpath($doc);
foreach ($xpath->evaluate('//lib/@id') as $attr) {
    $clothing_file = $attr->value;
    // perform your conditional actions ...
}

//lib/@id means: search for the id attribute of every <lib> element, anywhere in the document.
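Putting the pieces of this answer together, the sketch below wraps the fragment, checks that it actually parses, and reads both the id and the revision of each `<lib>`. The shortened sample data, the `ITEM_1`/`ITEM_2` ids and the `libAttributes()` helper are illustrative assumptions, not code from the question.

```php
<?php
// Shortened, hypothetical version of the rootless fragment from the question.
$fragment = '<lib id="ITEM_1" revision="0000"><part id="0000" type="ch"/></lib>'
          . '<lib id="ITEM_2" revision="0001"><part id="0000" type="ls"/></lib>';

/**
 * Hypothetical helper: wraps a rootless fragment in a parent element and
 * returns one ['id' => ..., 'revision' => ...] entry per <lib> element.
 */
function libAttributes(string $fragment): array
{
    // Collect parser errors quietly instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $loaded = $doc->loadXML('<mydocument>' . $fragment . '</mydocument>');

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        return []; // still not well-formed after wrapping
    }

    $xpath = new DOMXPath($doc);
    $result = [];
    // //lib selects every <lib> element at any depth; <part> ids are never read.
    foreach ($xpath->query('//lib') as $lib) {
        $result[] = [
            'id'       => $lib->getAttribute('id'),
            'revision' => $lib->getAttribute('revision'),
        ];
    }
    return $result;
}

foreach (libAttributes($fragment) as $lib) {
    echo $lib['id'] . ' ' . $lib['revision'] . "\r\n";
}
```

Selecting the `<lib>` elements rather than `//lib/@id` is a deliberate choice here: it gives access to the `revision` attribute on the same node, which the download URL in the question also needs.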