How can I make this XMLReader code more efficient?

Time: 2015-10-15 17:21:33

Tags: php mysql xml

I have a large XML file (~350 MB) that I need to load into a MySQL table, with each piece of data stored in its own appropriate column.

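I tried using LOAD XML, but it never succeeded: I kept running out of memory and nothing was imported. I'm on shared hosting where I have no control over php.ini, and every attempt to raise the memory limit, whether through my own php.ini or in the script itself (ini_set('memory_limit', -1), etc.), had no effect.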

So I'm now trying to parse the XML with XMLReader instead. The file contains roughly 130,000 entries, each with 18 child elements.
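
For illustration, the streaming pattern described here can be sketched as a generator that hands back one record at a time, so only the current entry is held in memory. The function name stream_jobs and the array-per-record shape are assumptions made for this sketch, not part of the original script.

```php
<?php
// Hypothetical helper: yield each <job> element as an associative array
// of child-name => text, reading the file as a stream.
function stream_jobs(string $path): Generator
{
    $reader = new XMLReader();
    if (!$reader->open($path)) {
        throw new RuntimeException("Cannot open $path");
    }

    // Advance to the first <job> element.
    while ($reader->read() && $reader->name !== 'job');

    while ($reader->name === 'job') {
        // Parse only the current element, not the whole document.
        $node = simplexml_load_string($reader->readOuterXml());
        $row = [];
        foreach ($node->children() as $child) {
            $row[$child->getName()] = (string) $child;
        }
        yield $row;

        // Skip straight to the next <job> sibling.
        $reader->next('job');
    }

    $reader->close();
}
```

Casting each child to a string copies the value out of the SimpleXML node, so the node itself can be freed as soon as the caller moves on to the next record.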

My code gets part of the way: I can insert about 75,000 rows, and then I either run out of memory or my connection "goes away".

My question is: how can I make this code more memory-efficient?

Code:

<?php
mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);

print("<br>starting<br>");

// connect
$mysqli = mysqli_connect(CONNECTION STUFF);

// check connection
if (mysqli_connect_error()) {
    printf("Connect failed: %s\n", mysqli_connect_error());
    exit();
}

// open xml
$xml = new XMLReader;
$xml->open('../xml/jobs.xml');

$sql = $orig_sql = "REPLACE INTO jobs (jobref, date, title, company, email, url, salarymin, salarymax, benefits, salary, jobtype, full_part, salary_per, location, country, description, category, image) VALUES ";

$count = 0;
$chunks = 500;
$total = 0;

function escape_sql($unescaped) {
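  // map each special character to the escape sequence MySQL understands
  // (MySQL has no \x escape: NUL is written \0 and Ctrl+Z is written \Z)
  $replacements = array(
     "\x00"=>'\0',
     "\n"=>'\n',
     "\r"=>'\r',
     "\\"=>'\\\\',
     "'"=>"\'",
     '"'=>'\"',
     "\x1a"=>'\Z'
  );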
  return strtr($unescaped,$replacements);
}

// move to the first <job /> node
while ($xml->read() && $xml->name !== 'job');

// now that we're at the right depth, hop to the next <job/> until the end of the tree
while ($xml->name === 'job'){
    $node = simplexml_load_string($xml->readOuterXML());
    $title = escape_sql($node->{'title'});
    $desc = escape_sql($node->{'description'});
    $email = escape_sql($node->{'email'});
    $url = escape_sql($node->{'url'});
    $img = escape_sql($node->{'image'});
    $sal = escape_sql($node->{'salary'});
    $bens = escape_sql($node->{'benefits'});
    $comp = escape_sql($node->{'company'});
    $cat = escape_sql($node->{'category'});
    $loc = escape_sql($node->{'location'});

    // add to the sql query
    $sql .= "(".$node->{'jobref'}
            .",'"
            .$node->{'date'}
            ."','"
            .$title
            ."','"
            .$comp
            ."','"
            .$email
            ."','"
            .$url
            ."','"
            .$node->{'salarymin'}
            ."','"
            .$node->{'salarymax'}
            ."','"
            .$bens
            ."','"
            .$sal
            ."','"
            .$node->{'jobtype'}
            ."','"
            .$node->{'full_part'}
            ."','"
            .$node->{'salary_per'}
            ."','"
            .$loc
            ."','"
            .$node->{'country'}
            ."','"
            .$desc
            ."','"
            .$cat
            ."','"
            .$img
            ."')";

    $count ++;
    $total ++;

    if($count === $chunks){
        mysqli_query($mysqli, $sql) or die(mysqli_error($mysqli));
        $count = 0;
        print('<br>inserted : '.$total.'<br>');
        $sql = $orig_sql;
    }else{
        $sql .= ',';
    }

    $xml->next('job');
}

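// send whatever is left over from the last, partially filled chunk
if ($count > 0) {
    mysqli_query($mysqli, rtrim($sql, ',')) or die(mysqli_error($mysqli));
    print('<br>inserted : '.$total.'<br>');
}

$xml->close();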
mysqli_close($mysqli);

print 'finished inserting data';

exit();

?>
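
The chunked-insert logic in the script above could also be separated from the XML parsing. The following is a minimal sketch, not a tested fix for the memory problem: batch_rows and build_replace_sql are hypothetical helper names, and the sketch assumes each row is an array of column values. It groups rows into fixed-size batches, including the final partial one, and builds a multi-row statement with ? placeholders so that values can be bound rather than escaped by hand.

```php
<?php
// Hypothetical helper: group any iterable of rows into arrays of at most
// $size rows. The last batch may be smaller and is still yielded.
function batch_rows(iterable $rows, int $size): Generator
{
    $batch = [];
    foreach ($rows as $row) {
        $batch[] = $row;
        if (count($batch) === $size) {
            yield $batch;
            $batch = []; // drop the rows that were just handed over
        }
    }
    if ($batch !== []) {
        yield $batch; // trailing partial batch
    }
}

// Hypothetical helper: build "REPLACE INTO t (a, b) VALUES (?, ?), (?, ?)"
// for $rowCount rows. Table and column names must come from trusted code.
function build_replace_sql(string $table, array $columns, int $rowCount): string
{
    $group = '(' . implode(', ', array_fill(0, count($columns), '?')) . ')';
    return sprintf(
        'REPLACE INTO %s (%s) VALUES %s',
        $table,
        implode(', ', $columns),
        implode(', ', array_fill(0, $rowCount, $group))
    );
}
```

Each batch could then be sent through a mysqli prepared statement: pass the string from build_replace_sql to mysqli_prepare, bind the flattened row values with mysqli_stmt_bind_param, and call mysqli_stmt_execute. With 500 rows of 18 columns that is 9,000 placeholders per statement, which stays under MySQL's limit of 65,535 placeholders in a prepared statement.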

0 answers:

No answers