Question

我的脚本问题是从csv文件导入条目并将其作为wordpress自定义帖子插入（每行是一个帖子）... 最初，我在自己的类中设置了导入功能，而且几乎没有工作......从我收集的内容来看，问题是全局变量没有被缓存，每次调用实例时都会消耗更多的内存，直到进程耗尽内存并崩溃...所以我删除了类并设置了导入功能，如下面的代码所述。

通过这个设置，我达到了可以正常运行17k个帖子的点，但是如果我尝试导入更多的帖子，那么它就会丢失而没有任何错误（我的php错误中没有报告错误） log或wordpress debug.log文件）

脚本成功插入17k帖子打印出回显信息，直到它在“剩余XXX个项目”中过早停止，并且它完成加载页面输出的任何内容......它永远不会进入最终echo "Done!";陈述......

这在localhost开发环境和托管开发服务器上都会发生。我一直关注内存使用情况，在我的本地主机上它从未超过60％（从50％左右开始）我没有看到逐步内存爬升表明内存泄漏...

我也尝试过使用ini_set（'memory_limit'，'64M'）;和set_time_limit（0）;

从我读过的关于此的其他类似问题，

对于SQL 20k条目应该不是什么大不了的事
wordpress也应该能够处理这个问题足够强大

我可以对下面的代码进行哪些优化/改进，以使此脚本能够按此规模运行？

或者可能跳过wordpress内置的功能并使用LOACH DATA INFILE处理所有内容，如fancypants here所述

我更愿意通过提供的wordpress功能处理数据。

csv文件是〜1mb ...

代码：

这些函数驻留在他们自己的文件中 - import.php

function fileupload_process() {
  ini_set('memory_limit', '64M');
  set_time_limit(0);
  $uploadfiles = $_FILES['uploadfiles'];
  if (is_array($uploadfiles)) {
    foreach ($uploadfiles['name'] as $key => $value) {
      // look only for uploaded files
      if ($uploadfiles['error'][$key] == 0) {
        $filetmp = $uploadfiles['tmp_name'][$key];
        if (($handle = fopen($filetmp, "r")) !== FALSE) {
          $flag = true;
          $songs = explode("\n",file_get_contents($filetmp));
          $count = count( $songs );
          unset($songs);
          echo "Total item count: " . $count . "<BR />";
          // typical entry: If You Have To Ask,Red Hot Chili Peppers,0:03:37, Rock & Alternative,1991,on
          // using a generous 1000 length - will lowering this actually impact performance in terms of memory allocation?
          while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
            // Skip the first entry in the csv containing colmn info
            if($flag) {
                      $flag = false; 
              echo "<BR />"; 
              $count--; 
              continue; 
            }
            // insert the current post and relevant info into the database
            $currently_processed = process_custom_post($data, $count);
            $count--;
          }
          echo "Done!";
          fclose($handle);
        }
        unlink($filetmp); // delete the temp csv file
      }
    }
  }
} // END: file_upload_process()
function process_custom_post($song, $count) {
  $track =  (array_key_exists(0, $song) && $song[0] != "" ?  $song[0] : 'N/A');
  $artist = (array_key_exists(1, $song) && $song[1] != ""  ?  $song[1] : 'N/A');
  $length = (array_key_exists(2, $song) && $song[2] != ""  ?  $song[2] : 'N/A');
  $genre = (array_key_exists(3, $song) && $song[3] != ""  ?  $song[3] : 'N/A');
  $year = (array_key_exists(4, $song) && $song[4] != ""  ?  $song[4] : 'N/A');
  $month = (array_key_exists(5, $song) && $song[5] != ""  ?  $song[5] : 'N/A');
  $playlist = (array_key_exists(6, $song) && $song[6] != ""  ?  $song[6] : '');
  $custom_post = array();
  $custom_post['post_type'] = 'songs';
  $custom_post['post_status'] = 'publish';
  $custom_post['post_title'] = $track;
  echo "Importing " . $artist  . " - " . $track . " <i> (" . $count ." items remaining)...</i><BR />";
  $post_id = wp_insert_post( $custom_post );
  $updated = update_post_meta($post_id, 'artist_name', $artist);
  $updated = update_post_meta($post_id, 'song_length', $length);
  $updated = update_post_meta($post_id, 'song_genre', $genre);
  $updated = update_post_meta($post_id, 'song_year', $year);
  $updated = update_post_meta($post_id, 'song_month', $month);
  $updated = update_post_meta($post_id, 'sample_playlist', $playlist);
  return true;
} // END: process_custom_post()
function import_page () {
//HTML for the import page + the file upload form
  if (isset($_POST['uploadfile'])) {
    fileupload_process();
       }
}

import.php包含在插件类

之外的wordpress插件中

即这是关于如何在导入页面上获取脚本的相关信息：

define( 'MY_PLUGIN_ROOT' , dirname(__FILE__) );
include_once( MY_PLUGIN_ROOT . 'import.php');
class my_plugin () {
 function __construct() {
  add_action( 'init', array( &$this, 'admin_menu_init' ) );
 }
 function admin_menu_init() {
   if(is_admin()) {
     //Add the necessary pages for the plugin
     add_action('admin_menu', array(&$this, 'add_menu_items'));
   }
 }
 function add_menu_items() {
  add_submenu_page( 'edit.php?post_type=songs', 'Import Songs', 'Import Songs',  'manage_options', 'import-songs', 'import_page' );
 }
}

任何想法，评论或建议都将不胜感激。

Answer 1

所以经过长时间的分析会议，与XDEBUG，phpmyadmin和朋友们一起，我终于缩小了瓶颈：mysql查询

我已启用

define('SAVEQUERIES', true);

更详细地检查查询，并将queries数组输出到我的debug.log文件

global $wpdb;
error_log( print_r( $wpdb->queries, true ) );

检查每次对数据库的INSERT调用

wp_insert_post（）

[0] =＆gt; INSERT INTO wp_posts（post_author，post_date，post_date_gmt，post_content，post_content_filtered，post_title，post_excerpt， post_status，post_type，comment_status，ping_status，post_password，post_name，to_ping，pinged，{{ 1}}，post_modified，post_modified_gmt，post_parent，menu_order）价值观（...）

[1] =＆gt; 0.10682702064514
update_post_meta（）

[0] =＆gt; INSERT INTO guid（wp_postmeta，post_id，meta_key）VALUES（...）

[1] =＆gt; 0.10227680206299

我发现在我的本地主机服务器上，每次调用数据库的成本平均为~0.1秒

但是，由于我每个条目更新6个自定义字段，因此

6 cf insert calls / entry * ~0.1s / cf insert call * 20 000 total entries * 1 min / 60s = 200 min

~2.5小时单独处理20k帖子的自定义字段

现在，由于所有自定义字段都插入到同一个表wp_post_meta中，因此我希望将所有update_post_meta调用组合到一个调用中，从而大大提高了此导入脚本执行时间的性能。 / p>

我查看了wp核心中的update_post_meta函数，看到没有传递数组而不是单个cf键和值的本机选项，所以我通过相关代码来查明SQL INSERT行并查看了如何做一些事情：

meta_value

等等，所有字段都在一个`INSERT INTO `wp_postmeta` (`post_id`,`meta_key`,`meta_value`) VALUES ( $post_id, 'artist_name' , $artist) ( $post_id, 'song_length' , $length ) ( $post_id, 'song_genre' , $genre ) ...`内滚动。

调整我的process_custom_post（）函数以反映这一点，我最终得到了：

$wpdb->query();

和中提琴！而不是每个自定义帖子对数据库进行7次INSERT调用，我最终会在尝试处理多个自定义字段时发挥巨大作用，同时迭代大量帖子 - 在我的情况下，现在，20k条目在15内完成处理完成-20分钟（相比之下我会在约17k的帖子和几小时的处理时间之后遇到辍学）......

所以我从整个经历中学到了关键的东西，

介意你的数据库调用 - 他们可以快速加起来！

从csv文件将大量自定义帖子导入wordpress

1 个答案: