How to handle a large POST request

Time: 2015-06-18 03:02:49

Tags: php json

Unfortunately, I can't show you the code, but I can give you an idea of what it looks like, what it does, and the problem I'm having...

<?php
   include('db.php');
   include('tools.php');
   $c = new GetDB(); // Connection to DB
   $t = new Tools(); // Classes to clean, prevent XSS and others
   if(isset($_POST['var'])){
       $nv = json_decode($_POST['var']);

       foreach($nv as $k) {
          $id = $t->clean($k->id);
          // ... goes on for about 10 keys
          // this might seem redundant or insufficient
          $id = $c->real_escape_string($id);
          // ... goes on for the rest of keys...

          $q = $c->query("SELECT * FROM table WHERE id = '$id'");
          $r = $q->fetch_row();
          if ($r[1] > 0) {
          // Item exist in DB then just UPDATE
             $q1 = $c->query(UPDATE TABLE1);
             $q4 = $c->query(UPDATE TABLE2);
             if ($x == 1) {
                 $q2 = $c->query(SELECT);
                 $rq = $q2->fetch_row();
                 if ($rq[0] > 0) {
                     // Item already in table just update
                     $q3 = $c->query(UPDATE TABLE3);
                 } else  {
                     // Item not in table then INSERT
                     $q3 = $c->query(INSERT TABLE3);
                 }
             }
          } else {
            // Item not in DB then Insert
             $q1 = $c->query(INSERT TABLE1);
             $q4 = $c->query(INSERT TABLE2);
             $q3 = $c->query(INSERT TABLE4);
             if($x == 1) {
                $q5 = $c->query(INSERT TABLE3);
             }
          }
       }
   }

As you can see, it's a very basic INSERT/UPDATE script, so before we pushed it to full production we ran some tests to see whether the script worked, and the "results" were fine...

So we ran this code against 100 requests and everything went well... less than 1.7 seconds for 100 requests... but then we saw the amount of data that needs to be sent/posted, and for me it was jaw-dropping... more than 20K items, which take about 3 to 5 minutes to POST, and the script always crashes. The "data" is an array in JSON:

array (
   [0] => array (
             [id] => 1,
             [val2] => 1,
             [val3] => 1,
             [val4] => 1,
             [val5] => 1,
             [val6] => 1,
             [val7] => 1,
             [val8] => 1,
             [val9] => 1,
             [val10] => 1
         ),
   [1] => array (
             [id] => 2,
             [val2] => 2,
             [val3] => 2,
             [val4] => 2,
             [val5] => 2,
             [val6] => 2,
             [val7] => 2,
             [val8] => 2,
             [val9] => 2,
             [val10] => 2
         ),
//... about 10 to 20K depend on the day and time
)

But in JSON... Anyway, sending the data is not the problem; like I said, it may take about 3 to 5 minutes. The problem is the code that receives the data and runs the queries. On a normal shared host we get a 503 error, which after some debugging turned out to be a timeout. On our VPS we can raise max_execution_time to whatever we need, and processing 10K+ items takes the VPS about 1 hour, but on the shared host we cannot change max_execution_time...

So I asked the other developer, the one who sends the data, to send a batch of 1K instead of 10K+ at once, let it rest for a second, then send the next batch, and so on... So far I haven't gotten an answer. I was also thinking about doing the "pause" on my end: say, after every 1K items processed, wait a second and then continue. But I don't think that would be as effective as receiving the data in batches... How would you solve this?
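
Just to illustrate what I asked the other developer to do, the sending side would look roughly like this (get_all_items() and the URL are placeholders, not our real code):

<?php
// Rough sketch of the batching idea: split the full item list into chunks of
// 1,000, POST them one at a time, and sleep for a second between requests so
// the receiving script never has to handle more than 1K items per execution.
$items = get_all_items(); // placeholder: builds the 10K-20K item array
foreach (array_chunk($items, 1000) as $batch) {
    $ch = curl_init('https://example.com/receiver.php'); // placeholder endpoint
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, array('var' => json_encode($batch)));
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_exec($ch);
    curl_close($ch);
    sleep(1); // give the shared host a breather between batches
}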

2 Answers:

Answer 0 (score: 1)

Sorry, I don't have enough reputation to comment anywhere, so I have to write this as an answer. I would recommend zedfoxus' batching approach above. In addition, I strongly recommend finding a way to run these queries faster. Keep in mind that every single PHP function call, etc., gets multiplied by every row of data. Here are a few ways you can get better performance:

  1. Use prepared statements. This will allow MySQL to cache the operation in memory for each successive query. This is very important (a minimal sketch follows after this list).
  2. If you use prepared statements, you can drop the $c->real_escape_string() calls. I would also take a hard look at how much of the $t->clean() method you can safely rely on.
  3. Next, I would evaluate the cost of processing each row individually. I'd have to benchmark it to be sure, but I think running a few extra PHP statements up front will be faster than making unnecessary MySQL SELECT and UPDATE calls. MySQL is much faster at inserting many rows at a time. If you expect several of your input rows to change the same row in the database, then you may want to consider the following:

     a. Consider building a temporary, pre-compiled array (depending on the memory usage involved) that stores only unique rows of data. I would also consider doing the same for the secondary TABLE3. This eliminates the unnecessary "update" queries and makes part b possible.

     b. Consider a single query that selects every id in the array from the database. This will be the list of items that need an UPDATE query. Update each of those rows and remove them from the temporary array as you go. Then you can build a single multi-row insert statement (prepared, of course) that does all of the inserts at once.

  4. Look into tuning your MySQL server parameters to better handle the load.

  5. I don't know whether it will speed up a prepared INSERT statement at all, but it may be worth a try. You can wrap the INSERT statements in a transaction, as detailed in this answer: MySQL multiple insert performance
  6. I hope that helps; if anyone else has suggestions, just post them in the comments and I'll try to include them.
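
For what it's worth, a minimal sketch of points 1 and 5 combined might look like the following. It assumes a PDO connection and made-up column names (id, val2); it keeps the original per-row logic but prepares each statement once and wraps the whole batch in a transaction:

<?php
// Prepare once, execute many times, all inside one transaction.
// Connection details, table, and column names are made up for illustration.
$pdo = new PDO('mysql:host=localhost;dbname=mydb;charset=utf8', 'user', 'pass',
               array(PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION));

$select = $pdo->prepare("SELECT COUNT(*) FROM TABLE1 WHERE id = :id");
$update = $pdo->prepare("UPDATE TABLE1 SET val2 = :val2 WHERE id = :id");
$insert = $pdo->prepare("INSERT INTO TABLE1 (id, val2) VALUES (:id, :val2)");

$pdo->beginTransaction();
foreach ($nv as $k) {
    $id   = (int)$k->id;
    $val2 = (int)$k->val2;

    $select->execute(array(':id' => $id));
    if ($select->fetchColumn() > 0) {
        // Row already exists, so update it
        $update->execute(array(':val2' => $val2, ':id' => $id));
    } else {
        // Row is new, so insert it
        $insert->execute(array(':id' => $id, ':val2' => $val2));
    }
}
$pdo->commit();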

    Take a look at the original code here, with just a few suggested changes:

    <?php
       /* You can make sure that the connection type is persistent and 
        * I personally prefer using the PDO driver.
        */
       include('db.php');
       /* Definitely think twice about each tool that is included.
        * Only include what you need to evaluate the submitted data.
        */
       include('tools.php');
       $c = new GetDB(); // Connection to DB
       /* Take a look at optimizing the code in the Tools class. 
        * Avoid any and all kinds of loops–this code is going to be used in 
        * a loop and could easily turn into O(n^2) performance drain. 
        * Minimize the amount of string manipulation requests. 
        * Optimize regular expressions.
        */
       $t = new Tools(); // Classes to clean, prevent XSS and others
       if(isset($_POST['var'])){ // !empty() catches more cases than isset()
           $nv = json_decode($_POST['var']);
    
           /* LOOP LOGIC
            * Definitely test my hypothesis yourself, but this is similar 
            * to what I would try first.
            */
    
           //Row in database query
           $inTableSQL = "SELECT id FROM TABLE1 WHERE id IN("; //keep adding to it
    
           foreach ($nv as $k) {
               /* I would personally use specific methods per data type.
                * Here, I might use a type cast, plus valid int range check.
                */
               $id = $t->cleanId($k->id); //I would include a type cast: (int)
               // Similarly for other values
               //etc.
               // Then save validated data to the array(s)
               $data[$id] = array($values...);
               /* Now would also be a good time to add the id to the SELECT
                * statement
                */
               $inTableSQL .= "$id,";
           }
    
           $inTableSQL .= ");";
    
           // Execute query here
    
           // Then step through the query ids returned, perform UPDATEs,
           // remove the array element once UPDATE is done (use prepared statements)
    
           foreach (.....
    
           /* Then, insert the remaining rows all at once...
            * You'll have to step through the remaining array elements to
            * prepare the statement.
            */
           foreach(.....
    
       } //end initial POST data if
    
    
           /* Everything below here becomes irrelevant */
    
           foreach($nv as $k) {
              $id = $t->clean($k->id);
              // ... goes on for about 10 keys
           // this might seem redundant or insufficient
              $id = $c->real_escape_string($id);
              // ... goes on for the rest of keys...
    
              $q = $c->query("SELECT * FROM table WHERE id = '$id'");
              $r = $q->fetch_row();
              if ($r[1] > 0) {
              // Item exist in DB then just UPDATE
                 $q1 = $c->query(UPDATE TABLE1);
                 $q4 = $c->query(UPDATE TABLE2);
                 if ($x == 1) {
                     $q2 = $c->query(SELECT);
                     $rq = $q2->fetch_row();
                     if ($rq[0] > 0) {
                         // Item already in table just update
                         $q3 = $c->query(UPDATE TABLE3);
                     } else  {
                         // Item not in table then INSERT
                         $q3 = $c->query(INSERT TABLE3);
                     }
                 }
              } else {
                // Item not in DB then Insert
                 $q1 = $c->query(INSERT TABLE1);
                 $q4 = $c->query(INSERT TABLE2);
                 $q3 = $c->query(INSERT TABLE4);
                 if($x == 1) {
                    $q5 = $c->query(INSERT TABLE3);
                 }
              }
           }
       }
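
To make the two stubbed loops above a little more concrete, a rough sketch (assuming the connection were switched to PDO as suggested at the top, a $data array keyed by id, and a single made-up column val2 per row) could be:

// Run the IN() query, UPDATE the rows that already exist, then INSERT the rest.
$found = $pdo->query($inTableSQL)->fetchAll(PDO::FETCH_COLUMN);

$update = $pdo->prepare("UPDATE TABLE1 SET val2 = :val2 WHERE id = :id");
foreach ($found as $id) {
    $update->execute(array(':val2' => $data[$id][0], ':id' => $id));
    unset($data[$id]); // row handled, drop it from the pending array
}

if (!empty($data)) {
    // Build one multi-row prepared INSERT for everything that is left.
    $placeholders = array();
    $params = array();
    foreach ($data as $id => $row) {
        $placeholders[] = "(?, ?)";
        $params[] = $id;
        $params[] = $row[0];
    }
    $insert = $pdo->prepare("INSERT INTO TABLE1 (id, val2) VALUES " . implode(',', $placeholders));
    $insert->execute($params);
}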
    

Answer 1 (score: 0)

The key is to minimize the number of queries. Typically, where you have a loop over data that runs one or more queries per iteration, you can replace it with a constant number of queries. In your case, you'll want to rewrite it into something like the following:

include('db.php');
include('tools.php');
$c = new GetDB(); // Connection to DB
$t = new Tools(); // Classes to clean, prevent XSS and others
if(isset($_POST['var'])){
    $nv = json_decode($_POST['var']);

    $table1_data = array();
    $table2_data = array();
    $table3_data = array();
    $table4_data = array();

    foreach($nv as $k) {
        $id = $t->clean($k->id);
        // ... goes on for about 10 keys
        // this might seem redundant or insufficient
        $id = $c->real_escape_string($id);
        // ... goes on for the rest of keys...

        $table1_data[] = array( ... );
        $table2_data[] = array( ... );
        $table4_data[] = array( ... );
        if ($x == 1) {
            $table3_data[] = array( ... );
        }
    }

    $values = array_to_sql($table1_data);
    $c->query("INSERT INTO TABLE1 (...) VALUES $values ON DUPLICATE KEY UPDATE ...");
    $values = array_to_sql($table2_data);
    $c->query("INSERT INTO TABLE2 (...) VALUES $values ON DUPLICATE KEY UPDATE ...");
    $values = array_to_sql($table3_data);
    $c->query("INSERT INTO TABLE3 (...) VALUES $values ON DUPLICATE KEY UPDATE ...");
    $values = array_to_sql($table4_data);
    $c->query("INSERT IGNORE INTO TABLE4 (...) VALUES $values");
}

Where your original code executes between 3 and 5 queries per row of input data, the code above executes only 4 queries in total.

I'll leave the implementation of array_to_sql to the reader, but hopefully this explains the idea. TABLE4 gets an INSERT IGNORE because you did not have an UPDATE in the "found" branch of your original loop.
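
For completeness, one possible (hypothetical) shape for that array_to_sql() helper, shown here taking the connection as a second argument so it can escape values, could be:

// Turn array(array(1, 'a'), array(2, 'b')) into the string "('1','a'),('2','b')",
// escaping every value through the existing connection.
function array_to_sql(array $rows, $c) {
    $sqlRows = array();
    foreach ($rows as $row) {
        $escaped = array();
        foreach ($row as $value) {
            $escaped[] = "'" . $c->real_escape_string($value) . "'";
        }
        $sqlRows[] = '(' . implode(',', $escaped) . ')';
    }
    return implode(',', $sqlRows);
}

// Usage, matching the calls above (note the extra $c argument):
// $values = array_to_sql($table1_data, $c);
// $c->query("INSERT INTO TABLE1 (...) VALUES $values ON DUPLICATE KEY UPDATE ...");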