Question

I'm looking for a PHP solution to the following problem:

I have ~15 tables in a database, each with 10–50 million rows, adding up to about 200 million rows in total. Their columns are userID, B, C, D.

I have 9 other tables that contain the columns userID, fbID. Each of them has about 2 million rows. The mapping from userID to fbID is one-to-one.

My goal is to write these 200 million rows to a file with the columns fbID, B, C, D.

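To do that, I have to search all 9 tables that contain userID and fbID, because a userID may be present in one table but not in the others. I can stop as soon as I find the userID in any one of them. I'm doing this part with SQL and PHP. The SQL query has LIMIT 1, so I get back only one row whenever the userID is found, because these tables can contain several rows with the same userID.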

Unfortunately, this algorithm takes about 60 s per 1,000 rows, so the whole job would take roughly 130 days to finish.
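A quick check of that estimate, using only the figures given above (60 s per 1,000 rows, 200 million rows):

$$
\frac{2\times 10^{8}\ \text{rows}}{10^{3}\ \text{rows}} \times 60\ \text{s} = 1.2\times 10^{7}\ \text{s},
\qquad
\frac{1.2\times 10^{7}\ \text{s}}{86\,400\ \text{s/day}} \approx 139\ \text{days}
$$

That is in line with the roughly 130 days quoted, i.e. more than four months of continuous running.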

Is there a more efficient way?

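I'm no expert on how database computation time works, but a couple of ideas occurred to me:

- Iterate over all 9 tables and build a lookup table with userID as the key and fbID as the value.

- Use these 9 tables to create a new table in the database with one row per userID and its corresponding fbID, and search that table instead.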
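The first idea can be sketched in a few lines of PHP (7.1+). This is a minimal sketch, not a tested tool: it assumes the rows of the 9 mapping tables have already been fetched into arrays of [userID, fbID] pairs (the function names and the data are hypothetical), keeps the first fbID found for each userID, and then resolves each main-table row with a single array lookup instead of up to 9 queries. With roughly 9 × 2 million mapping rows, such an array may need several gigabytes of PHP memory, which is one reason the second idea (a table inside the database) may scale better.

```php
<?php
// Sketch of idea 1: build an in-memory userID => fbID lookup once, then
// resolve every main-table row with an array lookup instead of up to 9 queries.
// Assumption (hypothetical input): $mappingTables holds the rows of the 9
// mapping tables as [userID, fbID] pairs, in the order they should be searched.
function buildLookup(array $mappingTables): array
{
    $lookup = [];
    foreach ($mappingTables as $rows) {
        foreach ($rows as [$userId, $fbId]) {
            // Keep the first fbID seen, mirroring "stop at the first table that has it".
            if (!isset($lookup[$userId])) {
                $lookup[$userId] = $fbId;
            }
        }
    }
    return $lookup;
}

// Turn main-table rows [userID, B, C, D] into output rows [fbID, B, C, D].
// A userID found in none of the mapping tables gets null.
function mapRows(array $lookup, array $mainRows): array
{
    $out = [];
    foreach ($mainRows as [$userId, $b, $c, $d]) {
        $out[] = [$lookup[$userId] ?? null, $b, $c, $d];
    }
    return $out;
}

// Usage with tiny hypothetical data:
$lookup = buildLookup([
    [[1, 'fb_a'], [1, 'fb_a']],   // mapping table 1 (duplicate rows are harmless)
    [[2, 'fb_b'], [1, 'fb_a']],   // mapping table 2
]);
print_r(mapRows($lookup, [[1, 10, 20, 30], [2, 11, 21, 31], [3, 12, 22, 32]]));
```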

Here is more specific information about the tables.

The tables that add up to 200 million rows (each one looks like this):

Column      Type       Null   Default
dtLogTime   datetime   Yes    NULL
iUin        int(10)    No
B           int(10)    No
C           int(10)    No
D           int(10)    No

Indexes:

Keyname    Type   Unique  Packed  Column     Cardinality  Collation  Null  Comment
dtLogTime  BTREE  No      No      dtLogTime  323542       A          YES
iUin       BTREE  No      No      iUin       323542       A

One of the other 9 tables:

Column      Type          Null   Default   Comments
dtLogTime   datetime      Yes    NULL
iUin        int(10)       No
vFBID       varchar(48)   No

Indexes:

Keyname    Type   Unique  Packed  Column     Cardinality  Collation  Null  Comment
dtLogTime  BTREE  No      No      dtLogTime  2179789      A          YES
iUin       BTREE  No      No      iUin       2179789      A

Sample code I tried:

// Returns the FBID for an iUin, or null if none of the 9 mapping tables has it.
// Note: the mysql_* extension was removed in PHP 7.0; on newer PHP use mysqli or PDO.
function getFBID($iuin){
    // The 9 mapping tables, searched in order; stop at the first hit.
    $tables = array('tbReg', 'tbOnline', 'tbConsumeFBC', 'tbFeed', 'tbInvite',
                    'tbFreeGift', 'tbUninstall', 'tbDownload', 'tbIUserSource');

    foreach ($tables as $table) {
        // LIMIT 1 because a table may hold several rows with the same iUin.
        $query = "SELECT vFBID FROM `" . $table . "` WHERE iuin = " . (int)$iuin . " LIMIT 1";
        $result = mysql_query($query);
        $row = $result ? mysql_fetch_assoc($result) : false;

        // Free the result on every path, including when a row was found.
        if ($result) {
            mysql_free_result($result);
        }
        if ($row) {
            return $row['vFBID'];
        }
    }
    return null;
}

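// Output file (the file name is only an example).
$handle = fopen('output.csv', 'w');
fwrite($handle, '"Time","FBID","Action","ActionID"' . "\n");

// Count all rows; COUNT(*) also counts rows whose dtLogTime is NULL,
// which COUNT(dtLogTime) would skip.
$query = "SELECT COUNT(*) AS length FROM `tbActionWeeding`";
$result = mysql_query($query);
$row = mysql_fetch_assoc($result);
mysql_free_result($result);

// Read the table in 10,000 chunks of $length rows each.
$length = ceil($row['length'] * 0.0001);
$start = 0;
$i = 0;
while ($i++ < 10000) {
   // Note: LIMIT without ORDER BY does not guarantee a stable row order between chunks.
   $query = "SELECT dtLogTime, iuin, iWeedID
             FROM `tbActionWeeding`
             LIMIT " . $start . "," . $length;
   $result = mysql_query($query);
   if (!$result) {
      $message  = 'Invalid query: ' . mysql_error() . "\n";
      $message .= 'Whole query: ' . $query . "\n";
      die($message);
   }
   // One getFBID() call, i.e. up to 9 more queries, per output row.
   while ($row = mysql_fetch_assoc($result)) {
      fwrite($handle, '"' . $row['dtLogTime'] . '","' . getFBID($row['iuin']) .
                      '","0","' . $row['iWeedID'] . "\"\n");
   }
   mysql_free_result($result);
   $start += $length;
}
fclose($handle);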

Answer 1

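"I have 9 other tables that contain the columns userID, fbID"

and

"These 9 other tables each have ~2 million rows"

The inefficiency of this data structure cannot easily be overcome with clever code alone. Because you have to work through a large amount of redundant data, even the most efficient algorithm will run slowly on this architecture.

What you need is normalization. You should change the structure of your tables to remove the redundant data. That eliminates the need to search 9 separate tables 200 million times, which makes the job dramatically more efficient.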
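As a rough illustration of what normalization buys here, the PHP (7.1+) sketch below collapses the redundant (userID, fbID) rows into one entry per userID, which is what a single normalized mapping table would hold (in the database such a table would typically have userID as its primary key). It also flags any userID that maps to more than one fbID, since that would contradict the one-to-one assumption. The input format and the function name are hypothetical; this shows the data transformation, not a schema migration.

```php
<?php
// Collapse redundant (userID, fbID) rows into one entry per userID and
// report userIDs that appear with more than one distinct fbID.
// Input (hypothetical): a flat list of [userID, fbID] pairs from all 9 tables.
function normalizeMapping(array $pairs): array
{
    $map = [];        // userID => fbID, one row per user
    $conflicts = [];  // userID => list of the distinct fbIDs seen
    foreach ($pairs as [$userId, $fbId]) {
        if (!isset($map[$userId])) {
            $map[$userId] = $fbId;
        } elseif ($map[$userId] !== $fbId) {
            // Start the list with the first fbID seen, then add the new one.
            $conflicts[$userId] = array_values(array_unique(
                array_merge($conflicts[$userId] ?? [$map[$userId]], [$fbId])
            ));
        }
    }
    return ['map' => $map, 'conflicts' => $conflicts];
}

$result = normalizeMapping([[1, 'fb_a'], [1, 'fb_a'], [2, 'fb_b'], [2, 'fb_c']]);
print_r($result['map']);        // one entry per userID
print_r($result['conflicts']);  // userID 2 has two different fbIDs
```

If the conflict list comes back empty on the real data, the one-to-one mapping holds and a single lookup per userID is enough.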

Answer 2

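Something like this should work, although, as others have said in the comments, it would help to know whether you have the appropriate indexes in place.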

SELECT
  u.fbID, t.B, t.C, t.D
FROM
  veryLargeTable AS t
JOIN (
  -- UNION (without ALL) drops duplicate (userId, fbID) pairs
  SELECT userId, fbID FROM
    smallerTable1
  UNION SELECT userId, fbID FROM
    smallerTable2
  ...
  UNION SELECT userId, fbID FROM
    smallerTable9
) AS u USING (userId)

You may want to run it on a smaller data set first to see how it performs.

Answer 3

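Please understand that, given the number of rows, even the most efficient approach may still take a while.

The first real question is your requirement to do this in PHP. How absolute is it? If the work can be done entirely inside the database, this is what you should do: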

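--
-- Index all 9 tables on (userId, fbId) first.

-- MySQL has no SELECT ... INTO <new table>; CREATE TABLE ... AS SELECT does the same job.
-- UNION (rather than UNION ALL) removes duplicate (userId, fbId) pairs across all nine tables.
CREATE TABLE WorkingTable_UserId_to_fbId AS
select UserId,fbId
  from table1Of9
union
select UserId,fbId
  from table_2_of_9
--
-- repeat the UNION clause up to:
UNION
select UserId,fbId
  from table_9_of_9;

-- Index the resulting table on (userId, fbId):
ALTER TABLE WorkingTable_UserId_to_fbId ADD INDEX (userId, fbId);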

This gives you a working table against which you can run a basic query:

select linker.fbId,main.b,main.c,main.d
  from mainTable main
  JOIN WorkingTable_UserId_to_fbId linker on main.userId = linker.userId

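If creating that table is absolutely impossible, you have to take the same code and plug it into the query above as a subquery. It just won't be as fast. It would look like this: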

select linker.fbId,main.b,main.c,main.d
  from mainTable main
  JOIN (  select UserId,fbId
           from table1Of9
          union
         select UserId,fbId
           from table_2_of_9
         -- etc, etc.

       ) linker on main.userId = linker.userId

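However, this will probably choke when the server tries to assemble 200 million rows to return to PHP. So you need to break the work into chunks, fetching perhaps 10,000 rows at a time. It may be tempting to add LIMIT ... OFFSET to the query above, but that still puts a heavy load on the server, because for every chunk it has to produce and then discard all the rows before the offset. It is better to drive the chunks from PHP, for example: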

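# Very sloppy code off the top of my head;
# modify this loop based on what you know of the
# userId values (e.g. start at MIN(userId), stop at MAX(userId)).
# Assumes $link is an open mysqli connection and $handle an open output file.
$id = 1;
$chunk = 10000;
while ($id <= 200000000) {
    $topId = $id + $chunk - 1;
    $sql = "select linker.fbId, main.b, main.c, main.d
              from mainTable main
              JOIN WorkingTable_UserId_to_fbId linker on main.userId = linker.userId
             WHERE main.userId between $id and $topId";

    # Note: don't freak out about SQL injection in the above code,
    #       you are hardcoding the values of ID, not getting them from a user

    # Execute query, retrieve rows, output:
    $result = mysqli_query($link, $sql);
    while ($row = mysqli_fetch_row($result)) {
        fputcsv($handle, $row);
    }
    mysqli_free_result($result);

    # then move the counter to the next, non-overlapping range:
    $id = $topId + 1;
}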
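The range stepping in the loop above can be isolated into a small helper so that the window width and the step always agree and no id is skipped or fetched twice. A minimal PHP sketch; `idRanges` is a hypothetical name, and in practice the bounds would come from `SELECT MIN(userId), MAX(userId)` on the main table rather than being hardcoded:

```php
<?php
// Split [$minId, $maxId] into consecutive, non-overlapping [low, high]
// windows of $size ids each; the last window is clipped to $maxId.
// $minId/$maxId are assumptions: take them from MIN()/MAX() on the main table.
function idRanges(int $minId, int $maxId, int $size): array
{
    $ranges = [];
    for ($low = $minId; $low <= $maxId; $low += $size) {
        $ranges[] = [$low, min($low + $size - 1, $maxId)];
    }
    return $ranges;
}

// Each pair would become "WHERE main.userId BETWEEN low AND high".
foreach (idRanges(1, 25000, 10000) as [$low, $high]) {
    echo "$low-$high\n";   // 1-10000, 10001-20000, 20001-25000
}
```

Windows that contain no matching ids simply return no rows, so gaps in the userId sequence cost only one cheap indexed range query each.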

Need an efficient way to do a simple computation on a 200-million-row database (PHP)
