How to reduce text file size?

Posted: 2015-07-10 02:07:06

Tags: php mysql text-files

I have PHP code that fetches product IDs (23,436 unique records) from my database.

I take each product ID and check, by comparing productID, whether it already has entries in the feature_product table.

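If no record for that ID is found in the features table, I then search trial.txt, a file containing the missing product features, by comparing the productID in each line of the text file with the productID that is absent from feature_product.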

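The problem is that trial.txt has 593,262 lines, and matching productIDs against this file takes forever. I run out of memory. It took me 15 hours to actually get all the data out of the file, and even then I had to do it manually, in parts. Is there a way to make this faster, or at least to avoid running out of time and memory?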

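As some posts on this site suggested, I tried increasing the maximum execution time in php.ini, but the script keeps running out of memory or hitting the maximum execution time. Once I get this working I will switch to mysqli, since the mysql_* functions are deprecated. I thought of splitting the product IDs so that I only process, say, 5,000 at a time, but I don't think that would help the execution time.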

<?php
$conn = mysql_connect("localhost", "dbuser", "pwd");
if (!$conn) {
    die('Could not connect: ' . mysql_error());
}
echo '<p>Connected!';

mysql_select_db("mydb") or die("Unable to select database");

// Select all product ids from the product table
$pArray = mysql_query("SELECT `id_product` FROM `product`", $conn);

// Loop through each product id
while ($row = mysql_fetch_assoc($pArray)) {
    // Get the product ID to check whether it exists in the features table
    $productID = (int) $row["id_product"];

    // Check whether this product id has any row in feature_product
    $fArray = mysql_query("SELECT * FROM `feature_product` WHERE `id_product` = $productID", $conn);

    // If the product ID has no entry in the feature table, look for its features in the text file
    if (mysql_num_rows($fArray) == 0) {
        checkFeatures($productID);
    }
}

function checkFeatures($productID) {
    // trial.txt contains features of products that are in the product table but missing from the features table
    $fd = fopen('trial.txt', 'r');
    fgets($fd); // skip the header line of the file

    // Append all features (multiple records per product), separated by ',', to a new text file for future use
    $my_file = 'file.txt';
    $handle = fopen($my_file, 'a') or die('Cannot open file: ' . $my_file);

    while (($data = fgetcsv($fd, 0, "~")) !== FALSE) {
        // The file holds many products; keep only the ones missing from the features table
        // by comparing the product ID, which is the first element of $data
        if ($data[0] == $productID) {
            $d = $data[0] . "," . $data[1] . "," . $data[2] . $data[3] . "\n";
            echo $d . "<br/>";
            fwrite($handle, $d);
        }
    }
    fclose($fd);
    fclose($handle);
}
?>

Sample of the product table:
id_product,shop,manufacutrer,category  
1000010,1,41,1112,1  
1000011,1,7,1721,1  
1000012,1,7,1721,1  

Sample of the feature table:
feature_id,id_product,value  
1,1000010,1  
3,1000010,2  
6,1000011,5  
11,1931555,1 
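The main loop above sends one SELECT per product (23,436 round trips) just to learn which products have no feature rows. As a sketch of an alternative, a single LEFT JOIN with `IS NULL` (an anti-join) returns all such IDs at once. Assumptions: the table and column names are taken from the samples, the function name `productsWithoutFeatures` is hypothetical, and an in-memory SQLite database through PDO stands in for MySQL only so the example runs on its own; the SELECT is plain SQL that MySQL also accepts.

```php
<?php
// Hypothetical helper: return every product ID that has no row in feature_product,
// using one LEFT JOIN ... IS NULL query instead of one query per product.
function productsWithoutFeatures(PDO $db): array
{
    $sql = 'SELECT p.id_product
              FROM product p
              LEFT JOIN feature_product f ON f.id_product = p.id_product
             WHERE f.id_product IS NULL
             ORDER BY p.id_product';
    // FETCH_COLUMN gives a flat list of the first column; intval normalizes the type.
    return array_map('intval', $db->query($sql)->fetchAll(PDO::FETCH_COLUMN));
}

// Demo only: SQLite in memory, loaded with the sample product and feature rows above.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE product (id_product INTEGER PRIMARY KEY)');
$db->exec('CREATE TABLE feature_product (feature_id INTEGER, id_product INTEGER, value INTEGER)');
// An index on the join column keeps the anti-join fast on the full tables.
$db->exec('CREATE INDEX idx_fp_product ON feature_product (id_product)');
$db->exec('INSERT INTO product VALUES (1000010), (1000011), (1000012)');
$db->exec('INSERT INTO feature_product VALUES
           (1, 1000010, 1), (3, 1000010, 2), (6, 1000011, 5), (11, 1931555, 1)');

$missing = productsWithoutFeatures($db);
print_r($missing); // only 1000012 has no row in feature_product
```

With the sample data, 1000012 is the only product without a feature row. The resulting list can then drive a single pass over trial.txt instead of one file scan per product.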

Sample of trial.txt:

IMSKU~AttributeID~Value~Unit~StoredValue~StoredUnit  
1000006~16121~2-25~~~  
1000006~3897~* McAfee Protection Suite~~~  
1000006~3933~* 1yr Subscription~~~  
1000010~1708~Feb 2011~~~  
1000010~1710~Cisco~~0.00~  
1000010~1711~http://www.cisco.com~~~  
1000011~2852~1~~0.00~  
1000011~2855~Light Cyan~~0.00~  
1000012~2840~May 2010~~~  
1000012~2842~HP~~0.00~  
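The slowness comes from `checkFeatures()` reopening and reading the whole of trial.txt for every product that lacks features: in the worst case (all 23,436 products missing) that is about 23,436 × 593,262 ≈ 1.4 × 10^10 line comparisons. Batching the IDs 5,000 at a time does not change that total. A minimal sketch of a single-pass alternative, assuming the missing IDs are already collected in a PHP array: put them in an array keyed by ID (a hash set), then read the file once and keep the lines whose first field is in the set. The function name `extractMissingFeatures` is hypothetical; it splits lines with `explode('~', ...)` rather than `fgetcsv` (a deliberate swap, so a `"` inside a value such as a screen size is not treated as a CSV quote), and writes the same `id,attribute,value+unit` records as the original script.

```php
<?php
// Hypothetical helper: read a '~'-separated file ONCE and write only the rows whose
// product ID (first field) is in $missingIds. Returns the number of rows written.
function extractMissingFeatures(string $inPath, string $outPath, array $missingIds): int
{
    // Hash set: ID => true, so each membership test is an isset() lookup.
    $wanted = array_fill_keys(array_map('strval', $missingIds), true);

    $in  = fopen($inPath, 'r');
    $out = fopen($outPath, 'w');
    if ($in === false || $out === false) {
        throw new RuntimeException('Cannot open input or output file');
    }

    fgets($in); // skip the header line (IMSKU~AttributeID~Value~...)
    $written = 0;
    while (($line = fgets($in)) !== false) {
        $f = explode('~', rtrim($line, "\r\n"));
        // Skip blank or malformed lines and products that already have features.
        if (count($f) < 4 || !isset($wanted[$f[0]])) {
            continue;
        }
        // Same record layout as the original script: id,attribute,value followed by unit
        fwrite($out, $f[0] . ',' . $f[1] . ',' . $f[2] . $f[3] . "\n");
        $written++;
    }
    fclose($in);
    fclose($out);
    return $written;
}

// Demo with a few of the sample lines written to a temporary file.
$in  = tempnam(sys_get_temp_dir(), 'trial');
$out = tempnam(sys_get_temp_dir(), 'feat');
file_put_contents($in, "IMSKU~AttributeID~Value~Unit~StoredValue~StoredUnit\n"
    . "1000006~16121~2-25~~~\n"
    . "1000011~2852~1~~0.00~\n"
    . "1000012~2840~May 2010~~~\n"
    . "1000012~2842~HP~~0.00~\n");

$n = extractMissingFeatures($in, $out, [1000012]);
echo $n, "\n", file_get_contents($out);
```

Memory stays small because only one line and the ID set are held at a time; the runtime is one read of the 593,262-line file.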

As a user suggested, I tried loading the text file into a SQL table:

<?php
$con = mysqli_connect("localhost", "username", "pwd", "db");
// Check connection and stop if it failed
if (mysqli_connect_errno()) {
    echo "Failed: " . mysqli_connect_error();
    exit;
}

mysqli_query($con,"CREATE TABLE IF NOT EXISTS `add_features` (`id_product` INT(10) NOT NULL, `id_feature` INT(10) NOT NULL, `value` varchar(255),`unit` varchar(20),`s_value` varchar(20),`s_unit` varchar(20))");

$sql = "LOAD DATA INFILE 'trial.txt'
INTO TABLE `add_features`
FIELDS TERMINATED BY '~'
";
if ($con->query($sql) === TRUE) {
    echo "OK!";
} else {
    echo "Error: " . $sql . "<br>" . $con->error;
}
$result = mysqli_query($con,"SELECT * FROM `add_features`");

echo "<table class='add_features'>
<tr class='titles'>
<th>Product_id</th>
<th>feature_id</th>
<th>value</th>
<th>Unit</th>
</tr>";

while ($row = mysqli_fetch_array($result)) {
    echo "<tr>";
    echo "<td>" . $row['id_product'] . "</td>";
    echo "<td>" . $row['id_feature'] . "</td>";
    echo "<td>" . $row['value'] . "</td>";
    echo "<td>" . $row['unit'] . "</td>";
    echo "</tr>";
}
echo "</table>";

mysqli_close($con);
?>  

But I get an error: Error: LOAD DATA INFILE 'trial.txt' INTO TABLE `add_features` FIELDS TERMINATED BY '~'
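The post only shows the echoed `$sql`; the MySQL message from `$con->error` did not survive, so the exact cause is a guess. Some likely causes with this statement: without `LOCAL`, the MySQL server (not PHP) opens the file, and a bare name like `trial.txt` is looked up in the default database's directory on the server; the account also needs the FILE privilege and must satisfy `secure_file_priv`. The header line (`IMSKU~...`) is loaded as data too, into INT columns. A minimal sketch of one fix, assuming the server has `local_infile` enabled and reusing the question's placeholder credentials: use `LOAD DATA LOCAL INFILE` with an absolute path so PHP reads the file, and `IGNORE 1 LINES` to skip the header. The helper names `buildLoadDataSql` and `importTrialFile` are hypothetical.

```php
<?php
// Hypothetical helper: build a LOAD DATA LOCAL INFILE statement for a
// '~'-separated file that starts with one header line.
function buildLoadDataSql(string $absPath, string $table): string
{
    // addslashes() is for illustration; with an open connection,
    // mysqli_real_escape_string() is the better choice.
    return "LOAD DATA LOCAL INFILE '" . addslashes($absPath) . "'\n"
         . "INTO TABLE `" . $table . "`\n"
         . "FIELDS TERMINATED BY '~'\n"
         . "LINES TERMINATED BY '\\n'\n"   // use '\\r\\n' if the file has Windows line endings
         . "IGNORE 1 LINES";               // skip the IMSKU~AttributeID~... header
}

// Hypothetical import routine (needs a live MySQL server; not called in the demo).
function importTrialFile(string $path): void
{
    $abs = realpath($path);
    if ($abs === false) {
        exit("No such file: $path");
    }
    $con = mysqli_init();
    // Client-side switch for LOCAL; the server's local_infile setting must also be ON.
    mysqli_options($con, MYSQLI_OPT_LOCAL_INFILE, true);
    if (!mysqli_real_connect($con, 'localhost', 'username', 'pwd', 'db')) {
        exit('Failed: ' . mysqli_connect_error());
    }
    if (!mysqli_query($con, buildLoadDataSql($abs, 'add_features'))) {
        // Print the server's own message; it names the actual cause.
        echo 'Error: ' . mysqli_error($con);
    }
    mysqli_close($con);
}

// The path below is a made-up example.
echo buildLoadDataSql('/var/www/data/trial.txt', 'add_features'), "\n";
```

Printing `mysqli_error()` on its own, rather than only the SQL text, shows which of these causes applies.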

1 answer:

Answer (score: 1)

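If trial.txt is static, I would either process/parse it into separate, smaller files based on some logical delimiter, or (preferably) import it into a new database table, where searching it will be fast, especially with an index on the product ID column. It is a one-time import, and then you are done.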
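Once trial.txt sits in a table, the whole job can be one indexed query. A sketch under these assumptions: the imported table is the question's `add_features` (trimmed here to four of its six columns), `featuresForProductsMissingThem` is a hypothetical name, and an in-memory SQLite database through PDO stands in for MySQL so the example runs on its own; the CREATE INDEX and SELECT statements are plain SQL that MySQL also accepts. The query keeps rows for products that exist in `product` (inner JOIN) but have no row in `feature_product` (LEFT JOIN ... IS NULL).

```php
<?php
// Hypothetical helper: features from the imported file for products that exist
// in `product` but have no row in `feature_product`.
function featuresForProductsMissingThem(PDO $db): array
{
    $sql = 'SELECT a.id_product, a.id_feature, a.value
              FROM add_features a
              JOIN product p ON p.id_product = a.id_product
              LEFT JOIN feature_product f ON f.id_product = a.id_product
             WHERE f.id_product IS NULL
             ORDER BY a.id_product, a.id_feature';
    return $db->query($sql)->fetchAll(PDO::FETCH_NUM);
}

// Demo only: SQLite in memory with a few rows taken from the samples in the question.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE product (id_product INTEGER PRIMARY KEY)');
$db->exec('CREATE TABLE feature_product (feature_id INTEGER, id_product INTEGER, value INTEGER)');
$db->exec('CREATE TABLE add_features (id_product INTEGER NOT NULL, id_feature INTEGER NOT NULL,
           value VARCHAR(255), unit VARCHAR(20))');
// Indexes on the join columns are what make lookups fast on the full data set.
$db->exec('CREATE INDEX idx_af_product ON add_features (id_product)');
$db->exec('CREATE INDEX idx_fp_product ON feature_product (id_product)');
$db->exec('INSERT INTO product VALUES (1000010), (1000011), (1000012)');
$db->exec('INSERT INTO feature_product VALUES (1, 1000010, 1), (3, 1000010, 2), (6, 1000011, 5)');
$db->exec("INSERT INTO add_features VALUES
           (1000006, 16121, '2-25', ''), (1000010, 1708, 'Feb 2011', ''),
           (1000011, 2852, '1', ''), (1000012, 2840, 'May 2010', ''),
           (1000012, 2842, 'HP', '')");

$rows = featuresForProductsMissingThem($db);
foreach ($rows as $r) {
    echo implode(',', $r), "\n";
}
```

Only the two 1000012 rows come back: 1000006 is not in `product`, and 1000010 and 1000011 already have feature rows.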

If it is not static, how often does it change?