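I have a database with two tables. "speechesLCMcoded" contains 400K rows of coded text, and "concreteness" contains scores for 80K words.
I wrote a script that goes through the table of parsed text (speechesLCMcoded), strips the tags from each word, looks that word up in the other table (concreteness), and adds up the scores it finds.
I am a beginner in PHP and my code is not optimized at all. I don't mind if the script runs for a whole day, but I can't have it running for a week. How would you suggest I optimize my script?
My script does everything I need. It is just too slow.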
<?php
//Include functions
include "functions.php";
ini_set('max_execution_time', 900000);
echo 'Time Limit = ' . ini_get('max_execution_time');
//Connecting to the database
if (!$conn) {
    die('Not connected : ' . mysql_error());
}
// make LCM the current db
mysql_select_db('senate');
$data = mysql_query("SELECT `key`, `tagged` FROM speechesLCMcoded") or die(mysql_error());
// puts the "data" info into the $info array
while ($info = mysql_fetch_array($data)) {
    $key = $info['key'];
    $tagged = $info['tagged'];
    unset($weight);
    unset($count);
    $weight = 0;
    $count = 0;
    // Print out the contents of the entry
    Print "<b>Key:</b> " . $info['key'] . " <br>";
    // Explodes the sentence
    $speech = explode(" ", $tagged);
    // Loop every word
    foreach ($speech as $word) {
        //Print each word
        //Print "<b>Key:</b> ".$word . " <br>";
        //Check if string contains our tag
        if (!preg_match('/({V}|{J}|{N}|{RB})/', $word, $matches)) {} else {
            //Removes our tags
            $word = str_replace("{V}", "", $word);
            $word = str_replace("{RB}", "", $word);
            $word = str_replace("{J}", "", $word);
            $word = str_replace("{N}", "", $word);
            $word = str_replace("{/V}", "", $word);
            $word = str_replace("{/RB}", "", $word);
            $word = str_replace("{/J}", "", $word);
            $word = str_replace("{/N}", "", $word);
            //print $word . " <br>";
            //Check for the score
            $checksql = "SELECT word, score FROM concreteness WHERE word = '$word'";
            $query = mysql_query("$checksql");
            $check_count = mysql_num_rows($query);
            if ($check_count > 0) {
                $data2 = mysql_fetch_assoc($query);
                $weight = $weight + $data2['score'];
                $count = $count + 1;
                // echo $weight;
                // print "<br>";
                // echo $count;
                // print "<br>";
            } else {
                // echo"The word was NOT found.<br>";
            }
        }
    }
    $sql = "UPDATE speechesLCMcoded SET weight='$weight', count='$count' WHERE `key`='$key';";
    $retval = mysql_query($sql, $conn);
    if (!$retval) {
        die('Could not update data: ' . mysql_error());
    }
    echo "Updated data successfully\n";
}
?>
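Answer 0 (score: 1)
For every row of speechesLCMcoded (400K rows), you run str_replace on each word and issue one SQL query per word.
You can strip the tags in the first SQL query itself by using the REPLACE function (http://dev.mysql.com/doc/refman/5.0/en/replace.html). There is no need to call str_replace 8 times for every word of every row.
That is the first step.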
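A minimal sketch of how the tag stripping could be moved into the first query with nested MySQL `REPLACE()` calls. The helper name `buildTagStripExpr` and the alias `cleaned` are hypothetical, chosen only for illustration; the script only builds and prints the SQL string and does not touch a database. One caveat: the original loop scores only words that carry a tag, and once the tags are stripped on the server that distinction is gone, so this fits only if every remaining word may be looked up.

```php
<?php
// Hypothetical helper: wraps a column in one nested REPLACE() call per tag,
// so MySQL strips the tags instead of PHP.
// The tags are hard-coded, trusted literals, so no escaping is applied here.
function buildTagStripExpr(string $column, array $tags): string
{
    $expr = $column;
    foreach ($tags as $tag) {
        $expr = "REPLACE($expr, '$tag', '')";
    }
    return $expr;
}

// The eight tags removed by the str_replace calls in the question.
$tags = ['{V}', '{/V}', '{RB}', '{/RB}', '{J}', '{/J}', '{N}', '{/N}'];

// "cleaned" is an assumed alias for the tag-free text.
$sql = 'SELECT `key`, ' . buildTagStripExpr('`tagged`', $tags)
     . ' AS cleaned FROM speechesLCMcoded';

echo $sql, "\n";
```

The resulting string can be passed to the same query call the script already uses in place of the plain `SELECT`.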
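As a second step, you can use a single query with a JOIN to fetch all the data from both tables at once, instead of querying concreteness separately for every word.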
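A JOIN needs one word per row to match against, which the `tagged` text column does not provide directly. As a swapped-in alternative (not the JOIN itself) that also avoids one query per word, the 80K-word table could be read once into a PHP array and each speech scored in memory. The sketch below assumes that approach; `scoreSpeech` is a hypothetical name and the `$scores` values are made-up sample data standing in for the concreteness table. Array keys are case-sensitive, whereas MySQL's default collations compare case-insensitively, so words may need lowercasing on both sides to match the original behaviour.

```php
<?php
// Scores one tagged speech against an in-memory word => score map.
// $scores stands in for the concreteness table, loaded once before the loop.
function scoreSpeech(string $tagged, array $scores): array
{
    $tags = ['{V}', '{/V}', '{RB}', '{/RB}', '{J}', '{/J}', '{N}', '{/N}'];
    $weight = 0.0;
    $count = 0;
    foreach (explode(' ', $tagged) as $word) {
        // Same filter as the original loop: only tagged words are scored.
        if (!preg_match('/\{(V|J|N|RB)\}/', $word)) {
            continue;
        }
        // A single str_replace call with an array strips all eight tags.
        $word = str_replace($tags, '', $word);
        if (isset($scores[$word])) {
            $weight += $scores[$word];
            $count++;
        }
    }
    return [$weight, $count];
}

// Made-up sample data, for illustration only.
$scores = ['dog' => 4.5, 'run' => 3.0];
list($weight, $count) = scoreSpeech('the {N}dog{/N} can {V}run{/V} {J}fast{/J}', $scores);
echo "weight=$weight count=$count\n";
```

With this shape, the only queries left per speech are the initial `SELECT` row and the final `UPDATE`; the per-word lookups become array accesses.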