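My text consists of several sentences. I need to split it into sentences separated by periods and count the words in each sentence. Sentences containing more than 5 words should be inserted into the database. Here is my code: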
<?php
require_once 'conf/conf.php';// connect to database
function saveContent ($text) {
    //I have to get every sentence without lose the dot
    $text1 = str_replace('.', ".dot", $text);
    $text2 = explode ('dot',$text1);
    //Text that contain ' cannot be inserted to database, so i need to remove it
    $text3 = str_replace("'", "", $text2);
    //Selecting the sentence that only consist of more than words
    for ($i=0;$i<count($text3);$i++){
        if(count(explode(" ", $text3[$i]))>5){
            $save = $text3[$i];
            $q0 = mysql_query("INSERT INTO tbdocument VALUES('','$files','".$save."','','','') ");
        }
    }
}
$text= "I have some text files in my folder. I get them from extraction process of pdf journals files into txt files. here's my code";
$a = saveContent($text);
?>
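The result is that only one sentence (the first) gets inserted into the database. I need your help; thank you very much :).
Answer 0 (score: 0)
There are a number of ways to improve this (and make it work correctly).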
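Rather than replacing "." with ".dot", you can simply explode on the "." and remember to add it back later. However, what if your sentence is something like "Mr. Smith went to Washington."? You can't reliably tell those periods apart.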
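As a rough illustration of both points, here is a minimal sketch that splits after each period so the period stays attached to its sentence. The helper name splitSentences is made up for this example and is not part of the original code; it assumes PHP 7 or later for the type declarations. The second call shows that the abbreviation problem is still there.

```php
<?php
// Hypothetical helper: split after a period that is followed by whitespace.
// The lookbehind (?<=\.) keeps the period attached to the sentence it ends.
function splitSentences(string $text): array
{
    $parts = preg_split('/(?<=\.)\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    return $parts === false ? [] : $parts;
}

// Three pieces; the first two keep their trailing period.
print_r(splitSentences("I have some text files in my folder. I get them from pdf files. here's my code"));

// Abbreviations are still split in the wrong place: "Mr." becomes its own piece.
print_r(splitSentences("Mr. Smith went to Washington."));
```

Handling abbreviations properly would need a list of known abbreviations or a real sentence tokenizer, which is beyond a simple split.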
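The variable $files in your INSERT is not defined in the scope of this function. We don't know where it comes from or what you expect it to contain, but here it will be NULL.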
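To make the scoping issue concrete, here is a small sketch. The function names brokenRow and buildRow and the file name journal1.txt are invented for illustration. It shows that a variable set outside a function is not visible inside it, and that passing the value as a parameter is one way around that.

```php
<?php
// Hypothetical example: $files is set outside the function...
function brokenRow(string $sentence): array
{
    // ...but function bodies do not see the caller's variables,
    // so $files is unset here and falls back to null.
    return ['file' => $files ?? null, 'sentence' => $sentence];
}

// Passing the value in as a parameter makes the dependency explicit.
function buildRow(string $files, string $sentence): array
{
    return ['file' => $files, 'sentence' => $sentence];
}

$files = 'journal1.txt';
var_dump(brokenRow('Some sentence.')['file']);        // the outer $files is not seen
var_dump(buildRow($files, 'Some sentence.')['file']); // the value arrives via the parameter
```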
function saveContent ($text) {
    // Just explode on the . and add it back later...
    $sentences = explode(".", $text);
    // Don't remove single quotes. They'll be properly escaped later...
    // Rather than an incremental loop, use a proper foreach loop:
    foreach ($sentences as $sentence) {
        // Using preg_split() instead of explode() in case there are multiple spaces in sequence.
        // trim() first so the space left after each period is not counted as an empty word.
        if (count(preg_split('/\s+/', trim($sentence))) > 5) {
            // Escape and insert
            // And add the . back onto it
            $save = mysql_real_escape_string(trim($sentence) . ".");
            // $files is not defined in scope of this function!
            // Insert the escaped $save, not the raw $sentence.
            $q = mysql_query("INSERT INTO tbdocument VALUES('', '$files', '$save', '', '', '')");
            // Don't forget to check for errors.
            if (!$q) {
                echo mysql_error();
            }
        }
    }
}
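For what it's worth, the difference between the two word-counting approaches can be seen in a short sketch. The sample string is made up and deliberately has a leading space (as is left behind after splitting on periods) and a doubled space.

```php
<?php
$sentence = " I get  them from pdf files.";   // leading space and a double space

// explode() produces an empty string for every extra space, inflating the count.
$naive = count(explode(" ", $sentence));

// PREG_SPLIT_NO_EMPTY drops the empty pieces caused by leading or repeated whitespace.
$words = preg_split('/\s+/', $sentence, -1, PREG_SPLIT_NO_EMPTY);
$actual = count($words);

echo "explode: $naive, preg_split: $actual\n";   // prints "explode: 8, preg_split: 6"
```

Note that preg_split() on its own, without trimming or PREG_SPLIT_NO_EMPTY, would still count one empty piece for the leading space.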
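In the long run, consider moving away from the mysql_*() functions and start learning an API that supports prepared statements, such as PDO or MySQLi. The old mysql_*() functions were deprecated in PHP 5.5 and removed in PHP 7.0, and they lack the security that prepared statements provide.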
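As a starting point, here is a minimal sketch of the same idea using a PDO prepared statement. It rests on several assumptions that are not in the original: an in-memory SQLite database stands in for the real MySQL connection so that the example runs on its own (it needs the pdo_sqlite extension), the column names file and sentence are hypothetical, and the file name is passed in as a parameter. With MySQL you would create the PDO object with a mysql: DSN instead.

```php
<?php
// Sketch only: table layout and column names (file, sentence) are hypothetical.
function saveContent(PDO $pdo, string $file, string $text): int
{
    $stmt = $pdo->prepare('INSERT INTO tbdocument (file, sentence) VALUES (:file, :sentence)');
    $saved = 0;
    foreach (explode('.', $text) as $sentence) {
        $sentence = trim($sentence);
        $words = preg_split('/\s+/', $sentence, -1, PREG_SPLIT_NO_EMPTY);
        if (count($words) > 5) {
            // Bound values are sent separately from the SQL text,
            // so single quotes need no manual escaping or removal.
            $stmt->execute([':file' => $file, ':sentence' => $sentence . '.']);
            $saved++;
        }
    }
    return $saved;
}

// Stand-in database for the demo; the real code would connect to MySQL.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE tbdocument (id INTEGER PRIMARY KEY, file TEXT, sentence TEXT)');

$text = "I have some text files in my folder. I get them from extraction process of pdf journals files into txt files. here's my code";
echo saveContent($pdo, 'journal1.txt', $text), " sentences saved\n";   // prints "2 sentences saved"
```

Setting PDO::ERRMODE_EXCEPTION means a failed insert throws instead of failing silently, which takes the place of the manual mysql_error() check.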