Parsing a text file to get some content

Time: 2012-09-25 09:20:47

Tags: php

I have some text files, for example `file1.txt` and `file2.txt`.

The content of `file1.txt` is: `Walk word1 in the rain Walking in the rain is one of the most beautiful word2 experiences.`

There are some conditions:

  1. If both word1 and word2 are present, I want the text between those two words in `$between`, so I'd get `in the rain Walking in the rain is one of the most beautiful`. I also want the text after `word2` in `$content`, so I'd get `experiences`.
  2. If only word1 or only word2 is present (e.g. `Walk in the rain Walking in the rain is one of the most beautiful word1 experiences.`), then `$between = ''` and `$content` is the whole text: `Walk in the rain Walking in the rain is one of the most beautiful word1 experiences.`
  3. If word2 comes before word1, e.g. `Walk in word2 the rain Walking in the rain is one of the most word1 beautiful word1 experiences.`, then `$between = ''` and `$content` is the whole text.

Here is my code:

    //to get and open the text files
    $txt = glob($savePath.'*.txt');
    foreach ($txt as $file => $files) {
        $handle = fopen($files, "r") or die ('can not open file');
        $ori_content = file_get_contents($files);

        //count the words of text, to reach until the last word
        $words = preg_split('/\s+/',$ori_content ,-1,PREG_SPLIT_NO_EMPTY);
        $count = count ($words);

        $word1 ='word1';
        $word2 ='word2';
        if (stripos($ori_content, $word1) && stripos($ori_content, $word2)){
            $between  = substr($ori_content, stripos($ori_content, $word1)+ strlen($word1), stripos($ori_content, $word2) - stripos($ori_content, $word1)- strlen($word1));
            $content  = substr($ori_content, stripos($ori_content, $word2)+strlen($word2), stripos($ori_content, $ori_content[$count+1])  - stripos($ori_content,$word2));
        }
        else
            $content = $ori_content;

        $q0 = mysql_query("INSERT INTO tb VALUES('','$files','$content','$between')") or die(mysql_error());
    }
    

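But my code still can't handle the following:

  1. Condition 2 (above): the result I get is `$between = experiences`, but it should be `$between = ''`.
  2. Condition 3 (above): I get `$between = the rain Walking in the rain is one of the most beautiful word1 experiences`, but it should be `$between = ''`.
  3. If `file1.txt` has a `$between` value but `file2.txt` does not, the `between` column of the `file2.txt` row in the database table should be NULL. But it isn't NULL; it is filled with text from another text file.
  4. I can't get up to the last word.

Please help me... thanks in advance! :)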

2 Answers:

Answer 0 (score: 1)

I think you're just missing one statement:

    ...
    }
    else {
        $between = '';
        $content = $ori_content;
    }

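You're probably running this inside a loop, so if you don't explicitly set `$between` to an empty string, it keeps the value from the previous iteration :)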
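To make the stale-value problem concrete, here is a minimal sketch that runs the same kind of loop twice over two in-memory strings, a hypothetical stand-in for `glob()` plus `file_get_contents()`. The helper `between_markers()` is also hypothetical; it only extracts the text between `word1` and `word2`. In the first loop `$between` is assigned only when there is a match, so `file2.txt` inherits the value from `file1.txt`; in the second loop it is reset at the top of every iteration.

```php
<?php
// Hypothetical stand-ins for the files read with glob() + file_get_contents().
$texts = [
    'file1.txt' => 'Walk word1 in the rain word2 experiences.',
    'file2.txt' => 'Walk in the rain word1 experiences.',
];

// Hypothetical helper: text between the markers, or null if they are not both present in order.
function between_markers(string $text): ?string
{
    $p1 = stripos($text, 'word1');
    $p2 = stripos($text, 'word2');
    if ($p1 === false || $p2 === false || $p1 > $p2) {
        return null;
    }
    $start = $p1 + strlen('word1');
    return trim(substr($text, $start, $p2 - $start));
}

// Buggy pattern: $between is only assigned on a match, so it leaks into later files.
$leaky = [];
foreach ($texts as $name => $text) {
    $found = between_markers($text);
    if ($found !== null) {
        $between = $found;
    }
    $leaky[$name] = $between;
}

// Fixed pattern: reset $between at the top of every iteration.
$fixed = [];
foreach ($texts as $name => $text) {
    $between = '';
    $found = between_markers($text);
    if ($found !== null) {
        $between = $found;
    }
    $fixed[$name] = $between;
}

print_r($leaky);  // file2.txt wrongly shows "in the rain"
print_r($fixed);  // file2.txt shows ""
```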

Edit

You also forgot to compare the positions:

    if (stripos($ori_content, $word1) && stripos($ori_content, $word2)){

It should be:

    $pos1 = stripos($ori_content, $word1);
    $pos2 = stripos($ori_content, $word2);
    if (false !== $pos1 && false !== $pos2 && $pos1 < $pos2) {
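The strict `false !==` comparison matters because `stripos()` returns `0` when the word sits at the very start of the text, and `0` is falsy in a plain `if`. A small illustration with a made-up sentence:

```php
<?php
$text = 'word1 opens this sentence and word2 closes it';

$pos1 = stripos($text, 'word1');   // 0: found at the very first character
$pos2 = stripos($text, 'word2');   // a positive offset

// Loose truthiness check: 0 is treated as false, so this wrongly reports "not found".
$looseFound = ($pos1 && $pos2);

// Strict check: only a real `false` means "not found"; also require word1 before word2.
$strictFound = ($pos1 !== false && $pos2 !== false && $pos1 < $pos2);

var_dump($looseFound, $strictFound);  // bool(false) bool(true)
```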

Edit 2

One more thing: your SQL is open to injection, and you can't insert NULL values correctly this way. You can use a construct like the one below, but it's better to use PDO or mysqli:

    $sql_between = is_null($between) ? 'NULL' : "'" . mysql_real_escape_string($between) . "'";
    // apply the same treatment for `$files`, etc.
    ...
    mysql_query("INSERT INTO tb VALUES('', $sql_files, $sql_content, $sql_between)");

This way you can set `$between` to `null` and it will be sent to MySQL correctly as NULL.
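Following the suggestion to use PDO (the `mysql_*` functions were removed in PHP 7.0), here is a minimal sketch of the same insert with a prepared statement. It assumes the `pdo_sqlite` extension and uses an in-memory SQLite database with a made-up schema for `tb` (`id`, `file`, `content`, `between_text`) only so that it runs on its own; with MySQL you would swap in your own DSN and credentials. `BETWEEN` is an SQL keyword, hence the hypothetical column name `between_text`. Binding with `PDO::PARAM_NULL` stores a real NULL, and the placeholders remove the injection problem.

```php
<?php
// In-memory SQLite stands in for MySQL here (assumption: pdo_sqlite is enabled).
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Hypothetical schema mirroring the question's table `tb`; columns are nullable by default.
$pdo->exec('CREATE TABLE tb (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    file TEXT,
    content TEXT,
    between_text TEXT
)');

$stmt = $pdo->prepare(
    'INSERT INTO tb (file, content, between_text) VALUES (:file, :content, :between_text)'
);

// Hypothetical helper: binds one row and executes the prepared statement.
function insert_row(PDOStatement $stmt, string $file, string $content, ?string $between): void
{
    $stmt->bindValue(':file', $file, PDO::PARAM_STR);
    $stmt->bindValue(':content', $content, PDO::PARAM_STR);
    // A PHP null is stored as SQL NULL; a string (even one containing quotes) is sent safely.
    $stmt->bindValue(':between_text', $between, $between === null ? PDO::PARAM_NULL : PDO::PARAM_STR);
    $stmt->execute();
}

// Made-up sample rows: one with a value containing a quote, one without a "between" part.
insert_row($stmt, 'file1.txt', 'experiences.', "in the rain, it's wet");
insert_row($stmt, 'file2.txt', 'Walk in the rain word1 experiences.', null);

$rows = $pdo->query('SELECT file, between_text FROM tb ORDER BY id')->fetchAll(PDO::FETCH_ASSOC);
print_r($rows);
```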

Answer 1 (score: 1)

I've wrapped the parser logic in a function, `parse_content`.

    $txt = glob($savePath.'*.txt');
    foreach ($txt as $file => $files) {
        $ori_content = file_get_contents($files);
        if ($ori_content === false) {
            die('can not open file');
        }
        $word1 ='word1';
        $word2 ='word2';

        $result = parse_content($word1, $word2, $ori_content);
        extract($result); // defines $between and $content

        $q0 = mysql_query("INSERT INTO tb VALUES('','$files','$content','$between')") or die(mysql_error());

    }


    function parse_content($word1, $word2, $input) {
        $between = '';
        $content = '';

        $w1 = stripos($input, $word1);
        $w2 = stripos($input, $word2);

        // Compare against false: stripos() returns 0 when a word starts the text
        if ($w1 !== false && $w2 !== false) {
            if ($w2 < $w1) {
                // Case 3
                $content = $input;
            } else {
                // Case 1: escape the words and match case-insensitively (i), like stripos(),
                // and across line breaks (s)
                $reg_between = '/' . preg_quote($word1, '/') . '(.*?)' . preg_quote($word2, '/') . '/is';
                $reg_content = '/' . preg_quote($word2, '/') . '(.*)$/is';

                preg_match($reg_between, $input, $match);
                $between = trim($match[1]);
                preg_match($reg_content, $input, $match);
                $content = trim($match[1]);
            }
        } else if ($w1 !== false || $w2 !== false) {
            // Case 2
            $content = $input;
        } else {
            // Case 4
            $content = $input;
        }

        return compact('between', 'content');
    }
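For comparison, here is a sketch of the same rules without regular expressions, using only the offsets from `stripos()` and `substr()`. `split_by_markers` is a hypothetical name, and the checks use the example sentences from the question. Calling `substr()` without a length argument reads to the end of the string, which also covers the "last word" problem from the question. Like `parse_content`, it matches plain substrings rather than whole words, so `word1` would also match inside `word10`.

```php
<?php
/**
 * Hypothetical regex-free variant of parse_content().
 * - word1 before word2: text between them, plus the text after word2
 * - otherwise (only one word, neither, or word2 first): between = '', content = whole text
 */
function split_by_markers(string $word1, string $word2, string $input): array
{
    $pos1 = stripos($input, $word1);
    $pos2 = stripos($input, $word2);

    if ($pos1 === false || $pos2 === false || $pos2 < $pos1) {
        return ['between' => '', 'content' => $input];
    }

    $start = $pos1 + strlen($word1);
    return [
        'between' => trim(substr($input, $start, $pos2 - $start)),
        // No length argument: substr() runs to the end, so the last word is included.
        'content' => trim(substr($input, $pos2 + strlen($word2))),
    ];
}

$case1 = split_by_markers('word1', 'word2',
    'Walk word1 in the rain Walking in the rain is one of the most beautiful word2 experiences.');
$case2 = split_by_markers('word1', 'word2',
    'Walk in the rain Walking in the rain is one of the most beautiful word1 experiences.');
$case3 = split_by_markers('word1', 'word2',
    'Walk in word2 the rain Walking in the rain is one of the most word1 beautiful word1 experiences.');

print_r([$case1, $case2, $case3]);
```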