PHP preg_replace does not work properly unless the input is hard-coded

Time: 2017-09-14 14:08:10

Tags: php regex function preg-replace

When I receive an email, I want to save part of it in the database, so I use a regular expression to strip the parts I don't want. However, this does not work.

function emailToMessage($inbox, $emails){
    $messages = [];

    foreach ($emails as $email_number){

        # $overview holds the mail's information
        $overview = imap_fetch_overview($inbox,$email_number,0);
        $headerInfos = imap_headerinfo($inbox,$email_number);
        $mailHeader = imap_fetchheader($inbox,$email_number);
        $structure = imap_fetchstructure($inbox,$email_number);
        $subject = (isset($overview[0]->subject)) ? $overview[0]->subject : 'Pas de sujet';
        $sender = $headerInfos->from[0]->mailbox . "@" . $headerInfos->from[0]->host;
        $body = imap_fetchbody($inbox,$email_number,1);

        $body = checkSubtype($structure, $body); // decode if the transfer encoding is base64 and convert the charset to UTF-8
        $text = cleanMail($body, $mailHeader);

        # convert udate to a format that can be inserted into MySQL
        $mailDateAndTime = date("Y-m-d H:i:s", $overview[0]->udate);

        $message = [
            'title'     => $subject,
            'text'      => $text,
            'id_sender' => $sender,
            'mailDateAndTime' => $mailDateAndTime,
        ];

        $messages[] = $message;
    }
    return $messages;
}

function cleanMail($text, $mailHeader){
    $clientMail = checkMailClient($mailHeader);

switch($clientMail){
    case "gmail":
    echo($text . "<br>");
    $gmail = preg_replace("/\s+[0-9]{4}-[0-9]{2}-[0-9]{2}\s+[0-9]{2}:[0-9]{2}\s+GMT\+[0-9]{2}:[0-9]{2}\s+(.)*\s+:(?:\s(.)*)*/i", '$1', $text);
    echo($gmail . "<br>");
    return $gmail;
    break;

If

$body = "test

2017-09-11 11:55 GMT+02:00 XXX :

> frgthyjuki";

then echo($text); and echo($gmail); both give me:

test

2017-09-11 11:55 GMT+02:00 XXX :

> frgthyjuki

However, if I declare $text myself, either in emailToMessage() (between $body = checkSubtype($structure, $body); and $text = cleanMail($body, $mailHeader);) or in cleanMail(), it works.

$text = "test

2017-09-11 11:55 GMT+02:00 XXX :

> frgthyjuki";

switch($clientMail){

    case "gmail":
    echo($text . "<br>");
    $gmail = preg_replace("/\s+[0-9]{4}-[0-9]{2}-[0-9]{2}\s+[0-9]{2}:[0-9]{2}\s+GMT\+[0-9]{2}:[0-9]{2}\s+(.)*\s+:(?:\s(.)*)*/i", '$1', $text);
    echo($gmail . "<br>");
    return $gmail;
    break;

echo($text); gives:

test

2017-09-11 11:55 GMT+02:00 XXX :

> frgthyjuki

and echo($gmail); gives:

test

Why does it work only when the variable is hard-coded? What can I do to fix it?
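One way to investigate a difference like this is to compare the fetched body with the hard-coded string byte by byte, since echo in a browser hides characters such as carriage returns or non-breaking spaces. The following is a minimal sketch, not part of the original code: revealHiddenBytes() is a hypothetical helper, and the $fetched value is an invented example of what a mail body could contain, not the actual data from this question.

```php
<?php
// Hypothetical helper (not from the original post): make invisible bytes visible.
function revealHiddenBytes(string $text): string
{
    // Replace every byte outside printable ASCII (0x20-0x7E) with a \xNN escape.
    return preg_replace_callback(
        '/[^\x20-\x7E]/',
        static function (array $m): string {
            return sprintf('\\x%02X', ord($m[0]));
        },
        $text
    );
}

// The hard-coded string from the question (with LF line endings).
$hardCoded = "test\n\n2017-09-11 11:55 GMT+02:00 XXX :\n\n> frgthyjuki";

// ASSUMPTION: an invented example of a fetched body, with CRLF line endings
// and a UTF-8 non-breaking space (bytes 0xC2 0xA0) before the colon.
$fetched = "test\r\n\r\n2017-09-11 11:55 GMT+02:00 XXX\xC2\xA0:\r\n\r\n> frgthyjuki";

echo revealHiddenBytes($hardCoded), PHP_EOL;
echo revealHiddenBytes($fetched), PHP_EOL;
var_dump($hardCoded === $fetched); // bool(false)
```

Running the same helper on the real $text just before the preg_replace call would show whether the two inputs actually differ; a CRLF pair, for instance, appears as \x0D\x0A.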

Update

function checkSubtype($structure, $body){
    $subtype = strtolower($structure->subtype);

    if($subtype == "alternative") {
        $encoding = $structure->parts[0]->parameters[0]->value;
        $encoding = strtolower($encoding);
        $transferEncoding = $structure->parts[0]->encoding;

        $body = checkIfBase64($body, $transferEncoding);
        $body = quoted_printable_decode($body);
        $body = changeToUTF8($body, $encoding);
    } else {
        $transferEncoding = $structure->encoding;
        $body = checkIfBase64($body, $transferEncoding);
    }
    return $body;
}


function checkIfBase64($text, $transferEncoding){
    if($transferEncoding == "3"){
        $text = base64_decode($text);
    }
    return $text;
}

function changeToUTF8($text, $encoding){
    if($encoding != "utf-8" && $encoding != 'utf8'){
        $text = utf8_encode($text);
    }
    return $text;
}
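Note that utf8_encode() always treats its input as ISO-8859-1, whatever charset the mail declares. A possible variant, sketched here under assumptions, converts from the declared charset instead. convertToUtf8() is a hypothetical name, the sketch assumes the mbstring extension is loaded, and it assumes the value read from parameters[0]->value really is the charset name.

```php
<?php
// Hypothetical variant of changeToUTF8(); requires the mbstring extension.
function convertToUtf8(string $text, string $charset): string
{
    $charset = strtolower(trim($charset));
    if ($charset === 'utf-8' || $charset === 'utf8') {
        return $text; // already UTF-8, nothing to do
    }

    // Compare case-insensitively against the encodings mbstring knows about.
    $known = array_map('strtolower', mb_list_encodings());
    if (!in_array($charset, $known, true)) {
        // Unknown charset name: fall back to ISO-8859-1,
        // which is what utf8_encode() assumes anyway.
        $charset = 'iso-8859-1';
    }

    return mb_convert_encoding($text, 'UTF-8', $charset);
}
```

For example, convertToUtf8("\xE9", 'ISO-8859-1') turns the single Latin-1 byte for "é" into the two-byte UTF-8 sequence 0xC3 0xA9.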

Update 2

I tested with Thunderbird and Orange.

It does not work with Thunderbird (I checked my regex online, and it is correct), but it works with Orange.

case "orange":
return preg_replace("/s*>(?:\s(?:.)*)*/i", '$1', $text);
break;

I really don't understand...

Update 3

I tried again without the horrible solution, and it works with the previous code. That's strange...

1 answer:

Answer 0 (score: 0)

This is a horrible solution, but it works...

case "gmail":
    $step1 = preg_replace("/\s+/i", '#', $text);
    $step2 = preg_replace("/#*[0-9]{4}-[0-9]{2}-[0-9]{2}#+[0-9]{2}:[0-9]{2}#+GMT\+[0-9]{2}:[0-9]{2}#+(.)*#*:(?:#(.)*)*/i", '$2', $step1);
    return preg_replace("/#/i", ' ', $step2);
    break;
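The three-step workaround above normalises whitespace before matching. If the underlying problem is whitespace that \s does not cover (this is only a guess; the question never establishes the cause), a single pattern that tolerates such whitespace may be enough. The sketch below uses a hypothetical helper, stripGmailQuote(), and simplifies the original pattern: it cuts everything from the Gmail-style timestamp line onwards instead of also matching the sender name and colon.

```php
<?php
// Hypothetical helper: keep only the text that precedes a Gmail-style
// quote header such as "2017-09-11 11:55 GMT+02:00 XXX :".
function stripGmailQuote(string $text): string
{
    // [\s\x{00A0}] also accepts a non-breaking space (U+00A0).
    // The s modifier lets "." match newlines, so ".*" consumes the rest of
    // the mail; the u modifier treats pattern and subject as UTF-8.
    $pattern = '/[\s\x{00A0}]*\d{4}-\d{2}-\d{2}[\s\x{00A0}]+\d{2}:\d{2}'
             . '[\s\x{00A0}]+GMT[+-]\d{2}:\d{2}.*/su';

    $result = preg_replace($pattern, '', $text);

    // preg_replace() returns null on error (for example when the subject
    // is not valid UTF-8); fall back to the untouched text in that case.
    return $result ?? $text;
}

echo stripGmailQuote("test\n\n2017-09-11 11:55 GMT+02:00 XXX :\n\n> frgthyjuki"); // test
```

Because the u modifier requires valid UTF-8, this sketch assumes the body has already been converted, as checkSubtype() attempts to do.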