Unable to get the email body using the Gmail PHP API

Time: 2015-09-18 15:34:13

Tags: php gmail-api

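I'm having trouble with the Gmail PHP API.

I want to retrieve the body content of emails, but I can only retrieve it for emails that have attachments! My question is: why?

Here is my code so far: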

// Authentication things above...
$client = getClient();
$gmail = new Google_Service_Gmail($client);    
$list = $gmail->users_messages->listUsersMessages('me', ['maxResults' => 1000]);

while ($list->getMessages() != null) {   
    foreach ($list->getMessages() as $mlist) {               
        $message_id = $mlist->id;   
        $optParamsGet2['format'] = 'full';
        $single_message = $gmail->users_messages->get('me', $message_id, $optParamsGet2);

        $threadId = $single_message->getThreadId();
        $payload = $single_message->getPayload();
        $headers = $payload->getHeaders();
        $parts = $payload->getParts();
        //print_r($parts); PRINTS SOMETHING ONLY IF I HAVE ATTACHMENTS...
        $body = $parts[0]['body'];
        $rawData = $body->data;
        $sanitizedData = strtr($rawData,'-_', '+/');
        $decodedMessage = base64_decode($sanitizedData); //should display my body content
    }

    if ($list->getNextPageToken() != null) {
        $pageToken = $list->getNextPageToken();
        $list = $gmail->users_messages->listUsersMessages('me', ['pageToken' => $pageToken, 'maxResults' => 1000]);
    } else {
        break;
    }
}

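The only other option I know of for retrieving the content is the snippet located in the headers section, but it only returns the first 50 or so characters, which isn't very useful.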

6 answers:

Answer 0: (score: 23)

Let's do a small experiment. I sent myself two messages, one with an attachment and one without.

Request:

GET https://www.googleapis.com/gmail/v1/users/me/messages?maxResults=2

Response:

{
 "messages": [
  {
   "id": "14fe21fd6b3fb46f",
   "threadId": "14fe21fd6b3fb46f"
  },
  {
   "id": "14fe21f9341ed73c",
   "threadId": "14fe21f9341ed73c"
  }
 ],
 "nextPageToken": "08943597140129624594",
 "resultSizeEstimate": 3
}

I only ask for the payload, because that is where all the relevant parts are:

fields = payload

GET https://www.googleapis.com/gmail/v1/users/me/messages/14fe21fd6b3fb46f?fields=payload

GET https://www.googleapis.com/gmail/v1/users/me/messages/14fe21f9341ed73c?fields=payload

Message without attachment:

{
 "payload": {
  "parts": [
   {
    "partId": "0",
    "mimeType": "text/plain",
    "filename": "",
    "headers": [
     {
      "name": "Content-Type",
      "value": "text/plain; charset=UTF-8"
     }
    ],
    "body": {
     "size": 22,
     "data": "aGVjaz8gTm8gYXR0YWNobWVudD8NCg=="
    }
   },
   {
    "partId": "1",
    "mimeType": "text/html",
    "filename": "",
    "headers": [
     {
      "name": "Content-Type",
      "value": "text/html; charset=UTF-8"
     }
    ],
    "body": {
     "size": 43,
     "data": "PGRpdiBkaXI9Imx0ciI-aGVjaz8gTm8gYXR0YWNobWVudD88L2Rpdj4NCg=="
    }
   }
  ]
 }
}

Message with attachment:

{
 "payload": {
  "parts": [
   {
    "mimeType": "multipart/alternative",
    "filename": "",
    "headers": [
     {
      "name": "Content-Type",
      "value": "multipart/alternative; boundary=001a1142e23c551e8e05200b4be0"
     }
    ],
    "body": {
     "size": 0
    },
    "parts": [
     {
      "partId": "0.0",
      "mimeType": "text/plain",
      "filename": "",
      "headers": [
       {
        "name": "Content-Type",
        "value": "text/plain; charset=UTF-8"
       }
      ],
      "body": {
       "size": 9,
       "data": "V293IG1hbg0K"
      }
     },
     {
      "partId": "0.1",
      "mimeType": "text/html",
      "filename": "",
      "headers": [
       {
        "name": "Content-Type",
        "value": "text/html; charset=UTF-8"
       }
      ],
      "body": {
       "size": 30,
       "data": "PGRpdiBkaXI9Imx0ciI-V293IG1hbjwvZGl2Pg0K"
      }
     }
    ]
   },
   {
    "partId": "1",
    "mimeType": "image/jpeg",
    "filename": "feelthebern.jpg",
    "headers": [
     {
      "name": "Content-Type",
      "value": "image/jpeg; name=\"feelthebern.jpg\""
     },
     {
      "name": "Content-Disposition",
      "value": "attachment; filename=\"feelthebern.jpg\""
     },
     {
      "name": "Content-Transfer-Encoding",
      "value": "base64"
     },
     {
      "name": "X-Attachment-Id",
      "value": "f_ieq3ev0i0"
     }
    ],
    "body": {
     "attachmentId": "ANGjdJ_2xG3WOiLh6MbUdYy4vo2VhV2kOso5AyuJW3333rbmk8BIE1GJHIOXkNIVGiphP3fGe7iuIl_MGzXBGNGvNslwlz8hOkvJZg2DaasVZsdVFT_5JGvJOLefgaSL4hqKJgtzOZG9K1XSMrRQAtz2V0NX7puPdXDU4gvalSuMRGwBhr_oDSfx2xljHEbGG6I4VLeLZfrzGGKW7BF-GO_FUxzJR8SizRYqIhgZNA6PfRGyOhf1s7bAPNW3M9KqWRgaK07WTOYl7DzW4hpNBPA4jrl7tgsssExHpfviFL7yL52lxsmbsiLe81Z5UoM",
     "size": 100446
    }
   }
  ]
 }
}

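These responses correspond to $parts in your code. As you can see, if you are lucky, $parts[0]['body']->data will give you what you want, but most of the time it won't. In the message with an attachment, for example, $parts[0] is a multipart/alternative container whose body has size 0 and no data; the actual text sits one level deeper, in its nested parts.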
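To see what the data fields in the responses above actually contain, here is a minimal PHP sketch that decodes them. The helper name decodeBodyData is hypothetical; it only assumes that body data is URL-safe base64, as the '-' in the sample values suggests.

```php
<?php
// Hypothetical helper (not part of the Gmail client library): decodes the
// URL-safe base64 found in a part's body "data" field.
function decodeBodyData(?string $data): ?string
{
    if ($data === null || $data === '') {
        return null; // e.g. a multipart container with "size": 0 and no "data"
    }
    // Map the URL-safe alphabet ('-', '_') back to the standard one ('+', '/').
    $decoded = base64_decode(strtr($data, '-_', '+/'));
    return $decoded === false ? null : $decoded;
}

// The text/html part of the message with an attachment shown above.
echo decodeBodyData('PGRpdiBkaXI9Imx0ciI-V293IG1hbjwvZGl2Pg0K');
```

Returning null instead of FALSE for a missing body keeps "no body" distinguishable from a body that merely evaluates as falsy.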

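There are generally two approaches to this issue. You could implement the following algorithm (you are much better at PHP than I am, but this is the general outline of it):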

  1. Traverse payload.parts and check whether it contains a part that has the body you are looking for (either text/plain or text/html). If it does, you are done searching. If you were parsing the message above without an attachment, this would be enough.
  2. Do step 1 again, but this time with the parts found inside the parts you just checked, recursively. You will eventually find your part. If you were parsing the message above with an attachment, you would eventually find your body.

The algorithm could look something like the following (example in JavaScript):

var response = {
 "payload": {
  "parts": [
   {
    "mimeType": "multipart/alternative",
    "filename": "",
    "headers": [
     {
      "name": "Content-Type",
      "value": "multipart/alternative; boundary=001a1142e23c551e8e05200b4be0"
     }
    ],
    "body": {
     "size": 0
    },
    "parts": [
     {
      "partId": "0.0",
      "mimeType": "text/plain",
      "filename": "",
      "headers": [
       {
        "name": "Content-Type",
        "value": "text/plain; charset=UTF-8"
       }
      ],
      "body": {
       "size": 9,
       "data": "V293IG1hbg0K"
      }
     },
     {
      "partId": "0.1",
      "mimeType": "text/html",
      "filename": "",
      "headers": [
       {
        "name": "Content-Type",
        "value": "text/html; charset=UTF-8"
       }
      ],
      "body": {
       "size": 30,
       "data": "PGRpdiBkaXI9Imx0ciI-V293IG1hbjwvZGl2Pg0K"
      }
     }
    ]
   },
   {
    "partId": "1",
    "mimeType": "image/jpeg",
    "filename": "feelthebern.jpg",
    "headers": [
     {
      "name": "Content-Type",
      "value": "image/jpeg; name=\"feelthebern.jpg\""
     },
     {
      "name": "Content-Disposition",
      "value": "attachment; filename=\"feelthebern.jpg\""
     },
     {
      "name": "Content-Transfer-Encoding",
      "value": "base64"
     },
     {
      "name": "X-Attachment-Id",
      "value": "f_ieq3ev0i0"
     }
    ],
    "body": {
     "attachmentId": "ANGjdJ_2xG3WOiLh6MbUdYy4vo2VhV2kOso5AyuJW3333rbmk8BIE1GJHIOXkNIVGiphP3fGe7iuIl_MGzXBGNGvNslwlz8hOkvJZg2DaasVZsdVFT_5JGvJOLefgaSL4hqKJgtzOZG9K1XSMrRQAtz2V0NX7puPdXDU4gvalSuMRGwBhr_oDSfx2xljHEbGG6I4VLeLZfrzGGKW7BF-GO_FUxzJR8SizRYqIhgZNA6PfRGyOhf1s7bAPNW3M9KqWRgaK07WTOYl7DzW4hpNBPA4jrl7tgsssExHpfviFL7yL52lxsmbsiLe81Z5UoM",
     "size": 100446
    }
   }
  ]
 }
};

// In e.g. a plain text message, the payload is the only part.
var parts = [response.payload];

while (parts.length) {
  var part = parts.shift();
  if (part.parts) {
    parts = parts.concat(part.parts);
  }

  if(part.mimeType === 'text/html') {
    var decodedPart = decodeURIComponent(escape(atob(part.body.data.replace(/\-/g, '+').replace(/\_/g, '/'))));
    console.log(decodedPart);
  }
}

The far easier option is to get the raw data of the message and let an already written library do the work for you:

Request:

format = raw
fields = raw

GET https://www.googleapis.com/gmail/v1/users/me/messages/14fe21fd6b3fb46f?format=raw&fields=raw

The big drawback of the second approach is that if you get the raw message, you download all the attachment data right away, which might be far more data than your use case needs.

I'm not good at PHP, but this looks promising if you want to go with the second solution! Good luck!
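The two steps of the first approach can also be sketched in PHP. This is only an illustration under stated assumptions: it works on the JSON response decoded into plain arrays (not on the client library's objects), and the function name findPartData is hypothetical.

```php
<?php
// Hypothetical helper: depth-first search of a payload (JSON decoded into
// arrays) for the first part of the wanted MIME type that carries data.
function findPartData(array $part, string $mimeType): ?string
{
    // Step 1: does this part itself hold the body we are looking for?
    if (($part['mimeType'] ?? null) === $mimeType && isset($part['body']['data'])) {
        return $part['body']['data'];
    }
    // Step 2: repeat the check on the nested parts, recursively.
    foreach ($part['parts'] ?? [] as $child) {
        $found = findPartData($child, $mimeType);
        if ($found !== null) {
            return $found;
        }
    }
    return null;
}

// Trimmed-down version of the message with an attachment shown above.
$payload = [
    'parts' => [
        [
            'mimeType' => 'multipart/alternative',
            'body' => ['size' => 0],
            'parts' => [
                ['mimeType' => 'text/plain', 'body' => ['size' => 9, 'data' => 'V293IG1hbg0K']],
                ['mimeType' => 'text/html', 'body' => ['size' => 30, 'data' => 'PGRpdiBkaXI9Imx0ciI-V293IG1hbjwvZGl2Pg0K']],
            ],
        ],
        ['mimeType' => 'image/jpeg', 'filename' => 'feelthebern.jpg', 'body' => ['size' => 100446]],
    ],
];

$data = findPartData($payload, 'text/html');
// Body data is URL-safe base64, so translate the alphabet before decoding.
$html = $data === null ? null : base64_decode(strtr($data, '-_', '+/'));
```

Because the search starts at the payload itself, the same function also covers a message whose payload has no parts and carries the body directly.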

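Answer 1: (score: 22)

Update: you may want to check my second answer below for more complete code.

I finally got it working today, so here is the complete code that finds the body. Thanks to @Tholle.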

// Authentication things above
/*
 * Decode the body.
 * @param : encoded body  - or null
 * @return : the body if found, else FALSE;
 */
function decodeBody($body) {
    $rawData = $body;
    $sanitizedData = strtr($rawData,'-_', '+/');
    $decodedMessage = base64_decode($sanitizedData);
    if(!$decodedMessage){
        $decodedMessage = FALSE;
    }
    return $decodedMessage;
}

$client = getClient();
$gmail = new Google_Service_Gmail($client);

$list = $gmail->users_messages->listUsersMessages('me', ['maxResults' => 1000]);

try{
    while ($list->getMessages() != null) {

        foreach ($list->getMessages() as $mlist) {

            $message_id = $mlist->id;
            $optParamsGet2['format'] = 'full';
            $single_message = $gmail->users_messages->get('me', $message_id, $optParamsGet2);
            $payload = $single_message->getPayload();

            // With no attachment, the payload might be directly in the body, encoded.
            $body = $payload->getBody();
            $FOUND_BODY = decodeBody($body['data']);

            // If we didn't find a body, let's look for the parts
            if(!$FOUND_BODY) {
                $parts = $payload->getParts();
                foreach ($parts  as $part) {
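                    if($part['body']) {
                        $FOUND_BODY = decodeBody($part['body']->data);
                        // No unconditional break here: a multipart container has an
                        // empty body, so we must fall through to its nested parts.
                    }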
                    // Last try: if we didn't find the body in the first parts, 
                    // let's loop into the parts of the parts (as @Tholle suggested).
                    if($part['parts'] && !$FOUND_BODY) {
                        foreach ($part['parts'] as $p) {
                            // replace 'text/html' by 'text/plain' if you prefer
                            if($p['mimeType'] === 'text/html' && $p['body']) {
                                $FOUND_BODY = decodeBody($p['body']->data);
                                break;
                            }
                        }
                    }
                    if($FOUND_BODY) {
                        break;
                    }
                }
            }
            // Finally, print the message ID and the body
            print_r($message_id . " : " . $FOUND_BODY);
        }

        if ($list->getNextPageToken() != null) {
            $pageToken = $list->getNextPageToken();
            $list = $gmail->users_messages->listUsersMessages('me', ['pageToken' => $pageToken, 'maxResults' => 1000]);
        } else {
            break;
        }
    }
} catch (Exception $e) {
    echo $e->getMessage();
}

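As you can see, my problem was that sometimes the body is not found in payload->parts but directly in payload->body! (I also added a loop for nested parts.)

Hope this helps someone else.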

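Answer 2: (score: 18)

For those interested, I have greatly improved my last answer so that it uses text/html (falling back to text/plain if necessary) and converts images into base64 attachments that load automatically when the message is printed as full HTML!

The code is far from perfect and too long to explain in detail, but it works for me.

Feel free to take it and adapt it (and correct or improve it if necessary).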

Cheers.

Answer 3: (score: 10)

I wrote this code as an improvement on @F3L1X79's answer, because it filters the HTML response correctly.

<?php
ini_set("display_errors", 1);
ini_set("track_errors", 1);
ini_set("html_errors", 1);
error_reporting(E_ALL);
require_once __DIR__ . '/vendor/autoload.php';

session_start();

function decodeBody($body) {
    $rawData = $body;
    $sanitizedData = strtr($rawData,'-_', '+/');
    $decodedMessage = base64_decode($sanitizedData);
    if(!$decodedMessage){
        $decodedMessage = FALSE;
    }
    return $decodedMessage;
}

function fetchMails($gmail, $q) {

try{
    $list = $gmail->users_messages->listUsersMessages('me', array('q' => $q));
    while ($list->getMessages() != null) {

        foreach ($list->getMessages() as $mlist) {

            $message_id = $mlist->id;
            $optParamsGet2['format'] = 'full';
            $single_message = $gmail->users_messages->get('me', $message_id, $optParamsGet2);
            $payload = $single_message->getPayload();

            // With no attachment, the payload might be directly in the body, encoded.
            $body = $payload->getBody();
            $FOUND_BODY = decodeBody($body['data']);

            // If we didn't find a body, let's look for the parts
            if(!$FOUND_BODY) {
                $parts = $payload->getParts();
                foreach ($parts  as $part) {
                    if($part['body'] && $part['mimeType'] == 'text/html') {
                        $FOUND_BODY = decodeBody($part['body']->data);
                        break;
                    }
                }
            } if(!$FOUND_BODY) {
                foreach ($parts  as $part) {
                    // Last try: if we didn't find the body in the first parts, 
                    // let's loop into the parts of the parts (as @Tholle suggested).
                    if($part['parts'] && !$FOUND_BODY) {
                        foreach ($part['parts'] as $p) {
                            // replace 'text/html' by 'text/plain' if you prefer
                            if($p['mimeType'] === 'text/html' && $p['body']) {
                                $FOUND_BODY = decodeBody($p['body']->data);
                                break;
                            }
                        }
                    }
                    if($FOUND_BODY) {
                        break;
                    }
                }
            }
            // Finally, print the message ID and the body
            print_r($message_id . " <br> <br> <br> *-*-*- " . $FOUND_BODY);
        }

        if ($list->getNextPageToken() != null) {
            $pageToken = $list->getNextPageToken();
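            // Keep the search query when requesting the next page, otherwise
            // the following pages are no longer filtered by $q.
            $list = $gmail->users_messages->listUsersMessages('me', array('q' => $q, 'pageToken' => $pageToken));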
        } else {
            break;
        }
    }
} catch (Exception $e) {
    echo $e->getMessage();
}

}

$client = new Google_Client();
$client->setAuthConfig('client_secrets.json');
$client->addScope(Google_Service_Gmail::GMAIL_READONLY);

if (isset($_SESSION['access_token']) && $_SESSION['access_token']) {
    $client->setAccessToken($_SESSION['access_token']);
    $gmail = new Google_Service_Gmail($client);
    $q = ' after:2016/11/7';
    fetchMails($gmail, $q);
} else {
    $redirect_uri = 'http://' . $_SERVER['HTTP_HOST'] . '/gmail-api/oauth2callback.php';
    header('Location: ' . filter_var($redirect_uri, FILTER_SANITIZE_URL));
}

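Answer 4: (score: 1)

A simple, robust solution

I was not satisfied with the other answers, because they all have flaws (detailed in the spoiler), and some are long and mixed with functionality that the asker (and I) was not looking for.

  

Potential problems to be aware of in the other answers:
 no plain-text fallback
 or failure to handle a falsy message body, such as the string '0' (unlikely, but not impossible)
 or not searching the payload's tree structure deeply enough

So I thought I would save others the trouble and share my code (tested on my whole inbox).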
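As an illustration of the three concerns listed above, and not the answerer's own code, here is a minimal sketch. It assumes the payload is available as a decoded JSON array; the names collectTextBodies and extractBody are hypothetical.

```php
<?php
// Hypothetical helper: walks the whole part tree once and records the first
// decoded body seen for each text MIME type, skipping file attachments.
function collectTextBodies(array $part, array &$bodies): void
{
    $type = $part['mimeType'] ?? '';
    $data = $part['body']['data'] ?? null;
    $isAttachment = ($part['filename'] ?? '') !== '';
    $isText = $type === 'text/html' || $type === 'text/plain';

    if ($isText && !$isAttachment && $data !== null && !isset($bodies[$type])) {
        // Body data is URL-safe base64.
        $bodies[$type] = base64_decode(strtr($data, '-_', '+/'));
    }
    // Search every level of the tree, not just the first one or two.
    foreach ($part['parts'] ?? [] as $child) {
        collectTextBodies($child, $bodies);
    }
}

// Prefers HTML, falls back to plain text, returns null only if neither exists.
function extractBody(array $payload): ?string
{
    $bodies = [];
    collectTextBodies($payload, $bodies);
    // ?? tests for null, so a body of '0' is kept rather than treated as missing.
    return $bodies['text/html'] ?? $bodies['text/plain'] ?? null;
}
```

Comparing against null instead of relying on truthiness is what keeps a message whose entire body is '0' from being reported as empty.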

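Answer 5: (score: 0)

As a further improvement, the code should be recursive, and the message needs to be loaded in "full" format in order to extract the body. You can put the following three functions in your own class.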
