Decoding Japanese characters from Base64

Time: 2019-04-25 03:05:02

Tags: python python-3.x utf-8 base64 decode

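I have a somewhat complicated decoding problem. I have code that pulls notes from Gmail (created by Siri), puts them into a variable, and compares the len of the matched words to tell whether the words are in the keywords list, which lives in another .py file.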

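The problem is that Gmail turns the Japanese characters into 6luk, so nothing matches. Even if I change the words in the keywords .py file to 6luk, it doesn't work; it only works when I write 6luk directly into the code. When I use

    base64.b64decode(command).decode('utf-8')

6luk can be converted back, but because the message is already being decoded in

    voice_command = email.message_from_string(data[0][1].decode('utf-8'))

it doesn't work well. I could remove .decode('utf-8') from there, but then it doesn't work at all. I tried Base64-decoding the variable command, which holds 6luk from Gmail. That works online (on a decoding site) and even in another file with

    base64.b64decode(command).decode('utf-8')

but it doesn't work on the command variable. It says:

    The word(s) '6luk' have been said
    Received an exception while running: 'utf-8' codec can't decode byte 0xea in position 0: invalid continuation byte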

I searched for 0xea, which looks like Latin-1, but when I convert it to Latin-1 it gets even more complicated: ê[¤
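A quick check with the standard base64 module reproduces both symptoms and points at a likely cause: Base64 is case-sensitive, and fetch_command() in the code below calls .lower() on the payload, so the note body 6LuK (see the raw message below) reaches handle() as 6luk. The variable names in this sketch are illustrative only.

```python
import base64

# Body of the note as Gmail stores it (from the raw message in the question)
note_body = "6LuK"
# fetch_command() lowercases the payload, so handle() actually sees "6luk"
lowered = note_body.lower()

# The untouched body decodes to the UTF-8 bytes of U+8ECA (車)
good = base64.b64decode(note_body)
print(good, f"U+{ord(good.decode('utf-8')):04X}")  # b'\xe8\xbb\x8a' U+8ECA

# The lowercased string is different Base64 and decodes to different bytes
bad = base64.b64decode(lowered)
print(bad)  # b'\xea[\xa4'

# Read as Latin-1, those bytes are the 'ê[¤' seen in the question
latin1 = bad.decode("latin-1")

try:
    bad.decode("utf-8")
except UnicodeDecodeError as exc:
    # 'utf-8' codec can't decode byte 0xea in position 0: invalid continuation byte
    print(exc)
```

In other words, 0xea is not a Latin-1 clue; it is simply the first byte you get from decoding Base64 whose letters have had their case changed.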

Here is the code. It is part of the hackster.io/thesanjeetc/siricontrol-add-siri-voice-control-to-any-project-644b52 project.

By the way, the original note in Gmail looks like this:


Content-Type: text/html; charset=utf-8
Content-Transfer-Encoding: base64
From:  <@gmail.com>
X-Uniform-Type-Identifier: com.apple.mail-note
Mime-Version: 1.0 (iOS/12.2 \(-----\) dataaccessd/1.0)
Date: Thu, 25 Apr 2019 11:42:33 +0900
X-Mail-Created-Date: Thu, 25 Apr 2019 11:42:33 +0900
Subject: 車
X-Universally-Unique-Identifier: --------
Message-Id: <-------@gmail.com>

6LuK


import time
import imaplib
import email
import os
import pkgutil
import base64

##########################################

# Add your gmail username and password here

username = ""
password = ""

##########################################


class ControlException(Exception):
    pass


class Control():
    def __init__(self, username, password):
        print("------------------------------------------------------")
        print("-                    SIRI CONTROL                    -")
        print("-           Created by Sanjeet Chatterjee            -")
        print("-      Website: https://medium.com/@thesanjeetc      -")
        print("------------------------------------------------------")

        try:
            self.last_checked = -1
            self.mail = imaplib.IMAP4_SSL("imap.gmail.com", 993)
            self.mail.login(username, password)
            self.mail.list()
            self.mail.select("Notes")

            # Gets last Note id to stop last command from executing
            result, uidlist = self.mail.search(None, "ALL")
            try:
                self.last_checked = uidlist[0].split()[-1]
            except IndexError:
                pass

            self.load()
            self.handle()
        except imaplib.IMAP4.error:
            print("Your username and password is incorrect")
            print("Or IMAP is not enabled.")

    def load(self):
        """Try to load all modules found in the modules folder"""
        print("\n")
        print("Loading modules...")
        self.modules = []
        path = os.path.join(os.path.dirname(__file__), "modules")
        directory = pkgutil.iter_modules(path=[path])
        for finder, name, ispkg in directory:
            try:
                loader = finder.find_module(name)
                module = loader.load_module(name)
                if hasattr(module, "commandWords") \
                        and hasattr(module, "moduleName") \
                        and hasattr(module, "execute"):
                    self.modules.append(module)
                    print("The module '{0}' has been loaded, "
                          "successfully.".format(name))
                else:
                    print("[ERROR] The module '{0}' is not in the "
                          "correct format.".format(name))
            except:
                print("[ERROR] The module '" + name + "' has some errors.")
        print("\n")

    def fetch_command(self):
        """Retrieve the last Note created if new id found"""
        self.mail.list()
        self.mail.select("Notes")

        result, uidlist = self.mail.search(None, "ALL")
        try:
            latest_email_id = uidlist[0].split()[-1]
        except IndexError:
            return

        if latest_email_id == self.last_checked:
            return

        self.last_checked = latest_email_id
        result, data = self.mail.fetch(latest_email_id, "(RFC822)")
        voice_command = email.message_from_string(data[0][1].decode('utf-8'))
        return str(voice_command.get_payload()).lower().strip()

    def handle(self):
        """Handle new commands

        Poll continuously every second and check for new commands.
        """
        print("Fetching commands...")
        print("\n")

        while True:
            try:
                command = self.fetch_command()
                if not command:
                    raise ControlException("No command found.")

                print("The word(s) '" + command + "' have been said")
                command = base64.b64decode(command)
                command = (command.decode('Latin-1'))
                command = base64.b64encode(command).encode('utf-8')
                command = base64.b64encode(command).decode('utf-8')
                print(command)
                for module in self.modules:
                    foundWords = []
                    for word in module.commandWords:
                        if str(word) in command:
                            foundWords.append(str(word))
                    if len(foundWords) == len(module.commandWords):
                        try:
                            module.execute(command)
                            print("The module {0} has been executed "
                                  "successfully.".format(module.moduleName))
                        except:
                            print("[ERROR] There has been an error "
                                  "when running the {0} module".format(
                                      module.moduleName))
                    else:
                        print("\n")
            except (TypeError, ControlException):
                pass
            except Exception as exc:
                print("Received an exception while running: {exc}".format(
                    **locals()))
                print("Restarting...")
            time.sleep(1)


if __name__ == '__main__':
    Control(username, password)
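The keyword check in handle() can be tried in isolation. In this sketch car_module is a hypothetical stand-in for a file in the modules folder (a real module also needs execute()), and matches() is a hypothetical helper that restates the foundWords loop. It suggests the comparison only has a chance once command holds decoded text rather than Base64.

```python
from types import SimpleNamespace

# Hypothetical stand-in for a file in the modules folder; real modules also
# define execute(), which is omitted here.
car_module = SimpleNamespace(moduleName="car", commandWords=["車"])


def matches(module, command):
    """Restate the check in handle(): every word in commandWords must appear in command."""
    found_words = [str(word) for word in module.commandWords if str(word) in command]
    return len(found_words) == len(module.commandWords)


print(matches(car_module, "車"))    # True: command already decoded to text
print(matches(car_module, "6luk"))  # False: command is still (lowercased) Base64
```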


1 Answer:

Answer 0 (score: 2)

The body you retrieve with imaplib is a bytes object. You can pass it to b64decode without decoding it first:

>>> base64.b64decode(b'6Luk')
b'\xe8\xbb\xa4'

This is the UTF-8 encoding of the character U+8EE4, so the next step is to decode it:

>>> base64.b64decode(b'6Luk').decode('utf-8')
'軤'

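How exactly to fix your code is a good question. I would change fetch_command to return the actual decoded string from the payload, since that function already makes a number of assumptions about the expected input.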

Without access to your infrastructure I don't really have a good way to test this, but off the cuff, perhaps something like

    def fetch_command(self):
        """Retrieve the body of the last Note created if new id found"""
        self.mail.list()
        self.mail.select("Notes")

        result, uidlist = self.mail.search(None, "ALL")
        try:
            latest_email_id = uidlist[0].split()[-1]
        except IndexError:
            return

        if latest_email_id == self.last_checked:
            return

        self.last_checked = latest_email_id
        result, data = self.mail.fetch(latest_email_id, "(RFC822)")
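        # use message_from_bytes instead of attempting to decode something which almost certainly isn't UTF-8
        note = email.message_from_bytes(data[0][1])
        # extract the body part; decode=True undoes the base64 transfer encoding and returns bytes
        payload = note.get_payload(decode=True)
        # turn those bytes into text using the charset declared in Content-Type (UTF-8 if none is given)
        voice_command = payload.decode(note.get_content_charset('utf-8'))
        return voice_command.lower().strip()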

    def handle(self):
        """Handle new commands

        Poll continuously every second and check for new commands.
        """
        print("Fetching commands...")
        #print("\n")   # empty output lines are an annoyance up with which I will not put

        while True:
            try:
                command = self.fetch_command()
                if not command:
                    raise ControlException("No command found.")

                print("The word(s) '" + command + "' have been said")
                #print(command)
                # etc etc
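Gmail isn't needed to check the parsing part. The following is a minimal offline sketch, assuming the raw note from the question trimmed to its MIME headers (RAW_NOTE and note_text are hypothetical names). It shows that get_payload(decode=True) undoes the base64 transfer encoding but still returns bytes, so the charset from Content-Type has to be applied before the result can be printed or compared with str keywords.

```python
import email

# Hypothetical test input: the raw note from the question, keeping only the
# MIME headers that matter for decoding (the redacted headers are dropped).
RAW_NOTE = (
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"Mime-Version: 1.0\r\n"
    b"\r\n"
    b"6LuK\r\n"
)


def note_text(raw):
    """Parse raw RFC822 bytes and return the decoded, lowercased body text."""
    note = email.message_from_bytes(raw)
    payload = note.get_payload(decode=True)      # base64 undone -> b'\xe8\xbb\x8a'
    charset = note.get_content_charset("utf-8")  # "utf-8", taken from Content-Type
    return payload.decode(charset).lower().strip()


command = note_text(RAW_NOTE)
print(f"U+{ord(command):04X}")  # U+8ECA
```

Lowercasing only after decoding is harmless here, because the Base64 text is never touched.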

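If your Python is new enough (3.3+ really, but properly from 3.6, which is when the new API became the default), you will probably want to look into using the new features of the email library with email.policy instead of the legacy interface:

    from email.policy import default
    # ....
    note = email.message_from_bytes(data[0][1], policy=default)
    # get_body() picks the body part; get_content() decodes it to a str
    voice_command = note.get_body().get_content()

You'll notice that we let the email library figure out what to decode and how. We avoid hard-coding things like utf-8, because different texts can come with different character sets and/or different transfer encodings; you have to examine and obey the MIME headers of each message part. (We do hard-code the expectation that there is only a single payload. I'm not entirely sure that is a robust assumption, either.)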
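Here is the same kind of offline sketch for the policy-based API, again assuming the question's note trimmed to its MIME headers (RAW_NOTE is a hypothetical name). In this version get_body() returns the message part and get_content() applies both the transfer encoding and the charset, so no codec name appears in the code.

```python
import email
from email.policy import default

# Hypothetical input: the note from the question, trimmed to the headers that matter.
RAW_NOTE = (
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"Mime-Version: 1.0\r\n"
    b"\r\n"
    b"6LuK\r\n"
)

note = email.message_from_bytes(RAW_NOTE, policy=default)  # an EmailMessage
# Best candidate body part; for this single-part note it is the note itself
body = note.get_body()
# get_content() reads Content-Transfer-Encoding and charset from the headers
text = body.get_content().strip() if body is not None else ""
print(f"U+{ord(text):04X}")  # U+8ECA
```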

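As an aside, this mail format is not a peculiarity of Gmail; it is how MIME encapsulates content so that it is compatible with the basic 7-bit plain-ASCII RFC822 email message format.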
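To see that encapsulation on the note itself, this small sketch encodes the subject 車 as UTF-8 and then as Base64. The UTF-8 bytes are all above 0x7F, while the Base64 output is plain 7-bit ASCII and matches the 6LuK body in the raw note.

```python
import base64

subject = "車"                          # U+8ECA, the note's subject
utf8_bytes = subject.encode("utf-8")    # b'\xe8\xbb\x8a': every byte is >= 0x80
body = base64.b64encode(utf8_bytes)     # b'6LuK': only 7-bit ASCII characters
print(utf8_bytes, body)                 # b'\xe8\xbb\x8a' b'6LuK'
print(all(byte < 0x80 for byte in body))  # True
```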