Converting non-ASCII characters with Python 3

Time: 2017-05-26 16:44:00

Tags: python-3.x non-ascii-characters

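I have a question about converting text characters to binary numbers in Python. I wrote a program that converts all ASCII characters and some Turkish characters to binary numbers. Here is the code of the converter: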

while True:
    # Map each ASCII character to its 8-digit, zero-padded binary code
    ASCII_characters_dict = {chr(i): format(i, "08b") for i in range(128)}
    Turkish_characters = "çÇöÖüÜ"
    Turkish_characters_dict = {i: bin(ord(i))[2:] for i in Turkish_characters}
    Dictionary = ASCII_characters_dict.copy()
    Dictionary.update(Turkish_characters_dict)
    başlık = "WELCOME TO THE CONVERTER"
    süs="-"*80
    print("\n{}\n\n{}\n".format(süs, başlık.center(80," ")))
    seçenekler = "1. To convert text to binary, press '1'.\n2. To convert binary to text, press '2'.\n3. To exit the program, press '3'."
    print("{0}\n\n{1}\n\n{0}\n".format(süs, seçenekler))
    while True:
        seçim = input("Select:")
        print("\n{}\n".format(süs))
        if seçim=="1":
            break
        elif seçim=="2":
            break
        elif seçim == "3":
            quit()
        else:
            print("Warning: Please select one of the given numbers.\n")
    while seçim == "1":
        altbaşlık = "Convert Text to Binary"
        print("{}\n\n{}\n".format(altbaşlık.center(80," "), süs))
        text_1 = input("Text:")
        text_2 = ""
        for i in text_1:
            for j in Dictionary:
                if i == j:
                    text_2 += Dictionary[j]
        with open("Text_To_Binary.txt","a") as dosya:
            dosya.write("\n{0}\n\nBinary: {1}\n\n{0}\n".format(süs, text_2))
        print("\n{0}\n\nBinary: {1}\n\n{0}".format(süs, text_2))
        message = "1. To continue converting text to binary, press '1'.\n\n2. To return to the main page, press '2'.\n\n3. To exit the program, press '3'."
        print("\n{}\n\n{}".format(message,süs))
        while True:
            yeni_seçim_1 = input("\nSelect:")
            print("\n{}\n".format(süs))
            if yeni_seçim_1 == "1":
                break
            elif yeni_seçim_1 == "2":
                break
            elif yeni_seçim_1 == "3":
                quit()
            else:
                print("Warning: Please select one of the given numbers.")
        if yeni_seçim_1 == "1":
            continue
        elif yeni_seçim_1 == "2":
            break
    while seçim == "2":
        altbaşlık = "Convert Binary to Text"
        print("{}\n\n{}\n".format(altbaşlık.center(80," "), süs))
        text_1 = input("Binary:")
        text_2 = ""
        list_1 = []
        if " " in text_1:
            list_1 = text_1.split()
        elif " " not in text_1:
            for i in range(len(text_1)):
                if i % 8 == 0:
                    text_2 += " "
                text_2 += text_1[i]
            list_1 = text_2[1:].split(" ")
        text_3 = ""
        for i in list_1:
            for j in Dictionary:
                if i == Dictionary[j]:
                    text_3 += j
        with open("Binary_To_Text.txt", "a", encoding="utf-8") as dosya:
            dosya.write("\n{0}\n\nText: {1}\n\n{0}\n".format(süs, text_3))
        print("\n{0}\n\nText: {1}\n\n{0}\n".format(süs, text_3))
        message = "1. To continue converting binary to text, press '1'.\n\n2. To return to the main page, press '2'.\n\n3. To exit the program, press '3'."
        print("{}\n\n{}".format(message,süs))
        while True:
            yeni_seçim = input("\nSelect:")
            print("\n{}\n".format(süs))
            if yeni_seçim == "1":
                break
            elif yeni_seçim == "2":
                break
            elif yeni_seçim == "3":
                quit()
            else:
                print("Warning: Please select one of the given numbers.")
        if yeni_seçim == "2":
            break

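This converter correctly converts the characters "çÇöÖüÜ" to binary numbers. I removed the characters "şŞğĞıİ" from Turkish_characters because the program could not convert them correctly.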
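A quick way to see what separates the two groups is to print each letter's Unicode code point and its `bin()` digits. This sketch assumes, as in the program above, that letters are converted with `bin(ord(c))` and that the binary-to-text step reads 8 digits at a time; the helper name `binary_digits` is hypothetical.

```python
# Turkish letters the converter handles, and the ones it cannot
working = "çÇöÖüÜ"
failing = "şŞğĞıİ"

def binary_digits(ch: str) -> str:
    """Return bin(ord(ch)) without the '0b' prefix, as the converter does."""
    return bin(ord(ch))[2:]

for ch in working + failing:
    digits = binary_digits(ch)
    print(ch, ord(ch), digits, len(digits))

# What an 8-digit reader sees for "ş": its 9 digits split into 8 + 1,
# and neither chunk is the code of "ş".
digits = binary_digits("ş")
chunks = [digits[i:i + 8] for i in range(0, len(digits), 8)]
print(chunks)  # ['10101111', '1']
```

Every letter in the first group has a code point between 128 and 255, so `bin()` happens to yield exactly 8 digits; every letter in the second group lies between 256 and 511 and needs 9.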

According to http://roubaixinteractive.com/PlayGround/Binary_Conversion/Binary_To_Text.asp:

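  1. The binary number of the character "ş" is "001001100010001100110011001101010011000100111011". When I copy this number and enter it in the "binary to text" part of my program, the output is "& #351;" (I put a space between the characters because otherwise it is displayed as "ş"). When I type chr(351), the output is the character "ş".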

  2. The binary number of the character "ş" is bin(351), which equals "0b101011111". But when I enter these digits into the converter, the program shows an empty result.

  3. The same problem occurs with the characters "ŞğĞıİ". The characters "çÇöÖüÜ", however, convert without any problem.

  4. According to https://www.binarytranslator.com/:

    1. The binary number of the character "ş" is "01011111". But these digits belong to the character "_".

    2. The same problem occurs with the characters "ŞğĞıİ".

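So my question is: why can the "çÇöÖüÜ" characters be converted correctly while the "şŞğĞıİ" characters cannot? Is there a solution other than handling the "şŞğĞıİ" characters specially after the input step? Thanks in advance.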
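The three results quoted above can be reproduced in plain Python. This sketch assumes that the 48-digit string from the first site is ASCII text written as 8-bit bytes, and that the second site keeps only the lowest 8 bits of the code point; both are inferences from the quoted outputs, not documented behavior of those sites.

```python
import html

# 1. The 48-digit string: six 8-bit chunks, each one an ASCII character.
site_1 = "001001100010001100110011001101010011000100111011"
decoded = "".join(chr(int(site_1[i:i + 8], 2)) for i in range(0, len(site_1), 8))
print(decoded)                 # &#351;  (an HTML numeric character reference)
print(html.unescape(decoded))  # ş

# 2. The code point itself: 351 needs 9 binary digits, not 8.
print(bin(ord("ş")))           # 0b101011111

# 3. Keeping only the lowest 8 bits of 351 gives 95, which is "_".
low_byte = ord("ş") & 0xFF
print(format(low_byte, "08b"), chr(low_byte))  # 01011111 _
```

So the first site encodes the HTML text "&#351;" rather than the letter, and the second silently drops the ninth bit.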

1 answer:

Answer 0 (score: -1)

Let me share what I learned from this example.

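First, ASCII is a 7-bit code, but ASCII characters are usually stored in 8-bit bytes. The extra (leftmost) bit is 0 for every ASCII character; on older transmission links that spare bit was sometimes used as a "parity bit" instead (see https://en.wikipedia.org/wiki/Parity_bit).

For example, the binary number of the character a is 1100001, but it is usually shown as 01100001, that is, zero-padded to a full byte. A parity bit is a different thing: it is a check bit chosen so that the total number of 1 bits is even (even parity) or odd (odd parity). It says nothing about whether the number itself is odd or even.

The reason for adding a parity bit is that a transmission to another computer can corrupt a bit. If a single bit flips, the count of 1 bits no longer matches the agreed parity, and the receiver can detect the error. The binary number of "a" is 1100001 (decimal 97) and contains three 1 bits, so under odd parity the parity bit is 0, the byte is 01100001, and it still represents 97.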
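A parity bit depends on how many 1 bits a code contains, not on whether its value is odd. The hypothetical helper `parity_bit` below computes it for a 7-bit ASCII code under either convention; it only illustrates the concept, since Python's `bin()` and `format()` never add a parity bit.

```python
def parity_bit(code: int, even: bool = True) -> int:
    """Return the bit that makes the total number of 1 bits even (or odd)."""
    ones = bin(code).count("1")
    if even:
        return ones % 2        # 1 when the count is odd, so the total becomes even
    return 1 - ones % 2        # 1 when the count is even, so the total becomes odd

code = ord("a")                # 97 == 0b1100001, three 1 bits
print(format(code, "07b"))     # 1100001
print(parity_bit(code, even=True), parity_bit(code, even=False))  # 1 0

# Under odd parity the bit is 0, so the transmitted byte is 01100001.
byte = (parity_bit(code, even=False) << 7) | code
print(format(byte, "08b"))     # 01100001

# Flipping any single bit changes the count of 1 bits, which the receiver can detect.
corrupted = byte ^ 0b00000100
print(bin(corrupted).count("1") % 2 == 1)  # False: odd parity is violated
```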

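So the character "a" is a number defined by ASCII. In UTF-8, every ASCII character is stored in one byte (8 bits), but non-ASCII characters take more: 2 bytes (16 bits) for code points U+0080 to U+07FF, which include all the Turkish letters here, and 3 or 4 bytes beyond that. Let's see why the Turkish characters take 16 bits.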

number=ord("a")
#number=97
string=chr(number)
#string="a"

The string defined by the code above contains only the character "a". Now let's encode it with "utf-8":

number=ord("a")
#number=97
string=chr(number).encode(encoding="utf-8")
#string=b'a'
len(string)
#1

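When an encoded byte lies outside the ASCII range, Python displays it after the "b" prefix as a "\x" escape, that is, in hexadecimal. Suppose the character is "ç":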

number=ord("ç")
#number=231
string=chr(231).encode(encoding="utf-8")
#string=b'\xc3\xa7'
len(string)
#2

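This bytes value may look odd, but it is correct: it consists of the two bytes c3 and a7, i.e. the hexadecimal number "c3a7". The encoded "a" has length 1, meaning 8 bits, which is 1 byte. The encoded "ç" has length 2, meaning 16 bits, which is 2 bytes.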
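The byte count depends on the code point range, not only on whether a character is ASCII. This sketch prints the UTF-8 length of a few sample characters, chosen here purely for illustration, to show where the 1-, 2-, 3- and 4-byte ranges begin.

```python
# UTF-8 uses 1 byte below U+0080, 2 bytes up to U+07FF,
# 3 bytes up to U+FFFF and 4 bytes above that.
samples = ["a", "ç", "ş", "İ", "€", "\U0001F600"]
for ch in samples:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X}", encoded, len(encoded), "byte(s)")
```

The Turkish letters, whether or not the original converter handled them, all fall in the 2-byte range.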

Let's look at the binary number of the character ç:

number=int("c3a7",16)
#number=50087
binary_number_of_character_ç=bin(50087)[2:]
#binary_number_of_character_ç="1100001110100111"
len(binary_number_of_character_ç)
#16

So the UTF-8 binary number of "ç" is "1100001110100111", which is usually written as two bytes: "11000011 10100111".
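Instead of precomputing a table, the bits of any character can be derived from its UTF-8 bytes on demand. A minimal sketch; the function name `text_to_binary` is illustrative and not part of the original program.

```python
def text_to_binary(text: str) -> str:
    """Encode text as UTF-8 and write every byte as 8 binary digits."""
    return " ".join(format(byte, "08b") for byte in text.encode("utf-8"))

print(text_to_binary("a"))  # 01100001
print(text_to_binary("ç"))  # 11000011 10100111
print(text_to_binary("ş"))  # 11000101 10011111
```

Because every group is a whole byte, the output is always a multiple of 8 digits, which is exactly what an 8-digit reader expects.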

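If we restructure the whole program based on the information above, it can look like the following, and it no longer has errors when displaying non-ASCII characters (the table below covers characters up to U+01FF, which includes the Turkish letters):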

#-----------------------------IMPORTING Fore and init FUNCTIONS FROM COLORAMA MODULE------------------------------------

from colorama import Fore,init
init(autoreset=True)

#--------------------------------------------DICTIONARY FUNCTION--------------------------------------------------------

def dictionary():
    # ASCII characters (U+0000 to U+007F): one UTF-8 byte, written as 8 binary digits
    ascii_dictionary = {chr(i): format(i, "08b") for i in range(128)}
    # Characters U+0080 to U+01FF: two UTF-8 bytes, written as "xxxxxxxx xxxxxxxx"
    non_ascii_dictionary = {chr(i): " ".join(format(byte, "08b") for byte in chr(i).encode("utf-8"))
                            for i in range(128, 512)}
    dictionary = ascii_dictionary.copy()
    dictionary.update(non_ascii_dictionary)
    return dictionary

#--------------------------------------------CONVERTER FUNCTIONS--------------------------------------------------------

def convert_text_to_binary():
    text_1 = input("Write A Text:")
    print(Fore.LIGHTBLUE_EX + "\n{}".format("-" * 80))
    return_value = dictionary()
    # Look each input character up directly; characters missing from the table are skipped
    list_1 = [return_value[i] for i in text_1 if i in return_value]
    text_2 = " ".join(list_1)
    print(Fore.RED + "\nOutput: " + Fore.GREEN + "{}\n\n".format(text_2) + Fore.LIGHTBLUE_EX + "{}\n".format("-" * 80))
    with open("Text_to_Binary.txt", "a", encoding="utf-8") as file:
        file.write("{0}\nInput: {2}\n\nOutput: {1}\n{0}".format("\n"+"-"*80+"\n", text_2, text_1))


def convert_binary_to_text():
    text_1 = input("Write Binary Numbers:")
    print(Fore.LIGHTBLUE_EX + "\n{}".format("-" * 80))
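    list_1 = text_1.split()
    # Group the bytes into characters: a byte starting with "0" is a complete
    # ASCII character, while a byte starting with "1" is the first byte of a
    # two-byte UTF-8 sequence and is joined with the byte that follows it.
    # Walking the list by index avoids removing items from it while iterating.
    list_4 = []
    count = 0
    while count < len(list_1):
        if list_1[count].startswith("1"):
            list_4.append(" ".join(list_1[count:count + 2]))
            count += 2
        else:
            list_4.append(list_1[count])
            count += 1
    # Reverse the table once (binary string -> character) instead of scanning it per byte
    binary_to_character = {binary: character for character, binary in dictionary().items()}
    text_2 = "".join(binary_to_character[i] for i in list_4 if i in binary_to_character)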
    print(Fore.RED + "\nOutput: " + Fore.GREEN + "{}\n\n".format(text_2) + Fore.LIGHTBLUE_EX + "{}\n".format("-" * 80))
    with open("Binary_to_Text.txt", "a", encoding="utf-8") as file:
        file.write("{0}\nInput: {2}\n\nOutput: {1}\n{0}".format("\n"+"-"*80+"\n", text_2, text_1))

#------------------------------------------STYLING WITH TEXT CLASS------------------------------------------------------

class text():
    def __init__(self,name,style=Fore.LIGHTBLUE_EX+"-"*80):
        self.name=name
        self.style=style
    def title(self):
        print("\n{0}\n\n{1}\n\n{0}\n".format(self.style, str(self.name).center(80, " ")))
    def paragraph(self):
        print("{0}\n\n{1}\n".format(self.name, self.style))

#-----------------------------------------------TEXT INSTANCES----------------------------------------------------------

head = text(Fore.RED + "WELCOME TO THE CONVERTER")
sub_head_1 = text(Fore.RED + "CONVERT TEXT TO BINARY")
sub_head_2 = text(Fore.RED + "CONVERT BINARY TO TEXT")
head_options = text(Fore.RED + "1. " + Fore.GREEN + "To convert text to binary, press '1'.\n\n" + 
                    Fore.RED + "2. " + Fore.GREEN + "To convert binary to text, press '2'.\n\n" + 
                    Fore.RED + "3. " + Fore.GREEN + "To exit the program, press '3'.")
sub_head_options = text(Fore.RED + "1. " + Fore.GREEN + "To continue converting, press '1'.\n\n" +
                        Fore.RED + "2. " + Fore.GREEN + "To return to the main page, press '2'.\n\n" +
                        Fore.RED + "3. " + Fore.GREEN + "To exit the program, press '3'.")

#-----------------------------------------------CHOICE FUNCTION---------------------------------------------------------

def choice():
    # Keep asking until the user enters "1", "2" or "3"
    while True:
        select = input("Select:")
        if select in ("1", "2"):
            return select
        elif select == "3":
            raise SystemExit
        else:
            print(Fore.RED+"\nWarning: "+Fore.GREEN+"Please select one of the given numbers.\n")

#---------------------------------------BUNDLING PROGRAM PARTS IN FUNCTION----------------------------------------------

def main_program():
    while True:
        head.title()
        head_options.paragraph()
        select = choice()
        while select == "1":
            sub_head_1.title()
            convert_text_to_binary()
            sub_head_options.paragraph()
            new_select = choice()
            if new_select == "2":
                break
        while select == "2":
            sub_head_2.title()
            convert_binary_to_text()
            sub_head_options.paragraph()
            new_select = choice()
            if new_select == "2":
                break

main_program()
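The reverse direction can also do without the lookup table: parse each 8-digit group as a byte and let Python's UTF-8 decoder reassemble the characters. This sketch (the name `binary_to_text` is hypothetical) assumes space-separated bytes like the converter's output, or one unbroken digit string that it splits every 8 digits as the question's program did. Unlike the 128 to 511 table, it handles any character, including 3- and 4-byte ones.

```python
def binary_to_text(binary: str) -> str:
    """Turn 8-digit binary groups back into text via UTF-8."""
    groups = binary.split()
    if len(groups) == 1 and len(groups[0]) > 8:
        # No spaces: cut the digits into 8-digit chunks; with UTF-8 this is always safe
        groups = [groups[0][i:i + 8] for i in range(0, len(groups[0]), 8)]
    data = bytes(int(group, 2) for group in groups)
    # errors="replace" turns malformed sequences into U+FFFD instead of raising
    return data.decode("utf-8", errors="replace")

print(binary_to_text("11000101 10011111"))           # ş
print(binary_to_text("1100010110011111"))            # ş
print(binary_to_text("11000011 10100111 01100001"))  # ça
```

A group wider than 8 digits, such as the 9-digit "101011111", makes `bytes()` raise `ValueError`, which is arguably better than silently producing nothing.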