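How do I find the integer representing a special character's code point? TypeError: ord() expected a character, but string of length 2 found

Time: 2016-05-12 15:23:08

Tags: python-3.x encoding character-encoding ord

I want to compute the integer code points of several national characters in different encodings (I am sure all of these codecs contain the characters). My program looks like this: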

characters = ['Č', 'č', 'Š', 'š', 'Ž', 'ž']
codecs = ['iso8859_2', 'cp1250', 'mac_latin2', 'utf-8', 'utf_16_le', 'utf_16_be']

for letter in characters:
    for code in codecs:
        print(letter + ' ' + code + ' ' + str(ord(letter.encode(code))))

Output:

Č iso8859_2 200
Č cp1250 200
Traceback (most recent call last):
  File "C:/Users/Miha/Documents/2Semester/IK/Vaja2/chrEncode.py", line 7, in <module>
    print(letter + ' ' + code + ' ' + str(ord(letter.encode(code))))
TypeError: ord() expected a character, but string of length 2 found
Č mac_latin2 137

2 Answers:

Answer 0: (score: 0)

The following commented code snippet could help:

characters = ['Č'] #, 'č', 'Š', 'š', 'Ž', 'ž']
codecs = ['iso8859_2', 'cp1250', 'mac_latin2', 'utf-8', 'utf_16_le', 'utf_16_be']

for letter in characters:
    for code in codecs:
        charenc = letter.encode(code)
        if len(charenc) == 1:
            charcod = str(ord(letter.encode(code)))
        else:
            charcod = '0x'   + ''.join('{:02X}'.format(charenc[i]) \
                                    for i in range(0,len(charenc)))
        print(  letter       + 
                ' U+'        + '{:04X}'.format(ord(letter)) + # Unicode codepoint (UCS-2)
                ' (='        + str(ord(letter))             + # ditto in decimal
                '), length=' + str(len(charenc))            + # string length
                ' '          + charcod                      + # value
                ' in '       + code                         + # encoding 
                '')

Output:

D:\test\Python> python 37191263.py
Č U+010C (=268), length=1 200 in iso8859_2
Č U+010C (=268), length=1 200 in cp1250
Č U+010C (=268), length=1 137 in mac_latin2
Č U+010C (=268), length=2 0xC48C in utf-8
Č U+010C (=268), length=2 0x0C01 in utf_16_le
Č U+010C (=268), length=2 0x010C in utf_16_be

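Here the values for the utf-8, utf_16_le and utf_16_be conversions are printed in hexadecimal because they are more than one byte long. Converting them to decimal would not be difficult, although decimal does not seem useful to me, IMHO. I would rather print the single-byte cases in hexadecimal as well.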
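The idea of printing every result in hexadecimal could be sketched as follows. This is only a minimal illustration: the helper name `encoded_hex` is hypothetical and not part of the script above, and `bytes.hex()` assumes Python 3.5 or later.

```python
def encoded_hex(letter, codec):
    # Hypothetical helper: encode the character, then render all of its
    # bytes as one uppercase hexadecimal string, whatever the length.
    return '0x' + letter.encode(codec).hex().upper()

codecs = ['iso8859_2', 'cp1250', 'mac_latin2', 'utf-8', 'utf_16_le', 'utf_16_be']

for codec in codecs:
    # Only ASCII is printed here, so the console encoding does not matter.
    print(codec, encoded_hex('Č', codec))
```

With this approach the single-byte and multi-byte codecs go through the same code path, so the `len(charenc) == 1` branch is no longer needed.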

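Sorry if my adjustments to your script look minor. This is my first Python session: I installed Python and tried it out only because of your question... Thanks for inspiring a new and special experience!

Answer 1: (score: 0)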

I found a classmethod, g_int_c = str(df[1].loc[6, 'B']), in place of 'pnl1 Data'. Code:

import pandas as pd
try: from cStringIO import StringIO         # for Python2
except ImportError: from io import StringIO # for Python3
import textwrap
df1 = pd.read_csv(StringIO(textwrap.dedent("""
          ,,,
          0,1,2,3
          1,4,5,6
          7,8,9,10""")))
df2 = pd.read_csv(StringIO(textwrap.dedent("""
          ,,,
          0,NULL,2,3
          1,4,NULL,NULL""")), converters={i:str for i in range(4)})

sheets = ['pnl1 Data','pnl2 Data']

writer = pd.ExcelWriter('/tmp/output.xlsx')
for df, sheet in zip([df1, df2], sheets):
    print(df)
    #   Unnamed: 0 Unnamed: 1 Unnamed: 2 Unnamed: 3
    # 0          0       NULL          2          3
    # 1          1          4       NULL       NULL
    df.to_excel(writer, sheet)
writer.save()

df = pd.read_excel('/tmp/output.xlsx', sheetname=sheets, names=list('ABCD'), parse_cols="A:E")
df = {i: df[sheet] for i, sheet in enumerate(sheets, 1)}

for key, dfi in df.items():
    print(dfi)
    #    A  B  C   D
    # 0  0  1  2   3
    # 1  1  4  5   6
    # 2  7  8  9  10
    #    A    B    C    D
    # 0  0  NaN  2.0  3.0
    # 1  1  4.0  NaN  NaN

print(df[1].loc[1, 'B'])
# 4

Output for 'Č' only:

int.from_bytes(bytes, byteorder, *, signed=False)
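A minimal sketch of how `int.from_bytes` might be applied to the loop from the question. The helper name `encoded_int` is hypothetical, and `byteorder='big'` is an assumption: it simply reads the encoded bytes in the order the codec produced them.

```python
characters = ['Č', 'č', 'Š', 'š', 'Ž', 'ž']
codecs = ['iso8859_2', 'cp1250', 'mac_latin2', 'utf-8', 'utf_16_le', 'utf_16_be']

def encoded_int(letter, codec):
    # Hypothetical helper: treat the whole encoded byte string as a single
    # unsigned big-endian integer, so multi-byte encodings also work.
    return int.from_bytes(letter.encode(codec), byteorder='big')

for letter in characters:
    for codec in codecs:
        # Print the Unicode code point rather than the letter itself,
        # so the output is ASCII-only.
        print('U+{:04X}'.format(ord(letter)), codec, encoded_int(letter, codec))
```

Unlike `ord()`, which accepts exactly one character or one byte, this works for encoded results of any length. Note that for utf_16_le the integer reflects the little-endian byte order of the encoded data, not the code point itself.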