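I want to compute the integer representation (code point) of several national characters in different encodings. (I am sure that all of these codecs contain these characters.) My program looks like this: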
characters = ['Č', 'č', 'Š', 'š', 'Ž', 'ž']
codecs = ['iso8859_2', 'cp1250', 'mac_latin2', 'utf-8', 'utf_16_le', 'utf_16_be']
for letter in characters:
    for code in codecs:
        print(letter + ' ' + code + ' ' + str(ord(letter.encode(code))))
Output:
Č iso8859_2 200
Č cp1250 200
Traceback (most recent call last):
File "C:/Users/Miha/Documents/2Semester/IK/Vaja2/chrEncode.py", line 7, in <module>
print(letter + ' ' + code + ' ' + str(ord(letter.encode(code))))
TypeError: ord() expected a character, but string of length 2 found
Č mac_latin2 137
Answer 0: (score: 0)
The following commented code snippet may help:
characters = ['Č'] #, 'č', 'Š', 'š', 'Ž', 'ž']
codecs = ['iso8859_2', 'cp1250', 'mac_latin2', 'utf-8', 'utf_16_le', 'utf_16_be']
for letter in characters:
    for code in codecs:
        charenc = letter.encode(code)
        if len(charenc) == 1:
            # single-byte encoding: ord() accepts a bytes object of length 1
            charcod = str(ord(letter.encode(code)))
        else:
            # multi-byte encoding: print the bytes as one hexadecimal string
            charcod = '0x' + ''.join('{:02X}'.format(charenc[i])
                                     for i in range(0, len(charenc)))
        print( letter +
               ' U+' + '{:04X}'.format(ord(letter)) +   # Unicode codepoint (UCS-2)
               ' (=' + str(ord(letter)) +               # ditto in decimal
               '), length=' + str(len(charenc)) +       # length of the encoded bytes
               ' ' + charcod +                          # value
               ' in ' + code +                          # encoding
               '')
Output:
D:\test\Python> python 37191263.py
Č U+010C (=268), length=1 200 in iso8859_2
Č U+010C (=268), length=1 200 in cp1250
Č U+010C (=268), length=1 137 in mac_latin2
Č U+010C (=268), length=2 0xC48C in utf-8
Č U+010C (=268), length=2 0x0C01 in utf_16_le
Č U+010C (=268), length=2 0x010C in utf_16_be
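Here the values for the utf-8, utf_16_le and utf_16_be conversions are all printed in hexadecimal. Converting them to decimal would not be a hard task, although decimal does not seem useful here, IMHO. On the contrary, I would print hexadecimal in the other cases as well.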
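A minimal sketch of the "hexadecimal everywhere" idea, assuming Python 3.5 or later, where str.encode returns a bytes object that has a hex() method. The helper name encoded_hex is hypothetical and not part of the script above:

```python
def encoded_hex(letter, codec):
    """Return the encoded bytes of `letter` as an uppercase hex string.

    Hypothetical helper for illustration; works for single-byte and
    multi-byte encodings alike.
    """
    return '0x' + letter.encode(codec).hex().upper()


codecs = ['iso8859_2', 'cp1250', 'mac_latin2', 'utf-8', 'utf_16_le', 'utf_16_be']
for codec in codecs:
    print('Č', codec, encoded_hex('Č', codec))
```

This removes the special case for length 1, so every codec goes through the same code path.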
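Sorry if my changes to your script look minor. This is my first Python session: I installed Python and tried it out only because of your question... Thanks for inspiring a new and special experience!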
Answer 1: (score: 0)
I found the classmethod g_int_c = str(df[1].loc[6, 'B']) as a replacement for 'pnl1 Data'.
Code:
import pandas as pd
try: from cStringIO import StringIO # for Python2
except ImportError: from io import StringIO # for Python3
import textwrap
df1 = pd.read_csv(StringIO(textwrap.dedent("""
,,,
0,1,2,3
1,4,5,6
7,8,9,10""")))
df2 = pd.read_csv(StringIO(textwrap.dedent("""
,,,
0,NULL,2,3
1,4,NULL,NULL""")), converters={i:str for i in range(4)})
sheets = ['pnl1 Data','pnl2 Data']
writer = pd.ExcelWriter('/tmp/output.xlsx')
for df, sheet in zip([df1, df2], sheets):
    print(df)
    # Unnamed: 0 Unnamed: 1 Unnamed: 2 Unnamed: 3
    # 0 0 NULL 2 3
    # 1 1 4 NULL NULL
    df.to_excel(writer, sheet)
writer.save()
df = pd.read_excel('/tmp/output.xlsx', sheetname=sheets, names=list('ABCD'), parse_cols="A:E")
df = {i: df[sheet] for i, sheet in enumerate(sheets, 1)}
for key, dfi in df.items():
    print(dfi)
# A B C D
# 0 0 1 2 3
# 1 1 4 5 6
# 2 7 8 9 10
# A B C D
# 0 0 NaN 2.0 3.0
# 1 1 4.0 NaN NaN
print(df[1].loc[1, 'B'])
# 4
Output for 'Č' only:
int.from_bytes(bytes, byteorder, *, signed=False)
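A minimal sketch of how int.from_bytes could stand in for ord() in the question's loop, assuming Python 3. The helper name encoded_int is hypothetical, and byteorder='big' is my assumption: it reads the encoded bytes in the order they appear, so the integer matches the hexadecimal dumps shown in the first answer.

```python
def encoded_int(letter, codec):
    """Interpret the encoded bytes of `letter` as one unsigned integer.

    Hypothetical helper for illustration. Unlike ord(), int.from_bytes
    accepts byte sequences of any length, so multi-byte encodings work too.
    """
    return int.from_bytes(letter.encode(codec), byteorder='big')


codecs = ['iso8859_2', 'cp1250', 'mac_latin2', 'utf-8', 'utf_16_le', 'utf_16_be']
for codec in codecs:
    print('Č', codec, encoded_int('Č', codec))
```

For the single-byte codecs this gives the same values as ord() did (200, 200, 137); for utf-8 it gives 50316, which is 0xC48C in decimal.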