Question

In one of my Python programs (Python 2.7), I need to handle some Chinese characters:

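I have a file A.txt with two columns, "name" and "score". The "name" column can contain Chinese strings, and the score is an int between 1 and 10. A.txt is encoded in GBK, which is a Chinese character encoding.
I insert each line of A.txt into my MySQL table tb_name_score, which has three columns: ID, NAME, SCORE. The encoding of its NAME column is latin1_swedish_ci.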
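Now I have another file, B.txt, which also has two columns, "name" and "score", and I need to update the SCORE column of tb_name_score according to B.txt. B.txt is also encoded in GBK.
So I iterate over B.txt, read a line, and compare its "name" value with the records in tb_name_score.NAME; if they are equal, I update tb_name_score.SCORE. However, even though the "name" column of a line in B.txt is the same Chinese string as the value in tb_name_score.NAME, "=" returns false and I simply cannot update the table. Can anyone help? Thanks!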

Answer 1

Hope this helps:

Python 2.7.3 (default, Apr 10 2013, 06:20:15) 
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> a=u'后者'
>>> b='后者'
>>> type(a)
<type 'unicode'>
>>> type(b)
<type 'str'>
>>> a==b
__main__:1: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
False
>>> b
'\xe5\x90\x8e\xe8\x80\x85'
>>> a
u'\u540e\u8005'
>>> b.decode('utf8')
u'\u540e\u8005'
>>> a.encode('utf8')
'\xe5\x90\x8e\xe8\x80\x85'
>>>
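
Applied to the GBK files in the question, the same idea might look like the sketch below. It assumes a name read from B.txt arrives as GBK-encoded bytes while the value it is compared against was produced with a different encoding; the sample string and variable names are made up for illustration. The same characters have different byte sequences in different encodings, so both sides need to be decoded to text before comparing.

```python
# -*- coding: utf-8 -*-
# Sketch only: the sample string and variable names are illustrative assumptions.
text = u'后者'

gbk_bytes = text.encode('gbk')    # what a line from a GBK file would contain
utf8_bytes = text.encode('utf8')  # what a UTF-8 source literal would contain

# The raw bytes differ, so a byte-for-byte comparison fails.
bytes_equal = (gbk_bytes == utf8_bytes)

# Decoding each side with its own encoding yields the same text.
decoded_equal = (gbk_bytes.decode('gbk') == utf8_bytes.decode('utf8'))

print(bytes_equal)    # False
print(decoded_equal)  # True
```

The key detail is that each value is decoded with the encoding it was actually written in; decoding GBK bytes as UTF-8 would either fail or give different text.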

Answer 2

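# -*- coding: utf-8 -*-
import pandas as pd

df_raw = pd.read_excel('/Users/zh/workspace/CityRealEstate/CityDataset20180521-4.xlsx')

df_train = df_raw.iloc[:, 3:59]
print df_raw.loc[df_raw['Year'] != 2016]

# In Python 2 this literal is a byte string (UTF-8, given the coding line above).
city = '深圳'
print df_raw['City'].values
# Decode it to unicode so it can match the unicode strings in the 'City' column.
df_train = df_raw.loc[df_raw['City'] == city.decode('utf8')]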

It works for me.
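
For the file-based case in the question, decoding can also happen at read time, so every name is already text before any comparison. The following is a minimal sketch, not the asker's actual code: the helper `read_scores`, the temporary file standing in for B.txt, and the whitespace-separated column layout are all assumptions.

```python
# -*- coding: utf-8 -*-
# Sketch only: file contents, path and helper name are illustrative assumptions.
import io
import os
import tempfile


def read_scores(path, encoding='gbk'):
    """Return {name: score}, decoding names to text while reading."""
    scores = {}
    with io.open(path, 'r', encoding=encoding) as handle:
        for line in handle:
            parts = line.split()
            if len(parts) != 2:
                continue  # skip blank or malformed lines
            name, score = parts
            scores[name] = int(score)
    return scores


# Build a small GBK-encoded stand-in for B.txt.
fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)
with io.open(path, 'w', encoding='gbk') as handle:
    handle.write(u'深圳 7\n后者 3\n')

scores = read_scores(path)
os.remove(path)

city = u'深圳'
print(city in scores)  # True
print(scores[city])    # 7
```

Values fetched from the database would need the same treatment before being compared with these names; how they come back depends on the connection and column settings, which the question does not show.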

How do I compare non-English (Chinese) characters in a Python program?
