Question

I am currently working on a program in which one line needs to compare a character i with the Unicode character "”". It looks like this:

    i != "”"

My complete code is as follows:

#!/usr/bin/env python
# -*- coding: utf-8 -*- 


f = open('text.txt', "r")
g = open('write.txt', "w")


for word in f:
  for i in word:
    if all( [i != " ", i != "," ,i != "!", i != "?", i != ";",  
       i !=".", i != ":", i != "”", i != "”" ]):
      g.write(i.lower())
    else:
      g.write('\n')

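The idea is to parse the text and strip out all characters such as periods, commas, question marks and so on. The only problem is that the Unicode character ” is not removed from the text. Could you help me out? Thanks!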

For your information, I am using Python 2.7.11+.

Answer 1

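In the expression i != "”", neither i nor "”" is a Unicode string. Both are byte strings in Python 2, so i is a single byte while "”" is the three-byte UTF-8 encoding of the character, and the two can never be equal. If you want to compare Unicode characters, and you know that text.txt is encoded as UTF-8, try the following: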

for i in word.decode('utf-8'):
    if i != u"”":
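
To make the mismatch concrete, here is a small sketch (not part of the original answer) that compares the character with its UTF-8 encoding. It writes the quote as the escape \u201d so that it does not depend on the source file encoding; the variable names are only illustrative.

```python
# -*- coding: utf-8 -*-
quote = u"\u201d"            # RIGHT DOUBLE QUOTATION MARK as one Unicode character
raw = quote.encode("utf-8")  # the bytes a byte-oriented read of a UTF-8 file yields

print(len(quote))                    # 1: a single character
print(len(raw))                      # 3: three bytes, so a one-byte i never equals it
print(raw.decode("utf-8") == quote)  # True: decoding restores the character
```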

Not directly related to your question, but using in may be easier than all():

if i not in u" ,!?;.:”":
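
As a quick check that the two spellings agree for single characters, here is a sketch with two hypothetical helpers, keep_with_all and keep_with_in, which are not in the original code. It assumes the separator set from the line above.

```python
# -*- coding: utf-8 -*-
SEPARATORS = u" ,!?;.:\u201d"  # \u201d is the right double quotation mark


def keep_with_all(ch):
    # Style used in the question: one inequality test per separator.
    return all([ch != sep for sep in SEPARATORS])


def keep_with_in(ch):
    # Membership test against the same characters.
    return ch not in SEPARATORS


for ch in u"a.\u201d ":
    # Both helpers give the same verdict for every sample character.
    print(keep_with_all(ch) == keep_with_in(ch))
```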

Here is a tested example program:

#!/usr/bin/env python
# -*- coding: utf-8 -*- 


f = open('text.txt', "r")
g = open('write.txt', "w")


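for word in f:
  # Decode each line from UTF-8 bytes so the loop yields whole characters.
  for i in word.decode('utf-8'):
    # A separator (space, punctuation, or ”) becomes a line break.
    if i not in u" ,!?;.:”":
      g.write(i.lower())
    else:
      g.write('\n')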

Input text.txt:

hello.zippy”
goodbye

Output write.txt:

hello
zippy

goodbye
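
A variation that may be worth considering, offered as a sketch rather than as the answer's tested program: io.open accepts an encoding argument in Python 2.6+ and Python 3, so the file objects decode on read and encode on write and no manual decode() call is needed. The names SEPARATORS, split_on_separators and convert are assumptions made for this example.

```python
# -*- coding: utf-8 -*-
import io

SEPARATORS = u" ,!?;.:\u201d"  # \u201d is the right double quotation mark


def split_on_separators(text):
    """Lowercase text and replace every separator character with a newline."""
    return u"".join(u"\n" if ch in SEPARATORS else ch.lower() for ch in text)


def convert(src_path, dst_path):
    # Both files are opened in text mode with an explicit encoding, so the
    # loop only ever handles Unicode strings and the with blocks close them.
    with io.open(src_path, "r", encoding="utf-8") as src:
        with io.open(dst_path, "w", encoding="utf-8") as dst:
            for line in src:
                dst.write(split_on_separators(line))
```

Calling convert("text.txt", "write.txt") on the input above should give the same output as the tested program.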

Answer 2

Rob's answer is not complete. I had to put this at the top of the file:

import sys
reload(sys)
sys.setdefaultencoding('utf-8')

Now everything works like a charm! :D
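
This answer does not say which error prompted the workaround. Assuming it was a UnicodeEncodeError raised when g.write() received a non-ASCII Unicode character (Python 2 encodes such writes with the ASCII default), a possible alternative to changing the interpreter-wide default is to encode explicitly before writing. The sketch below uses an in-memory io.BytesIO as a stand-in for the output file; the sample string is made up.

```python
# -*- coding: utf-8 -*-
import io

sample = u"\u201chi\u201d"  # made-up text containing curly quotes

# Encoding with ASCII fails for characters outside the ASCII range.
try:
    sample.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Encoding explicitly as UTF-8 gives bytes that a byte-mode file accepts.
out = io.BytesIO()  # stand-in for a file opened in binary mode
out.write(sample.encode("utf-8"))

print(ascii_ok)                                  # False
print(out.getvalue().decode("utf-8") == sample)  # True
```

In the program above this would amount to writing i.lower().encode('utf-8') instead of i.lower().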

Unicode character comparison not working
