I have a lookup list generated from a UTF-8 file.
When I open the file, I can see the word 'الو' in it. So the word is in the list, but the list now looks like ['\xd8\xa7\xd9\x84\xd9\x88', '\xd8\xa3\xd9\x84\xd9\x88', '\xd8\xa7\xd9\x88\xd9\x83\xd9\x8a', '\xd8\xa7\xd9\x84', '\xd8\xa7\xd9\x87', '\xd8\xa3\xd9\x87', '\xd9\x87\xd9\x84\xd9\x88', '\xd8\xa3\xd9\x88\xd9\x83\xd9\x8a', '\xd9\x88']
Then I want to check whether a specific word is in newStopWords1d. The word 'الو' is stored as '\xd8\xa7\xd9\x84\xd9\x88'.
    import itertools
    with open('stop_word_Tiba.txt') as f:
        newStopWords = list(itertools.chain(line.split() for line in f))  # one list of words per line
    newStopWords1d = list(itertools.chain(*newStopWords))  # convert the 2D list to a 1D list
The word is not found. I tried:
    word = 'الو'
    for w in newStopWords1d:
        if word == w.encode("utf-8"):
            print 'found'
But again the word is not found. It looks like an encoding problem, but I cannot solve it. Can you help me, please?
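Answer 0 (score: 0)
It is worth mentioning that you are using Python 2.7. There, the built-in open returns byte strings, so the list holds UTF-8 bytes rather than text. Decode each entry and compare it with a unicode string: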
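    word = u'الو'  # unicode literal; the source file needs a UTF-8 coding declaration
    for w in newStopWords1d:
        if word == w.decode("utf-8"):  # decode the UTF-8 bytes to unicode before comparing
            print 'found'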
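To illustrate why the comparison only succeeds once both sides are the same type, here is a minimal sketch using the byte sequence shown in the question. The variable names are illustrative, and the search word is written with escape sequences (an assumption made so the snippet does not depend on the source-file encoding). It should behave the same on Python 2.7 and Python 3.

```python
# UTF-8 bytes of the word, exactly as they appear in the list from the question.
raw = b'\xd8\xa7\xd9\x84\xd9\x88'

# The same word as a unicode string (alef, lam, waw), written with escapes.
word = u'\u0627\u0644\u0648'

decoded = raw.decode('utf-8')                 # bytes -> unicode text
matches = (decoded == word)                   # compare text with text
same_bytes = (word.encode('utf-8') == raw)    # or compare bytes with bytes
```

Either direction works, as long as both operands end up as text or both end up as bytes.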
A better solution is to use the io module:
    import io
    with io.open('stop_word_Tiba.txt', encoding="utf-8") as f:
        ...
or the codecs module:
    import codecs
    with codecs.open('stop_word_Tiba.txt', encoding="utf-8") as f:
        ...
This is needed because the built-in open function in Python 2.7 does not support specifying an encoding.
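As a rough sketch of how decoding at read time simplifies the lookup, the example below writes a small stand-in file and reads it back with io.open. The file name stop_words_sample.txt and its contents are hypothetical placeholders for stop_word_Tiba.txt, and the use of a set with the in operator is a substitution for the explicit loop, not part of the original code.

```python
import io
import os
import tempfile

# Hypothetical stand-in for stop_word_Tiba.txt: two lines of UTF-8 text.
sample = u'\u0627\u0644\u0648 \u0623\u0644\u0648\n\u0648\n'
path = os.path.join(tempfile.mkdtemp(), 'stop_words_sample.txt')
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(sample)

# io.open decodes while reading, so every token is already unicode text.
with io.open(path, encoding='utf-8') as f:
    stop_words = set(token for line in f for token in line.split())

word = u'\u0627\u0644\u0648'
found = word in stop_words   # membership test instead of an explicit loop
```

Because the tokens are decoded once while reading, no per-word encode or decode call is needed at comparison time.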
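Answer 1 (score: 0)
I solved the problem by changing the file-opening statement to:
    with codecs.open("stop_word_Tiba.txt", "r", "utf-8") as f:
        newStopWords = list(itertools.chain(line.split() for line in f))  # one list of words per line
    newStopWords1d = list(itertools.chain(*newStopWords))  # flatten to a 1D list of unicode words
    for w in newStopWords1d:
        if word.encode("utf-8") == w.encode("utf-8"):  # word must be a unicode string here
            return 'found'  # fragment taken from inside a function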
Thank you...